Closed bschilder closed 3 years ago
I can see that it is indeed an issue of a slight shift in rows:
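dat1 <- data.table::fread(reformatted)
dat2 <- data.table::fread(org)
# Drop row 58 from the original so its rows line up with the reformatted data
dat2_aligned <- dat2[-58]
# For each column, find the row indices where the two tables disagree
mismatching_rows <- lapply(
  stats::setNames(colnames(dat1), colnames(dat1)),
  function(x) which(dat1[[x]] != dat2_aligned[[x]])
)
mismatching_rows_unique <- unique(unlist(mismatching_rows))
# Stack the mismatching rows from both tables for side-by-side inspection
data.table::rbindlist(list("1" = dat1[mismatching_rows_unique, ],
                           "2" = dat2_aligned[mismatching_rows_unique, ]),
                      idcol = "data")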
Fixed this test and several similar ones that depended on a hard-coded row index rather than on the row the RSID is actually in.
I think this needs to be rewritten so it doesn't assume that omitting line 58 is the only thing required to make reformatted and org_lines the same. It should instead actually detect where the biallelic SNPs are. That way, it's more obvious what this function is trying to do, and it makes the test robust to changes in the example data.
https://github.com/neurogenomics/MungeSumstats/blob/master/tests/testthat/test-bi_alllelic.R
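A rough sketch of what comparing by RSID rather than by row index could look like. This is not the actual test: the toy data and the column names (SNP, A1, A2) are assumptions for illustration only, and it uses base R data frames to stay self-contained.

```r
# Toy stand-ins for the original and reformatted summary statistics.
# Column names (SNP, A1, A2) are hypothetical, chosen for illustration.
org <- data.frame(
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  A1  = c("A", "C", "G", "T"),
  A2  = c("G", "T", "A", "C"),
  stringsAsFactors = FALSE
)

# Pretend the formatter dropped rs3 (e.g. because it was not biallelic).
reformatted <- org[org$SNP != "rs3", ]

# Identify the removed SNPs by ID instead of by a hard-coded row number.
dropped_snps <- setdiff(org$SNP, reformatted$SNP)

# Align the surviving rows by SNP ID, then compare the two tables.
org_kept <- org[match(reformatted$SNP, org$SNP), ]
rownames(org_kept) <- NULL
rownames(reformatted) <- NULL
rows_match <- isTRUE(all.equal(org_kept, reformatted))
```

Because the rows are matched on the SNP column, the comparison should keep working if the example data gains or loses rows, which a fixed index such as 58 would not.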