Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats

`test_that("non-biallelic SNPs are removed",` failing #36

Closed bschilder closed 3 years ago

bschilder commented 3 years ago

I think this test needs to be rewritten so that it doesn't assume that omitting line 58 is the only thing required to make `reformatted` and `org_lines` the same. It should instead actually detect where the biallelic SNPs are. That way, it's more obvious what this function is trying to do, and the test becomes robust to changes in the example data.

https://github.com/neurogenomics/MungeSumstats/blob/master/tests/testthat/test-bi_alllelic.R

bschilder commented 3 years ago

I can see that it is indeed an issue of a slight shift in rows:

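    library(magrittr)  # provides %>%
    # `reformatted` and `org` come from the test's setup
    dat1 <- data.table::fread(reformatted)
    dat2 <- data.table::fread(org)
    # Drop row 58 of the original, as the test does, so rows line up with dat1
    dat2_sub <- dat2[-58]
    # For each column, find the rows where the two tables disagree
    mismatching_rows <- lapply(colnames(dat1), function(x) {
      which(dat1[[x]] != dat2_sub[[x]])
    }) %>%
      `names<-`(colnames(dat1))
    mismatching_rows_unique <- unique(unlist(mismatching_rows))

    # Stack the mismatching rows from both tables, using the same row alignment
    data.table::rbindlist(list("1" = dat1[mismatching_rows_unique, ],
                               "2" = dat2_sub[mismatching_rows_unique, ]),
                          idcol = "data")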

[Image: Screenshot 2021-07-14 at 16 18 06]

bschilder commented 3 years ago

Fixed this test and several similar ones that depended on a hard-coded row index rather than on the row the RSID is actually in.
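For anyone hitting the same problem, here is a minimal sketch of the idea of comparing by RSID instead of by row position. It is not the actual fix committed to the repo: the toy data, the `SNP`/`A1`/`A2` column names, and the variable names `removed_ids` and `org_kept` are all assumptions made for illustration, and it uses base R data frames rather than the package's own helpers.

```r
# Toy stand-ins for the original and reformatted summary statistics
# (hypothetical values; column names are assumptions for illustration).
org <- data.frame(
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  A1  = c("A", "C", "G", "T"),
  A2  = c("G", "T", "A", "C"),
  stringsAsFactors = FALSE
)

# Pretend the QC step dropped rs3 as non-biallelic.
reformatted <- org[org$SNP != "rs3", ]

# Identify the removed rows by ID rather than by a hard-coded position.
removed_ids <- setdiff(org$SNP, reformatted$SNP)

# Align the kept rows of the original to the reformatted table by ID,
# so the comparison no longer depends on row order or row number.
org_kept <- org[match(reformatted$SNP, org$SNP), ]

# Reset row names so that only the contents are compared.
rownames(org_kept) <- NULL
rownames(reformatted) <- NULL

all.equal(org_kept, reformatted)
```

Inside a testthat test, the last line would presumably become something like `testthat::expect_equal(org_kept, reformatted)`, with a separate expectation on `removed_ids`. Keying on the ID means the test keeps working if rows are added to or reordered in the example data.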