Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats

Change tests to be more robust #38

Closed bschilder closed 3 years ago

bschilder commented 3 years ago

Changed all tests so they're less susceptible to slight changes in the code. For example, many tests used a hard-coded row index to identify the removed SNP, but row indices can shift when rows are added to or removed from the example data in the future, or when rows are sorted.

Looking the SNP up by its rsID means each test now checks its own behaviour (e.g. "handling missing data") more specifically, rather than also failing on other sources of differences such as row order.

Example:

expect_equal(reformatted_lines, org_lines[-58])

vs.

problem_snp <- "rs9320913"
rsid_index <- grep(problem_snp, org_lines, ignore.case = TRUE) 
expect_equal(reformatted_lines, org_lines[-rsid_index])
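
To illustrate why the second form is more robust, here is a minimal base-R sketch on toy data. The `BP` values and the rsID `rs0000001` are made up for illustration and are not taken from the example dataset; the reordering step just stands in for any change in row order.

```r
# Toy stand-in for formatted output lines: a header plus three SNP rows.
# The BP values and rs0000001 are invented for this sketch.
org_lines <- c(
  "SNP\tCHR\tBP",
  "rs9320913\t6\t1000",
  "rs12987662\t2\t2000",
  "rs0000001\t1\t3000"
)
problem_snp <- "rs9320913"

# Simulate a change in row order (e.g. sorting by coordinates).
sorted_lines <- org_lines[c(1, 4, 3, 2)]

# A hard-coded index removes whatever row happens to sit at that position.
hard_coded <- sorted_lines[-2]

# Looking the row up by rsID removes the intended SNP regardless of order.
rsid_index <- grep(problem_snp, sorted_lines, ignore.case = TRUE)
by_rsid <- sorted_lines[-rsid_index]

any(grepl(problem_snp, hard_coded))  # TRUE: the problem SNP is still there
any(grepl(problem_snp, by_rsid))     # FALSE: the problem SNP was removed
```

The hard-coded version silently drops the wrong row after reordering, which is the kind of unrelated failure these test changes are meant to avoid.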

test-missing_data.R

test_that("Handle missing data", {
  file <- tempfile()
  #Remove data from line 3 to check it is deleted
  eduAttainOkbay <- readLines(system.file("extdata","eduAttainOkbay.txt",
                                          package="MungeSumstats"))
  eduAttainOkbay_missing <- eduAttainOkbay
  eduAttainOkbay_missing[3] <-
    "rs12987662\t2\t100821548\tA\tC\t0.3787\t0.027\t0.003\t"
  problem_snp <- "rs9320913"
  #write the Educational Attainment GWAS to a temp file for testing
  writeLines(eduAttainOkbay_missing,con = file)
  #Run MungeSumstats code
  reformatted <- MungeSumstats::format_sumstats(file,ref_genome="GRCh37",
                                                on_ref_genome = FALSE,
                                                strand_ambig_filter=FALSE,
                                                bi_allelic_filter=FALSE,
                                                allele_flip_check=FALSE,
                                                sort_coordinates = FALSE)
  reformatted_lines <- readLines(reformatted)
  #Should equal org apart from this one line
  writeLines(eduAttainOkbay,con = file)
  org <- MungeSumstats::format_sumstats(file,ref_genome="GRCh37",
                                        on_ref_genome = FALSE,
                                        strand_ambig_filter=FALSE,
                                        bi_allelic_filter=FALSE,
                                        allele_flip_check=FALSE, 
                                        sort_coordinates = FALSE)
  org_lines <- readLines(org)
  #format_sumstats reorders rows (input line 3 ended up at line 58),
  #so locate the SNP by rsID rather than by a fixed index
  rsid_index <- grep(problem_snp, org_lines, ignore.case = TRUE)
  expect_equal(reformatted_lines, org_lines[-rsid_index])
})
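
One caveat with the `grep()` plus negative-index pattern: if the rsID is ever absent from `org_lines`, `grep()` returns `integer(0)`, and `x[-integer(0)]` yields an empty vector instead of leaving `x` unchanged. A possible guard is sketched below; `drop_snp_lines()` is a hypothetical helper, not part of MungeSumstats, and the toy `P` values are made up.

```r
# Hypothetical helper (not part of MungeSumstats): drop lines that mention a SNP.
drop_snp_lines <- function(lines, snp) {
  hit <- grepl(snp, lines, ignore.case = TRUE)
  # Fail loudly if the SNP is absent, so a test cannot pass or fail by accident.
  if (!any(hit)) stop("SNP not found in lines: ", snp)
  # Logical indexing is safe even when only some (or no) lines match.
  lines[!hit]
}

# Toy lines with invented P values.
lines <- c("SNP\tP", "rs9320913\t0.01", "rs12987662\t0.02")

# Negative indexing with an empty match drops everything:
no_match <- grep("rs0000000", lines)
length(lines[-no_match])  # 0, not 3

# The helper keeps the header and the other SNP:
drop_snp_lines(lines, "rs12987662")
```

Whether this guard is worth adding depends on how likely the example data is to change; an `expect_length(rsid_index, 1)` before the comparison would be a lighter-weight alternative.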