Closed by bschilder 3 years ago
Potential solution: swap the order of these functions?
#### Infer reference genome if necessary ####
if(is.null(ref_genome)){
    ref_genome <- get_genome_build(sumstats = sumstats_return$sumstats_dt)
}
#### Check 5: Check for uniformity in SNP col - no mix of rs/missing rs/chr:bp ####
sumstats_return <-
    check_no_rs_snp(sumstats_dt = sumstats_return$sumstats_dt,
                    path = path,
                    ref_genome = ref_genome)
I guess that doesn't really make sense, because the latter (check_no_rs_snp) requires ref_genome. So perhaps do some filtering within get_genome_build instead.
Added a filtering step to get_genome_build, which seems to work.
Also added downsampling to speed up the function substantially.
sampled_snps <- 10000
...
#### Do some filtering first to avoid errors ####
sumstats <- sumstats[complete.cases(SNP)]
#### Downsample SNPs to save time ####
#### (check for NULL first so the comparison never runs against NULL) ####
if(!is.null(sampled_snps) && nrow(sumstats) > sampled_snps){
    snps <- sample(sumstats$SNP, sampled_snps)
} else {
    snps <- sumstats$SNP
}
sumstats <- sumstats[SNP %in% snps,]
...
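For anyone who wants to try the filter-then-downsample idea outside the package, here is a minimal standalone sketch. It assumes the summary statistics are a data.table with an SNP column; filter_and_downsample and the toy data are hypothetical names made up for illustration, not functions from the package, and the real get_genome_build may differ in its details.

```r
library(data.table)

# Hypothetical helper (illustration only, not part of the package):
# drop rows with a missing SNP ID, then keep at most `sampled_snps`
# randomly chosen SNP IDs. Pass sampled_snps = NULL to skip downsampling.
filter_and_downsample <- function(sumstats, sampled_snps = 10000) {
    # Remove rows whose SNP is NA so later lookups do not fail on them
    sumstats <- sumstats[!is.na(SNP)]
    # Test for NULL first so the size comparison never runs against NULL
    if (!is.null(sampled_snps) && nrow(sumstats) > sampled_snps) {
        snps <- sample(sumstats$SNP, sampled_snps)
        sumstats <- sumstats[SNP %in% snps]
    }
    sumstats
}

# Toy data (made up): 18 unique rsIDs plus two rows with a missing SNP
set.seed(1)
toy <- data.table(SNP = c(paste0("rs", 1:18), NA, NA), P = runif(20))

small    <- filter_and_downsample(toy, sampled_snps = 5)
all_rows <- filter_and_downsample(toy, sampled_snps = NULL)
```

Note that sampling is done on SNP IDs, so if the same ID appears on several rows the result can contain more rows than sampled_snps.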
Encountered when processing VCF step-by-step in format_sumstats. Specifically, the get_genome_build step.
Data source: https://gwas.mrcieu.ac.uk/files/ieu-a-1124/ieu-a-1124.vcf.gz