Closed · DarioS closed 3 years ago
The intended way to deal with large VCF files is to iterate through them in chunks, with something like
vcf_file <- open(VcfFile("...", yieldSize = 100000))
while (length(vcf <- readVcf(vcf_file))) {
    ## ... work on chunk
}
close(vcf_file)
remembering to use ScanVcfParam() (e.g. its fixed =, info = and geno = arguments), or perhaps readGT(), or similar to selectively read in just the fields of interest.
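For instance, a minimal sketch of restricting the import to a few fields while iterating (the file path and genome label here are placeholders, not from the original report):

```r
library(VariantAnnotation)

## Placeholder path; yieldSize enables chunked reading
vcf_file <- VcfFile("variants.vcf.gz", yieldSize = 100000)

## Read only the ALT fixed field and the GT genotype field;
## info = NA skips the INFO column entirely
param <- ScanVcfParam(fixed = "ALT", info = NA, geno = "GT")

open(vcf_file)
while (length(vcf <- readVcf(vcf_file, genome = "hg38", param = param))) {
    ## ... work on chunk
}
close(vcf_file)
```

Alternatively, readGT(vcf_file) returns just the genotype matrix, which is much lighter than a full VCF object.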
I imported a 14 GB VCF (uncompressed) and noticed that it ultimately occupied 228 GB of RAM once stored in memory (the server has 512 GB of RAM, so swap space was not touched). Could the package provide a more memory-efficient representation of large numbers of variants in R?