dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

idea: cache the last-seen reference record #160

Open mlin opened 5 years ago

mlin commented 5 years ago

When loci are sufficiently dense, the genotyper usually gets back from the database the same reference records it just retrieved for other nearby loci. Maybe we can come up with 'safe' rules for serving such a record from a cache instead, saving the bucket scan and bcf_unpack each time. The new query range has to be contained within the cached reference record's range, and no other record from the same sample may overlap that reference record.

Thread contention on this cache might become a problem, though.
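
A minimal sketch of the 'safe' rules above, to make the idea concrete. This is not GLnexus code: `RefRecord`, `LastRefRecordCache`, `per_thread_cache` and the `sole_in_range` flag are hypothetical names, and the sketch assumes the caller already knows, from the bucket scan that produced the record, whether any other record in the same sample overlaps it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an already-unpacked reference record.
struct RefRecord {
    int rid;              // contig index
    int64_t beg;          // start, 0-based inclusive
    int64_t end;          // end, exclusive
    std::string payload;  // placeholder for the unpacked fields
};

// Remembers at most one reference record per sample.
class LastRefRecordCache {
public:
    // Cache rec for this sample only if the caller has verified that no
    // other record in the same sample overlaps it (assumption: the scan
    // that produced rec can tell us this). Otherwise drop the old entry.
    void store(int sample, const RefRecord& rec, bool sole_in_range) {
        assert(rec.beg < rec.end);
        if (sole_in_range) {
            last_[sample] = rec;
        } else {
            last_.erase(sample);
        }
    }

    // Return the cached record if the query [beg, end) on contig rid is
    // fully contained in it; nullptr means "fall back to the database".
    const RefRecord* lookup(int sample, int rid, int64_t beg, int64_t end) const {
        auto it = last_.find(sample);
        if (it == last_.end()) return nullptr;
        const RefRecord& r = it->second;
        if (r.rid != rid || beg < r.beg || end > r.end) return nullptr;
        return &r;
    }

private:
    std::unordered_map<int, RefRecord> last_;
};

// One cache per thread: no locking, at the cost of no sharing of hits.
inline LastRefRecordCache& per_thread_cache() {
    thread_local LastRefRecordCache cache;
    return cache;
}
```

On a miss the genotyper would do the usual bucket scan and then call `store`. Keeping one instance per worker thread, as `per_thread_cache` does, is one possible way around the contention concern, though threads would then not benefit from each other's hits.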