compbio-UofT / medsavant

MedSavant is a search engine for genetic variants

some unsorted vcf files don't import #280

Closed jvlasblom closed 10 years ago

jvlasblom commented 10 years ago

All VCFs are now sorted with an external sorting library (ExternalSort), so this should be fixed.
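
For readers unfamiliar with the technique, here is a minimal sketch of an external merge sort applied to VCF lines. It is written in Python for brevity and is not the ExternalSort library's code or MedSavant's importer. The names `vcf_sort_key`, `external_sort_vcf`, and `chunk_size` are hypothetical, and the sort order (lexicographic chromosome name, then numeric position) is an assumption.

```python
import heapq
import os
import tempfile


def vcf_sort_key(line):
    """Sort key taken from the first two tab-separated columns (CHROM, POS)."""
    chrom, pos = line.split("\t", 2)[:2]
    # Assumption: chromosomes compare as plain strings, so "10" sorts before "2".
    return (chrom, int(pos))


def external_sort_vcf(lines, chunk_size=100_000):
    """Yield header lines as they appear, then data lines sorted by (CHROM, POS).

    At most `chunk_size` data lines are held in memory at once. Each sorted
    chunk is spilled to a temporary file, and the chunks are merged lazily.
    """
    tmp_paths = []
    chunk = []

    def spill():
        # Sort the in-memory chunk and write it to its own temporary file.
        chunk.sort(key=vcf_sort_key)
        fd, path = tempfile.mkstemp(suffix=".vcfchunk")
        with os.fdopen(fd, "w") as tmp:
            tmp.writelines(chunk)
        tmp_paths.append(path)
        chunk.clear()

    try:
        for line in lines:
            if line.startswith("#"):
                yield line  # meta-information and column header lines
                continue
            # Normalise line endings so spilled chunks stay one record per line.
            chunk.append(line.rstrip("\n") + "\n")
            if len(chunk) >= chunk_size:
                spill()
        if chunk:
            spill()

        files = [open(path) for path in tmp_paths]
        try:
            # k-way merge of the already-sorted chunk files.
            yield from heapq.merge(*files, key=vcf_sort_key)
        finally:
            for f in files:
                f.close()
    finally:
        for path in tmp_paths:
            os.remove(path)
```

A caller would pass an open file (or any iterable of lines) and write the yielded lines to the output, for example `out.writelines(external_sort_vcf(open("input.vcf")))`. Memory use is bounded by the chunk size rather than the file size, which is the property that matters for large VCFs.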

mfiume commented 10 years ago

Can you confirm that this library can handle very large VCF files, e.g. 1-5 GB?

jvlasblom commented 10 years ago

This should be no problem as long as there is enough disk space and roughly filesize / EXTERNALSORT_MAX_TMPFILES bytes of memory available.

(EXTERNAL_SORT_MAX_TMPFILES is currently set to 128)

https://code.google.com/p/externalsortinginjava/
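
To put numbers on the rule of thumb above, here is a small worked calculation in Python. It only restates the filesize / 128 estimate from this comment and is not a measured figure. The function name `estimated_sort_memory_bytes` and the parameter `max_tmp_files` are hypothetical, and "GB" is treated as GiB (2^30 bytes) for the arithmetic.

```python
GIB = 2**30
MIB = 2**20


def estimated_sort_memory_bytes(file_size_bytes, max_tmp_files=128):
    """Rule-of-thumb memory estimate: file size divided by the temp-file limit.

    The default of 128 is the value quoted in the comment above; the result is
    rounded up to a whole number of bytes.
    """
    return -(-file_size_bytes // max_tmp_files)  # ceiling division


for size_gib in (1, 5):
    needed = estimated_sort_memory_bytes(size_gib * GIB)
    print(f"{size_gib} GiB file: about {needed / MIB:.0f} MiB per chunk")
```

Under these assumptions a 1 GiB file needs about 8 MiB and a 5 GiB file about 40 MiB for chunk data, so the 1-5 GB range asked about fits comfortably. The real footprint inside a JVM would be somewhat larger because of per-string overhead.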

jvlasblom commented 10 years ago

Should be fixed. If any VCF fails to import, it should no longer be because of sorting.