compbio-UofT / medsavant

MedSavant is a search engine for genetic variants
22 stars 9 forks source link

.vcf import takes a VERY long time #256

Open jvlasblom opened 10 years ago

jvlasblom commented 10 years ago

VCF import takes a long time and may not work with the latest 1.1 snapshot

mfiume commented 10 years ago

Any idea why?

On Fri, Jan 17, 2014 at 7:27 PM, jvlasblom notifications@github.com wrote:

VCF import takes a long time and may not work with the latest 1.1 snapshot

— Reply to this email directly or view it on GitHubhttps://github.com/compbio-UofT/medsavant/issues/256 .

jvlasblom commented 10 years ago

This block of messages is repeated in the server log, so it may be stuck in a loop somewhere. So far I haven't been able to find the problem.

2014-01-18 11:35:05,055 2703174 [RMI TCP Connection(60)-127.0.0.1] INFO
org.ut.biolab.medsavant.server.db.util.DBUtils - Getting distinct values for z_variant_proj1_ref3_update3.dna_id 2014-01-18 11:35:05,055 2703174 [RMI TCP Connection(57)-127.0.0.1] INFO
org.ut.biolab.medsavant.server.db.util.DistinctValuesCache - 0 values retrieved from /home/jim/disk3/medsavant_server/permanent_files/cache/rontest/z_variant_proj1_ref3_update3/dna_id 2014-01-18 11:35:05,056 2703175 [RMI TCP Connection(60)-127.0.0.1] INFO
org.ut.biolab.medsavant.server.db.util.DistinctValuesCache - Marked /home/jim/disk3/medsavant_server/permanent_files/cache/rontest/z_variant_proj1_ref3_update3/dna_id for deletion on exit. 2014-01-18 11:35:05,056 2703175 [RMI TCP Connection(60)-127.0.0.1] INFO
org.ut.biolab.medsavant.server.db.util.DistinctValuesCache - 0 values retrieved from /home/jim/disk3/medsavant_server/permanent_files/cache/rontest/z_variant_proj1_ref3_update3/dna_id

On 18/01/14 01:34 AM, mfiume wrote:

Any idea why?

On Fri, Jan 17, 2014 at 7:27 PM, jvlasblom notifications@github.com wrote:

VCF import takes a long time and may not work with the latest 1.1 snapshot

— Reply to this email directly or view it on GitHubhttps://github.com/compbio-UofT/medsavant/issues/256 .

— Reply to this email directly or view it on GitHub https://github.com/compbio-UofT/medsavant/issues/256#issuecomment-32675947.

mfiume commented 10 years ago

Hmm. This may be due to the reenabling of the open bam files button.

It had terrible performance with lots of individuals because for each patient it would do a lookup for BAM URL (instead of doing a single call to get all patient data and using the relevant bits).

I may be way off base, since it's hard for me to make sense of the stack traces on a phone screen.

Marc

On Jan 18, 2014, at 11:47 AM, jvlasblom notifications@github.com wrote:

This block of messages is repeated in the server log, so it may be stuck in a loop somewhere. So far I haven't been able to find the problem.

2014-01-18 11:35:05,055 2703174 [RMI TCP Connection(60)-127.0.0.1] INFO org.ut.biolab.medsavant.server.db.util.DBUtils - Getting distinct values for z_variant_proj1_ref3_update3.dna_id 2014-01-18 11:35:05,055 2703174 [RMI TCP Connection(57)-127.0.0.1] INFO org.ut.biolab.medsavant.server.db.util.DistinctValuesCache - 0 values retrieved from /home/jim/disk3/medsavant_server/permanent_files/cache/rontest/z_variant_proj1_ref3_update3/dna_id 2014-01-18 11:35:05,056 2703175 [RMI TCP Connection(60)-127.0.0.1] INFO org.ut.biolab.medsavant.server.db.util.DistinctValuesCache - Marked /home/jim/disk3/medsavant_server/permanent_files/cache/rontest/z_variant_proj1_ref3_update3/dna_id for deletion on exit. 2014-01-18 11:35:05,056 2703175 [RMI TCP Connection(60)-127.0.0.1] INFO org.ut.biolab.medsavant.server.db.util.DistinctValuesCache - 0 values retrieved from /home/jim/disk3/medsavant_server/permanent_files/cache/rontest/z_variant_proj1_ref3_update3/dna_id

On 18/01/14 01:34 AM, mfiume wrote:

Any idea why?

On Fri, Jan 17, 2014 at 7:27 PM, jvlasblom notifications@github.com wrote:

VCF import takes a long time and may not work with the latest 1.1 snapshot

— Reply to this email directly or view it on GitHubhttps://github.com/compbio-UofT/medsavant/issues/256 .

— Reply to this email directly or view it on GitHub https://github.com/compbio-UofT/medsavant/issues/256#issuecomment-32675947.

— Reply to this email directly or view it on GitHub.

jvlasblom commented 10 years ago

I think the problem is with the Jannovar jar file. The messages about distinct value cache are normal -- they're generated by the VariantFrequencyAggregatePane when it receives a 'cache expired' event from the CacheController. (Recall UI components can register themselves with the cache controller to be notified whenever a method with @Modifier is invoked. Cache expiration events are also fired every 15 minutes by default, and that's what's causing these messages).

Running the jannovar file from the command line takes a very long time and probably doesn't finish: java -jar ~/disk3/jim/medsavant_server/lib/jannovar-0.1.1-beta-2.jar -D ~/disk3/medsavant_server/permanent_files/cache/jannovar/refseq_hg19.ser -V /home/jim/disk3/medsavant_server/tmp/medsavant_jim/jim_medium.vcf

which outputs:

ABout to annotated VCF

and then hangs at 100% CPU.

On 18/01/14 12:51 PM, mfiume wrote:

Hmm. This may be due to the reenabling of the open bam files button.

It had terrible performance with lots of individuals because for each patient it would do a lookup for BAM URL (instead of doing a single call to get all patient data and using the relevant bits).

I may be way off base, since it's hard for me to make sense of the stack traces on a phone screen.

Marc

On Jan 18, 2014, at 11:47 AM, jvlasblom notifications@github.com wrote:

This block of messages is repeated in the server log, so it may be stuck in a loop somewhere. So far I haven't been able to find the problem.

2014-01-18 11:35:05,055 2703174 [RMI TCP Connection(60)-127.0.0.1] INFO org.ut.biolab.medsavant.server.db.util.DBUtils - Getting distinct values for z_variant_proj1_ref3_update3.dna_id 2014-01-18 11:35:05,055 2703174 [RMI TCP Connection(57)-127.0.0.1] INFO org.ut.biolab.medsavant.server.db.util.DistinctValuesCache - 0 values retrieved from

/home/jim/disk3/medsavant_server/permanent_files/cache/rontest/z_variant_proj1_ref3_update3/dna_id 2014-01-18 11:35:05,056 2703175 [RMI TCP Connection(60)-127.0.0.1] INFO org.ut.biolab.medsavant.server.db.util.DistinctValuesCache - Marked

/home/jim/disk3/medsavant_server/permanent_files/cache/rontest/z_variant_proj1_ref3_update3/dna_id

for deletion on exit. 2014-01-18 11:35:05,056 2703175 [RMI TCP Connection(60)-127.0.0.1] INFO org.ut.biolab.medsavant.server.db.util.DistinctValuesCache - 0 values retrieved from

/home/jim/disk3/medsavant_server/permanent_files/cache/rontest/z_variant_proj1_ref3_update3/dna_id

On 18/01/14 01:34 AM, mfiume wrote:

Any idea why?

On Fri, Jan 17, 2014 at 7:27 PM, jvlasblom notifications@github.com wrote:

VCF import takes a long time and may not work with the latest 1.1 snapshot

— Reply to this email directly or view it on GitHubhttps://github.com/compbio-UofT/medsavant/issues/256 .

— Reply to this email directly or view it on GitHub

https://github.com/compbio-UofT/medsavant/issues/256#issuecomment-32675947.

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub https://github.com/compbio-UofT/medsavant/issues/256#issuecomment-32687901.

mfiume commented 10 years ago

Another off the cuff thing to check is the memory usage while it's slow. Jannovar is a memory hog and it might be that we need to allocate more ram.

Marc

On Jan 18, 2014, at 4:53 PM, jvlasblom notifications@github.com wrote:

I think the problem is with the Jannovar jar file. The messages about distinct value cache are normal -- they're generated by the VariantFrequencyAggregatePane when it receives a 'cache expired' event from the CacheController. (Recall UI components can register themselves with the cache controller to be notified whenever a method with @Modifier is invoked. Cache expiration events are also fired every 15 minutes by default, and that's what's causing these messages).

Running the jannovar file from the command line takes a very long time and probably doesn't finish: java -jar ~/disk3/jim/medsavant_server/lib/jannovar-0.1.1-beta-2.jar -D ~/disk3/medsavant_server/permanent_files/cache/jannovar/refseq_hg19.ser -V /home/jim/disk3/medsavant_server/tmp/medsavant_jim/jim_medium.vcf

which outputs:

ABout to annotated VCF

and then hangs at 100% CPU.

On 18/01/14 12:51 PM, mfiume wrote:

Hmm. This may be due to the reenabling of the open bam files button.

It had terrible performance with lots of individuals because for each patient it would do a lookup for BAM URL (instead of doing a single call to get all patient data and using the relevant bits).

I may be way off base, since it's hard for me to make sense of the stack traces on a phone screen.

Marc

On Jan 18, 2014, at 11:47 AM, jvlasblom notifications@github.com wrote:

This block of messages is repeated in the server log, so it may be stuck in a loop somewhere. So far I haven't been able to find the problem.

2014-01-18 11:35:05,055 2703174 [RMI TCP Connection(60)-127.0.0.1] INFO org.ut.biolab.medsavant.server.db.util.DBUtils - Getting distinct values for z_variant_proj1_ref3_update3.dna_id 2014-01-18 11:35:05,055 2703174 [RMI TCP Connection(57)-127.0.0.1] INFO org.ut.biolab.medsavant.server.db.util.DistinctValuesCache - 0 values retrieved from

/home/jim/disk3/medsavant_server/permanent_files/cache/rontest/z_variant_proj1_ref3_update3/dna_id 2014-01-18 11:35:05,056 2703175 [RMI TCP Connection(60)-127.0.0.1] INFO org.ut.biolab.medsavant.server.db.util.DistinctValuesCache - Marked

/home/jim/disk3/medsavant_server/permanent_files/cache/rontest/z_variant_proj1_ref3_update3/dna_id

for deletion on exit. 2014-01-18 11:35:05,056 2703175 [RMI TCP Connection(60)-127.0.0.1] INFO org.ut.biolab.medsavant.server.db.util.DistinctValuesCache - 0 values retrieved from

/home/jim/disk3/medsavant_server/permanent_files/cache/rontest/z_variant_proj1_ref3_update3/dna_id

On 18/01/14 01:34 AM, mfiume wrote:

Any idea why?

On Fri, Jan 17, 2014 at 7:27 PM, jvlasblom notifications@github.com wrote:

VCF import takes a long time and may not work with the latest 1.1 snapshot

— Reply to this email directly or view it on GitHubhttps://github.com/compbio-UofT/medsavant/issues/256 .

— Reply to this email directly or view it on GitHub

https://github.com/compbio-UofT/medsavant/issues/256#issuecomment-32675947.

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub https://github.com/compbio-UofT/medsavant/issues/256#issuecomment-32687901.

— Reply to this email directly or view it on GitHub.

jvlasblom commented 10 years ago

Nope, Jannovar still hangs from the command line with -Xmx8192m, and I ran the server with -Xmx16384m...

On 18/01/14 04:56 PM, mfiume wrote:

Another off the cuff thing to check is the memory usage while it's slow. Jannovar is a memory hog and it might be that we need to allocate more ram.

Marc

On Jan 18, 2014, at 4:53 PM, jvlasblom notifications@github.com wrote:

I think the problem is with the Jannovar jar file. The messages about distinct value cache are normal -- they're generated by the VariantFrequencyAggregatePane when it receives a 'cache expired' event from the CacheController. (Recall UI components can register themselves with the cache controller to be notified whenever a method with @Modifier is invoked. Cache expiration events are also fired every 15 minutes by default, and that's what's causing these messages).

Running the jannovar file from the command line takes a very long time and probably doesn't finish: java -jar ~/disk3/jim/medsavant_server/lib/jannovar-0.1.1-beta-2.jar -D ~/disk3/medsavant_server/permanent_files/cache/jannovar/refseq_hg19.ser -V /home/jim/disk3/medsavant_server/tmp/medsavant_jim/jim_medium.vcf

which outputs:

ABout to annotated VCF

and then hangs at 100% CPU.

On 18/01/14 12:51 PM, mfiume wrote:

Hmm. This may be due to the reenabling of the open bam files button.

It had terrible performance with lots of individuals because for each patient it would do a lookup for BAM URL (instead of doing a single call to get all patient data and using the relevant bits).

I may be way off base, since it's hard for me to make sense of the stack traces on a phone screen.

Marc

On Jan 18, 2014, at 11:47 AM, jvlasblom notifications@github.com wrote:

This block of messages is repeated in the server log, so it may be stuck in a loop somewhere. So far I haven't been able to find the problem.

2014-01-18 11:35:05,055 2703174 [RMI TCP Connection(60)-127.0.0.1] INFO org.ut.biolab.medsavant.server.db.util.DBUtils - Getting distinct values for z_variant_proj1_ref3_update3.dna_id 2014-01-18 11:35:05,055 2703174 [RMI TCP Connection(57)-127.0.0.1] INFO org.ut.biolab.medsavant.server.db.util.DistinctValuesCache - 0 values retrieved from

/home/jim/disk3/medsavant_server/permanent_files/cache/rontest/z_variant_proj1_ref3_update3/dna_id

2014-01-18 11:35:05,056 2703175 [RMI TCP Connection(60)-127.0.0.1] INFO org.ut.biolab.medsavant.server.db.util.DistinctValuesCache - Marked

/home/jim/disk3/medsavant_server/permanent_files/cache/rontest/z_variant_proj1_ref3_update3/dna_id

for deletion on exit. 2014-01-18 11:35:05,056 2703175 [RMI TCP Connection(60)-127.0.0.1] INFO org.ut.biolab.medsavant.server.db.util.DistinctValuesCache - 0 values retrieved from

/home/jim/disk3/medsavant_server/permanent_files/cache/rontest/z_variant_proj1_ref3_update3/dna_id

On 18/01/14 01:34 AM, mfiume wrote:

Any idea why?

On Fri, Jan 17, 2014 at 7:27 PM, jvlasblom notifications@github.com wrote:

VCF import takes a long time and may not work with the latest 1.1 snapshot

— Reply to this email directly or view it on GitHubhttps://github.com/compbio-UofT/medsavant/issues/256 .

— Reply to this email directly or view it on GitHub

https://github.com/compbio-UofT/medsavant/issues/256#issuecomment-32675947.

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub

https://github.com/compbio-UofT/medsavant/issues/256#issuecomment-32687901.

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub https://github.com/compbio-UofT/medsavant/issues/256#issuecomment-32694336.

jvlasblom commented 10 years ago

Problem is internal to Jannovar, and only occurs with certain VCF files.

mfiume commented 10 years ago

Thanks Jim. Please follow up with Peter Robinson to address this.

On Mon, Jan 20, 2014 at 12:49 PM, jvlasblom notifications@github.comwrote:

Problem is internal to Jannovar, and only occurs with certain VCF files.

— Reply to this email directly or view it on GitHubhttps://github.com/compbio-UofT/medsavant/issues/256#issuecomment-32781442 .

mfiume commented 10 years ago

Has this been resolved?

jvlasblom commented 10 years ago

Resolved by Peter Robinson, but new .jar needs to be incorporated.

mfiume commented 10 years ago

The new .jar has been incorporated. Marking as test