Closed sooheelee closed 5 years ago
Should reply back to the researcher about whether the tool cannot use NIO or that this is a bug that is being fixed.
Any estimate on when this patch might come out? Our team really needs it as soon as possible. Thanks!
I'm a little confused about this, as we most certainly use CrosscheckFingerprints in production using the NIO capabilities. Could you try using the Picard cloudJar for this just to minimized one additional variable?
Also, please make sure that the index for the bam is available and readable.
Looks like the Picard cloudjar has been a feature starting with release v2.18.1 (March 26, 2018). The standalone jars are downloadable from the normal Picard release page at https://github.com/broadinstitute/picard/releases. If this works for you @ruslanafr, then I would recommend continuing to use this jar until we figure out what is going on with calling Picard from GATK.
@yfarjoun, @ruslanafr is using the tool with two bams as input:
-I gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-TTP1-A-1-1-D-A553-36/REBC-AC8T-TTP1-A-1-1-D-A553-36.bam \
-I gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-NB1-A-1-0-D-A525-36/REBC-AC8T-NB1-A-1-0-D-A525-36.bam \
Would this make a difference?
That shouldn't make any difference.
It could be their hitting some connection problem and going into a retry loop or some sort. We have pretty insanely long retry attempts configured. 20 hours seems long, but it's possible something isn't working right in the retry code. If they could get a few live stack traces of the running program to see what it's doing it might help? If they can access the machine it runs on they can do so using jstack.
On Sat, May 26, 2018, 1:27 PM Yossi Farjoun notifications@github.com wrote:
That shouldn't make any difference.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/broadinstitute/picard/issues/1175#issuecomment-392275868, or mute the thread https://github.com/notifications/unsubscribe-auth/AEe4rFERazCK0NTMxrmYgDuSuxcQ34hNks5t2ZCCgaJpZM4UHMzl .
@lbergelson, @ruslanafr is a researcher at the Broad. From the forum post:
/usr/local/jre1.8.0_73/bin/java -Xmx14g -jar /gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar CrosscheckFingerprints \
-I gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-TTP1-A-1-1-D-A553-36/REBC-AC8T-TTP1-A-1-1-D-A553-36.bam \
-I gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-NB1-A-1-0-D-A525-36/REBC-AC8T-NB1-A-1-0-D-A525-36.bam \
-H /cromwell_root/firecloud-tcga-open-access/tutorial/reference/Homo_sapiens_assembly19.haplotype_database.txt \
--QUIET false --EXIT_CODE_WHEN_MISMATCH 0 \
--OUTPUT crosscheck.stats.txt \
--VALIDATION_STRINGENCY LENIENT
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/fc-4aacc2d6-4017-4fc2-b95a-fb892a3562b9/70dc8591-32c2-4de1-b2f8-2af887f2a8e3/Clinical_Workflow/6794a873-5bbb-46ef-ae2c-a10e77dab94e/call-CrossCheckLaneFingerprints_Task/attempt-2/tmp.197aeb50
17:12:16.387 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu May 17 17:12:16 UTC 2018] CrosscheckFingerprints --INPUT gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-TTP1-A-1-1-D-A553-36/REBC-AC8T-TTP1-A-1-1-D-A553-36.bam --INPUT gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-NB1-A-1-0-D-A525-36/REBC-AC8T-NB1-A-1-0-D-A525-36.bam --OUTPUT crosscheck.stats.txt --HAPLOTYPE_MAP /cromwell_root/firecloud-tcga-open-access/tutorial/reference/Homo_sapiens_assembly19.haplotype_database.txt --EXIT_CODE_WHEN_MISMATCH 0 --QUIET false --VALIDATION_STRINGENCY LENIENT --CROSSCHECK_MODE CHECK_SAME_SAMPLE --LOD_THRESHOLD 0.0 --CROSSCHECK_BY READGROUP --NUM_THREADS 1 --CALCULATE_TUMOR_AWARE_RESULTS true --ALLOW_DUPLICATE_READS false --GENOTYPING_ERROR_RATE 0.01 --OUTPUT_ERRORS_ONLY false --LOSS_OF_HET_RATE 0.5 --EXPECT_ALL_GROUPS_TO_MATCH false --VERBOSITY INFO --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Thu May 17 17:12:16 UTC 2018] Executing as root@45dc4fe55a48 on Linux 4.9.0-0.bpo.6-amd64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.0.4.0
INFO 2018-05-17 17:12:18 CrosscheckFingerprints Fingerprinting 2 INPUT files.
But then the task just gets stuck without printing out anything else.
@ruslanafr, is it possible for you to access the machine you are running this on and try what @lbergelson suggests above with jstack?
This specific submission was run on FireCloud, but I tried it on my Mac and got the same results. I can try using jstack on Tuesday.
Thanks! Ruslana
On May 26, 2018, at 18:47, sooheelee notifications@github.com<mailto:notifications@github.com> wrote:
@lbergelsonhttps://github.com/lbergelson, @ruslanafrhttps://github.com/ruslanafr is a researcher at the Broad. From the forum post:
/usr/local/jre1.8.0_73/bin/java -Xmx14g -jar /gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar CrosscheckFingerprints \ -I gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-TTP1-A-1-1-D-A553-36/REBC-AC8T-TTP1-A-1-1-D-A553-36.bam \ -I gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-NB1-A-1-0-D-A525-36/REBC-AC8T-NB1-A-1-0-D-A525-36.bam \ -H /cromwell_root/firecloud-tcga-open-access/tutorial/reference/Homo_sapiens_assembly19.haplotype_database.txt \ --QUIET false --EXIT_CODE_WHEN_MISMATCH 0 \ --OUTPUT crosscheck.stats.txt \ --VALIDATION_STRINGENCY LENIENT Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/fc-4aacc2d6-4017-4fc2-b95a-fb892a3562b9/70dc8591-32c2-4de1-b2f8-2af887f2a8e3/Clinical_Workflow/6794a873-5bbb-46ef-ae2c-a10e77dab94e/call-CrossCheckLaneFingerprints_Task/attempt-2/tmp.197aeb50 17:12:16.387 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so [Thu May 17 17:12:16 UTC 2018] CrosscheckFingerprints --INPUT gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-TTP1-A-1-1-D-A553-36/REBC-AC8T-TTP1-A-1-1-D-A553-36.bam --INPUT gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-NB1-A-1-0-D-A525-36/REBC-AC8T-NB1-A-1-0-D-A525-36.bam --OUTPUT crosscheck.stats.txt --HAPLOTYPE_MAP /cromwell_root/firecloud-tcga-open-access/tutorial/reference/Homo_sapiens_assembly19.haplotype_database.txt --EXIT_CODE_WHEN_MISMATCH 0 --QUIET false --VALIDATION_STRINGENCY LENIENT --CROSSCHECK_MODE CHECK_SAME_SAMPLE --LOD_THRESHOLD 0.0 --CROSSCHECK_BY READGROUP --NUM_THREADS 1 --CALCULATE_TUMOR_AWARE_RESULTS true --ALLOW_DUPLICATE_READS false --GENOTYPING_ERROR_RATE 0.01 --OUTPUT_ERRORS_ONLY false --LOSS_OF_HET_RATE 0.5 --EXPECT_ALL_GROUPS_TO_MATCH false --VERBOSITY INFO --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false [Thu May 17 17:12:16 UTC 2018] Executing as root@45dc4fe55a48 on Linux 4.9.0-0.bpo.6-amd64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.0.4.0 INFO 2018-05-17 17:12:18 CrosscheckFingerprints Fingerprinting 2 INPUT files.
But then the task just gets stuck without printing out anything else.
@ruslanafrhttps://github.com/ruslanafr, is it possible for you to access the machine you are running this on and try what @lbergelsonhttps://github.com/lbergelson suggests above with jstack?
- You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/broadinstitute/picard/issues/1175#issuecomment-392292752, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AZj7KLJF42SL6rq28XtiBix1pQ0KpKrfks5t2duQgaJpZM4UHMzl.
I ran into this same problem myself today running from both local as well as a GCE node. Credentials/ADC is fine, and I can gsutil the files and run them locally no problem so I don't think it's a credentials issue.
I also looked at the production/best practice pipeline, and it doesn't look like it's using NIO. Here's the snipped from the task which is localizing files
task CrossCheckFingerprints {
Array[File] input_bams
Array[File] input_bam_indexes
Finally, in case it helps, I took a threaddump while it was running (30+ minutes).
"pool-2-thread-1" #14 prio=5 os_prio=0 tid=0x00007f5d58ee9800 nid=0x2c7 runnable [0x00007f5d23b80000] java.lang.Thread.State: RUNNABLE at com.sun.crypto.provider.GaloisCounterMode.doLastBlock(GaloisCounterMode.java:394) at com.sun.crypto.provider.GaloisCounterMode.decryptFinal(GaloisCounterMode.java:552) at com.sun.crypto.provider.CipherCore.finalNoPadding(CipherCore.java:1046) at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:983) at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:491) at javax.crypto.Cipher.doFinal(Cipher.java:2377) at sun.security.ssl.CipherBox.decrypt(CipherBox.java:461) at sun.security.ssl.InputRecord.decrypt(InputRecord.java:172) at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1015)
interesting.
The place I know that we use crosscheck with NIO is in the joint-calling pipeline, but there we are comparing vcfs against vcfs, not bams against bams...so I'm wondering if there's a problem with the BAM implementation somehow....anyhow, good information will attempt to reproduce.
I downloaded picardcloud.jar and ran it locally. This didn't work either. The printout was exactly the same as before. My credentials are ok since I can use gsutil to copy the files to my machine.
This is the threaddump from jstack for this run:
"pool-2-thread-1" #15 prio=5 os_prio=31 tid=0x00007fe445249000 nid=0x1407 runnable [0x000070000d6e0000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at java.net.SocketInputStream.read(SocketInputStream.java:171) at java.net.SocketInputStream.read(SocketInputStream.java:141) at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) at sun.security.ssl.InputRecord.read(InputRecord.java:503) at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
This looks very similar to what @kcibul printed, so I think this is probably the same issue.
I'm not sure if this issue is connected in any way, but I also tried to run the CleanSam tool (using both GATK4.0.4.0 and picardcloud.jar) with no success. The error message I got in both cases is:
Exception in thread "main" htsjdk.samtools.SAMException: Cannot read non-existent file: file:///Users/rfrazer/Projects/moap/testing_staging_area/gs:/5aa919de-0aa0-43ec-9ec3-288481102b6d/tcga/OV/WGA_RepliG/WXS/BI/ILLUMINA/TCGA_MC3.TCGA-04-1335-11A-01W-0489-09.bam at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:426) at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:413) at picard.sam.CleanSam.doWork(CleanSam.java:76) at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:282) at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103) at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
It's clear that the tool tries to access the given path from my current directory and not GCS.
The above stacktrace does indicate that CrosscheckFingerprints tries to access GCS, but I thought this is worth mentioning in case the two issues are somehow connected.
Fyi, I can use NIO on CRAMs and BAMs with GATKv4.0.4.0 tools (PrintReads and CNV tool), so the engine itself does support successful NIO.
@ruslanafr That error makes sense most picard tools don't support GCS at all.
@lbergelson, can you clarify further please? When a Picard tool is hooked up to the GATK engine, and is a walker or other such type of tool that should theoretically be able to use NIO, then can such tools perform NIO via the GATK engine? Or, do the Picard tools individually have to support NIO to work within GATK?
Most picard tools are imported directly into GATK without modification. There isn't any way to automatically upgrade them to use NIO. Each one needs to be individually modified and as far as I know most have not been.
Any progress on this bug?
Hi @ruslanafr
Thanks for reporting this. It seems that there has been a regression and that in HTSJDK reading many small intervals of a bam file over NIO is now unacceptably slow.
I've submitted an htsjdk issue: https://github.com/samtools/htsjdk/issues/1161 and hope that it will get resolved soon.
Bug Report
DavidR says to bring this to your attention @yfarjoun.
Affected tool(s)
CrosscheckFingerprints called from GATK4.0.4.0
Affected version(s)
Description
Steps to reproduce
Leaving this to Picard folks to try to reproduce.
Expected behavior
Looks like either (i) the tool isn't compatible with NIO and should produce an error or (ii) there is a bug in the NIO process for this tool that should be fixed.