bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

picard #2241

Closed MurliNair closed 6 years ago

MurliNair commented 6 years ago

Can I run picard as a standalone program as follows?

picard CollectInsertSizeMetrics I=Sons_Aligned_Bam_File.bam O=insertmetrics.out H=insertSizeHistogram.pdf

Do I have to write a YAML file for that and run it in the pipeline as just one job? Are there any examples of this?

chapmanb commented 6 years ago

Murli -- you are certainly welcome to use all of the tools bcbio installs outside of it. Part of the utility is that we distribute a lot of useful bioinformatics tools along with the workflows.

There isn't a way to easily incorporate this into the pipeline right now without coding inside bcbio itself. For your example we'd need to include a metrics run for this and then integrate with the current QC reporting. Is this something you're actually interested in, or an example? We currently have insert size metrics provided in the QC reporting through Qualimap, so hopefully this could work for your specific needs.

More generally, we're actively working on moving to the Common Workflow Language, which will provide a more direct way to do this. Apologies for not having a way to do this right now, but I hope this helps.

MurliNair commented 6 years ago

Hi Brad, thanks. My goal here was to demonstrate the different tools to the students individually and then as a pipeline. I would be interested to learn how to include a metrics run. I could use the one that is currently available (Qualimap). If you have an example of that, it would be great. It is a great tool, and I am trying to use it for teaching and research.

MurliNair commented 6 years ago

Hi Brad, I tried to run the following:

nohup picard CollectGcBiasMetrics R=Garvan_NA12878_HG001_HiSeq_Exome/GRCh37.fasta I=Sons_Aligned_Bam_File.bam O=output.txt CHART=gc_bias_metrics.pdf S=summary_metrics.txt VALIDATION_STRINGENCY=LENIENT TMP_DIR=/md1400/workTmp/ &

The tmp directory has 90T of disk space, and I still get the following error:

13:38:41.127 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/work/local/share/bcbio/anaconda/share/picard-2.17.2-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Sat Feb 03 13:38:41 EST 2018] CollectGcBiasMetrics CHART_OUTPUT=gc_bias_metrics.pdf SUMMARY_OUTPUT=summary_metrics.txt INPUT=Sons_Aligned_Bam_File.bam OUTPUT=output.txt TMP_DIR=[/md1400/workTmp] VALIDATION_STRINGENCY=LENIENT REFERENCE_SEQUENCE=Garvan_NA12878_HG001_HiSeq_Exome/GRCh37.fasta SCAN_WINDOW_SIZE=100 MINIMUM_GENOME_FRACTION=1.0E-5 IS_BISULFITE_SEQUENCED=false METRIC_ACCUMULATION_LEVEL=[ALL_READS] ALSO_IGNORE_DUPLICATES=false ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sat Feb 03 13:38:41 EST 2018] Executing as mnair@genomics.iusb.edu on Linux 3.10.0-693.17.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_121-b15; Deflater: Intel; Inflater: Intel; Picard version: 2.17.2-SNAPSHOT
INFO 2018-02-03 13:39:54 SinglePassSamProgram Processed 1,000,000 records. Elapsed time: 00:00:09s. Time for last 1,000,000: 4s. Last read position: 1:45,797,650
INFO 2018-02-03 13:39:58 SinglePassSamProgram Processed 2,000,000 records. Elapsed time: 00:00:13s. Time for last 1,000,000: 3s. Last read position: 1:146,456,399
INFO 2018-02-03 13:40:02 SinglePassSamProgram Processed 3,000,000 records. Elapsed time: 00:00:17s. Time for last 1,000,000: 3s. Last read position: 1:186,076,038
[Sat Feb 03 13:40:06 EST 2018] picard.analysis.CollectGcBiasMetrics done. Elapsed time: 1.42 minutes.
Runtime.totalMemory()=1060110336
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at htsjdk.samtools.reference.FastaSequenceFile.readSequence(FastaSequenceFile.java:162)
    at htsjdk.samtools.reference.FastaSequenceFile.nextSequence(FastaSequenceFile.java:83)
    at htsjdk.samtools.reference.ReferenceSequenceFileWalker.get(ReferenceSequenceFileWalker.java:93)
    at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:141)
    at picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:84)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:269)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)

chapmanb commented 6 years ago

Murli, it looks like Java is running out of heap memory on this run: the trace fails while loading a reference sequence, with only about 1 GB of heap allocated. You can specify more memory for picard command line runs with:

picard -Xms1g -Xmx4g

at the start of your command. Hopefully adding that will help resolve the issue. Please let us know if you run into more problems.
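Applied to the CollectGcBiasMetrics run from this thread, the suggestion might look like the sketch below. It assumes the `picard` wrapper forwards `-Xms`/`-Xmx` to the JVM as described above. The 4g ceiling is just the value suggested here and may need raising for a larger reference. `picard_cmd` is a hypothetical helper, not part of picard or bcbio; it only prints the assembled command so it can be checked before running.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of picard or bcbio): print a picard command
# with the JVM heap options placed directly after `picard`.
picard_cmd() {
  local heap_min="$1" heap_max="$2"
  shift 2
  printf 'picard -Xms%s -Xmx%s' "$heap_min" "$heap_max"
  # Append the tool name and its arguments unchanged.
  printf ' %s' "$@"
  printf '\n'
}

# Runtime.totalMemory() from the failing log, converted from bytes to MiB.
echo "$(( 1060110336 / 1024 / 1024 )) MiB"   # prints "1011 MiB"

# Print the command for inspection; file names and paths are the ones
# used earlier in this thread.
picard_cmd 1g 4g CollectGcBiasMetrics \
  R=Garvan_NA12878_HG001_HiSeq_Exome/GRCh37.fasta \
  I=Sons_Aligned_Bam_File.bam \
  O=output.txt \
  CHART=gc_bias_metrics.pdf \
  S=summary_metrics.txt \
  VALIDATION_STRINGENCY=LENIENT \
  TMP_DIR=/md1400/workTmp/
```

Once the printed line looks right, it can be run directly, wrapped in `nohup ... &` as before.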

MurliNair commented 6 years ago

Thanks for your help. I will try it and let you know.