broadinstitute / picard

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
https://broadinstitute.github.io/picard/
MIT License

Error due to interval.list instead of interval_list suffix #1479

Open GATKSupportTeam opened 4 years ago

GATKSupportTeam commented 4 years ago


Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/360056676692-HS-PENALTY-20X-is-1-on-one-version-of-GATK-crashes-on-newest-version-

--

With GATK 4.0.0.0 the command below runs to completion, but it reports an HS_PENALTY_20X of -1.

With GATK v4.1.4.1 the same command errors out.

I assume a -1 for HS_PENALTY_20X is incorrect?

 

# CONFIRMING FILES EXIST

=================================

3.8G /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3f_MARK_DUPLICATES_FALSE/19065WBC_fixmate_novosort_dupsrmFalse.bam
54M /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3d_INTERVALS/19065WBC_R1.bait.interval.list
45M /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3d_INTERVALS/19065WBC_R1.target.interval.list

# GATK VERSION

=================================

The Genome Analysis Toolkit (GATK) v4.1.4.1
HTSJDK Version: 2.21.0
Picard Version: 2.21.2
Using GATK jar /data1/BIOINFORMATICS/SOFTWARE/ANACONDA_JN/MINI-CONDA/envs/gatk-newest/share/gatk4-4.1.4.1-1/gatk-package-4.1.4.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /data1/BIOINFORMATICS/SOFTWARE/ANACONDA_JN/MINI-CONDA/envs/gatk-newest/share/gatk4-4.1.4.1-1/gatk-package-4.1.4.1-local.jar --version

 

# GATK COMMAND

gatk CollectHsMetrics --INPUT /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3f_MARK_DUPLICATES_FALSE/19065WBC_fixmate_novosort_dupsrmFalse.bam --OUTPUT TEMP_NEW/19065WBC_fixmate_novosort_dupsrm.bam_hs_metrics.txt --BAIT_INTERVALS /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3d_INTERVALS/19065WBC_R1.bait.interval.list --TARGET_INTERVALS /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3d_INTERVALS/19065WBC_R1.target.interval.list

=================================

22:29:20.348 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/data1/BIOINFORMATICS/SOFTWARE/ANACONDA_JN/MINI-CONDA/envs/gatk-newest/share/gatk4-4.1.4.1-1/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Feb 03 22:29:20 EST 2020] CollectHsMetrics --BAIT_INTERVALS /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3d_INTERVALS/19065WBC_R1.bait.interval.list --TARGET_INTERVALS /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3d_INTERVALS/19065WBC_R1.target.interval.list --INPUT /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3f_MARK_DUPLICATES_FALSE/19065WBC_fixmate_novosort_dupsrmFalse.bam --OUTPUT TEMP_NEW/19065WBC_fixmate_novosort_dupsrm.bam_hs_metrics.txt --METRIC_ACCUMULATION_LEVEL ALL_READS --NEAR_DISTANCE 250 --MINIMUM_MAPPING_QUALITY 20 --MINIMUM_BASE_QUALITY 20 --CLIP_OVERLAPPING_READS true --INCLUDE_INDELS false --COVERAGE_CAP 200 --SAMPLE_SIZE 10000 --ALLELE_FRACTION 0.001 --ALLELE_FRACTION 0.005 --ALLELE_FRACTION 0.01 --ALLELE_FRACTION 0.02 --ALLELE_FRACTION 0.05 --ALLELE_FRACTION 0.1 --ALLELE_FRACTION 0.2 --ALLELE_FRACTION 0.3 --ALLELE_FRACTION 0.5 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Feb 03, 2020 10:29:20 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Mon Feb 03 22:29:20 EST 2020] Executing as nowackj1@ridus004.ind.roche.com on Linux 3.10.0-1062.1.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_152-release-1056-b12; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.4.1
[Mon Feb 03 22:29:20 EST 2020] picard.analysis.directed.CollectHsMetrics done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2972712960
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
htsjdk.samtools.SAMException: Cannot read non-existent file: file:///data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/@HD%09VN:1.4%09SO:unsorted
at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:498)
at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:485)
at picard.analysis.directed.CollectTargetedMetrics.doWork(CollectTargetedMetrics.java:115)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Using GATK jar /data1/BIOINFORMATICS/SOFTWARE/ANACONDA_JN/MINI-CONDA/envs/gatk-newest/share/gatk4-4.1.4.1-1/gatk-package-4.1.4.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /data1/BIOINFORMATICS/SOFTWARE/ANACONDA_JN/MINI-CONDA/envs/gatk-newest/share/gatk4-4.1.4.1-1/gatk-package-4.1.4.1-local.jar CollectHsMetrics --INPUT /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3f_MARK_DUPLICATES_FALSE/19065WBC_fixmate_novosort_dupsrmFalse.bam --OUTPUT TEMP_NEW/19065WBC_fixmate_novosort_dupsrm.bam_hs_metrics.txt --BAIT_INTERVALS /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3d_INTERVALS/19065WBC_R1.bait.interval.list --TARGET_INTERVALS /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3d_INTERVALS/19065WBC_R1.target.interval.list

(created from Zendesk ticket #4552)
gz#4552
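The issue title points at the `.interval.list` suffix as the trigger. A minimal workaround sketch, assuming the suffix (and not the file contents) is the cause: expose each interval file under a name ending in `.interval_list` and pass that name to `CollectHsMetrics`. The function name `to_interval_list_name` and the example path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map "*.interval.list" to "*.interval_list".
# Assumption: the failure is caused by the ".list" suffix alone.
to_interval_list_name() {
    local src="$1"
    case "$src" in
        # Strip the problematic suffix and append the conventional one.
        *.interval.list) printf '%s\n' "${src%.interval.list}.interval_list" ;;
        # Any other name is passed through unchanged.
        *)               printf '%s\n' "$src" ;;
    esac
}

# Example usage (placeholder path; a symlink avoids copying the file):
#   bait=R3d_INTERVALS/sample.bait.interval.list
#   ln -s "$(realpath "$bait")" "$(to_interval_list_name "$bait")"
```

The renamed path would then be given to `--BAIT_INTERVALS` (and likewise for `--TARGET_INTERVALS`) in place of the original.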

bhanugandham commented 4 years ago

Hi @lbergelson, as discussed during office hours, I created this issue to brainstorm ways of checking for the "@"/"#" identifiers in an interval list file.
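One way such a check might look, as a rough sketch only: read the first line of the file and test whether it starts with "@", as the SAM-style header of an interval list does (the `@HD` line in the stack trace above is an example). The function name `looks_like_interval_list` is hypothetical, and "#" is left out because the thread does not say what it would mark.

```shell
#!/usr/bin/env bash
# Hypothetical check: does the file begin with a SAM-style header line ("@HD", "@SQ", ...)?
# Returns 0 if it does, 1 otherwise (including unreadable or empty files).
looks_like_interval_list() {
    local first=""
    [ -r "$1" ] || return 1
    # read returns non-zero on a final line without a newline, so also accept a non-empty result.
    IFS= read -r first < "$1" || [ -n "$first" ] || return 1
    case "$first" in
        @*) return 0 ;;
        *)  return 1 ;;
    esac
}
```

Called as `looks_like_interval_list path/to/file && echo "has a header"`, it only inspects the first line, so it stays cheap even on large interval files.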

yfarjoun commented 4 years ago

I think that this is a GATK (not Picard) problem.

whaleberg commented 4 years ago

It's a Barclay problem. We patched Barclay to add a warning when an interval list file is incorrectly named with an interval.list suffix, which should mitigate it. We're still waiting on a Barclay release, though.
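Until such a release is available, a similar warning could be approximated in a wrapper script. The sketch below is not Barclay's implementation. It assumes, as the stack trace above suggests, that a file ending in `.list` has its lines expanded into separate argument values, and it warns when such a file starts with an "@" header line. The function name `warn_if_mislabelled` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not Barclay's actual code.
# Returns 1 and prints a warning if a ".list" file looks like an interval list; 0 otherwise.
warn_if_mislabelled() {
    local path="$1" first=""
    # Only names ending in ".list" are of interest here.
    case "$path" in
        *.list) ;;
        *) return 0 ;;
    esac
    [ -r "$path" ] || return 0
    # Read the first line; tolerate a missing trailing newline.
    IFS= read -r first < "$path" || true
    case "$first" in
        @*)
            printf 'WARNING: %s ends in .list but starts with a SAM-style header; should it be named .interval_list?\n' "$path" >&2
            return 1
            ;;
    esac
    return 0
}
```

A wrapper might call it on each interval argument before invoking `gatk` and stop if it returns non-zero.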

lbergelson commented 4 years ago

Huh, a mysterious stranger with insight into the problem. Let's all forget about whoever that person may be. I'm pretty sure they're correct in their assessment, though.

yfarjoun commented 4 years ago

lol

On Sun, Mar 15, 2020 at 8:03 PM Louis Bergelson wrote:

> Huh, a mysterious stranger with insight into the problem. Let's all forget about whoever that person may be. I'm pretty sure they're correct in their assessment, though.
