broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

'Cannot read non-existent file' issue related to incorrect parsing of paths #9038

Closed jphruska closed 1 week ago

jphruska commented 1 week ago

Bug Report

Affected tool(s) or class(es)

gatk CleanSam (but I suspect it's a general GATK issue)

Affected version(s)

4.2.3.0

Description

htsjdk.samtools.SAMException path issue; extra forward slashes (/) added to beginning of input paths.

Steps to reproduce

Commands:

export SINGULARITY_CACHEDIR="/lustre/work/johruska/singularity-cachedir"
workdir=/lustre/scratch/johruska/central_america_pine_oak/troglodytes
singularity exec $SINGULARITY_CACHEDIR/gatk_4.2.3.0.sif gatk CleanSam -I ${workdir}/01_bam_files/Geothlypis_poliocephala_KU_9041_CHAL.bam -O ${workdir}/01_bam_files/Geothlypis_poliocephala_KU_9041_CHAL_cleaned.bam

Expected behavior

Previously, on other partitions of this computing cluster, this command ran with no errors. Because it now fails only on a different partition, the error is probably caused by the environment, but I don't know what the cause might be. The Java version is openjdk version "1.8.0_302". GATK is being run from a Singularity container.

Actual behavior

Now GATK appears to parse the input paths incorrectly, adding extra forward slashes to the beginning of each path. The paths themselves are correct and have been triple-checked.

Using GATK jar /gatk/gatk-package-4.2.3.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.2.3.0-local.jar CleanSam -I /lustre/scratch/johruska/central_america_pine_oak/troglodytes/01_bam_files/Geothlypis_poliocephala_KU_9041_CHAL.bam -O /lustre/scratch/johruska/central_america_pine_oak/troglodytes/01_bam_files/Geothlypis_poliocephala_KU_9041_CHAL_cleaned.bam
23:00:51.722 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Nov 12 23:00:51 GMT 2024] CleanSam --INPUT /lustre/scratch/johruska/central_america_pine_oak/troglodytes/01_bam_files/Geothlypis_poliocephala_KU_9041_CHAL.bam --OUTPUT /lustre/scratch/johruska/central_america_pine_oak/troglodytes/01_bam_files/Geothlypis_poliocephala_KU_9041_CHAL_cleaned.bam --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Nov 12, 2024 11:00:53 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Tue Nov 12 23:00:53 GMT 2024] Executing as johruska@cpu-23-28 on Linux 4.18.0-147.8.1.el8_1.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.2.3.0
[Tue Nov 12 23:00:53 GMT 2024] picard.sam.CleanSam done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=2076049408
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
htsjdk.samtools.SAMException: Cannot read non-existent file: file:///lustre/scratch/johruska/central_america_pine_oak/troglodytes/01_bam_files/Geothlypis_poliocephala_KU_9041_CHAL.bam
    at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:498)
    at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:485)
    at picard.sam.CleanSam.doWork(CleanSam.java:72)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
    at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

Any help on the matter would be greatly appreciated.

Thanks,
Jack


lbergelson commented 1 week ago

The URI file:///lustre/... should be equivalent to /lustre/...
Are those leading slashes what you mean by extra ones? It's not clear to me whether this is really a GATK problem or a more complicated issue with your environment. Are you able to read those files from other Singularity containers? I suspect there may be an access-privilege issue between Singularity and your Lustre drive.
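
To illustrate the equivalence, here is a minimal shell sketch that turns the URI from the error message back into an ordinary path and checks whether that path is readable from the current environment. It assumes the URI has an empty host and no percent-encoded characters, which holds for the path in this report; the variable names are only illustrative.

```shell
# URI exactly as it appears in the SAMException message above.
uri='file:///lustre/scratch/johruska/central_america_pine_oak/troglodytes/01_bam_files/Geothlypis_poliocephala_KU_9041_CHAL.bam'

# A file URI is "file://" + (empty host) + absolute path, so stripping the
# "file://" prefix leaves the plain path with a single leading slash.
path="${uri#file://}"
echo "$path"

# Check whether the path is visible from wherever this snippet is run
# (on the host, or inside the container via `singularity exec`).
if [ -r "$path" ]; then
    echo "readable"
else
    echo "not readable from this environment"
fi
```

If the path is readable on the host but not inside the container, the slashes are not the problem; the file system is simply not visible to the container.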

jphruska commented 1 week ago

Hi @lbergelson. Thanks for the reply.

As you can tell, I'm not very well informed on URI file schemes.

The issue was just resolved by the cluster administrators. Apparently it was caused by a difference in Singularity versions (I was running the script on a partition I hadn't used before). Adding a bind path for my /lustre/scratch directory to the singularity exec command fixed the problem, and GATK was then able to find the input files.
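
For anyone who hits the same error, the fix might look roughly like the sketch below. It reuses the paths from my original command and adds Singularity's `--bind` option. The exact bind directory the administrators used is an assumption on my part (I only know it was under /lustre/scratch), so adjust it for your own cluster.

```shell
# Values taken from the commands earlier in this issue.
sif="${SINGULARITY_CACHEDIR:-/lustre/work/johruska/singularity-cachedir}/gatk_4.2.3.0.sif"
workdir=/lustre/scratch/johruska/central_america_pine_oak/troglodytes
bam="${workdir}/01_bam_files/Geothlypis_poliocephala_KU_9041_CHAL.bam"

# Assumption: binding the top-level scratch directory is sufficient.
bind_dir=/lustre/scratch

# Build the command as positional parameters so every path stays quoted.
set -- singularity exec --bind "$bind_dir" "$sif" \
    gatk CleanSam -I "$bam" -O "${bam%.bam}_cleaned.bam"

# Show the full command for review.
cmd_preview="$*"
echo "$cmd_preview"

# Execute only where Singularity is installed; otherwise this is a dry run.
if command -v singularity >/dev/null 2>&1; then
    "$@"
fi
```

Without the bind, the container on the new partition could not see /lustre/scratch at all, which is why GATK reported the input as a non-existent file.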

lbergelson commented 1 week ago

Thanks for the update. I'm glad it was resolved!