PapenfussLab / gridss

GRIDSS: the Genomic Rearrangement IDentification Software Suite

CRAM Input Ignored #594

Open DarioS opened 2 years ago

DarioS commented 2 years ago

With BAM input and GRIDSS version 2.9.4, the log files look like

INFO    2022-07-19 15:29:56     SAMFileWriterFactory    Unknown file extension, assuming BAM format when writing file: file:///dev/stdout
INFO    2022-07-19 15:30:05     SinglePassSamProgram    Processed    10,000,000 records.  Elapsed time: 00:00:09s.  Time for last 10,000,000:    9s.  Last read position: chr1:33,241,228
INFO    2022-07-19 15:30:14     SinglePassSamProgram    Processed    20,000,000 records.  Elapsed time: 00:00:18s.  Time for last 10,000,000:    8s.  Last read position: chr1:66,241,160

but if the input is CRAM, I see

WARNING 2022-07-21 13:29:36     CollectInsertSizeMetrics        All data categories were discarded because they contained < 0.05 of the total aligned paired data.
WARNING 2022-07-21 13:29:36     CollectInsertSizeMetrics        Total mapped pairs in all categories: 0.0
[Thu Jul 21 13:29:36 GMT+10:00 2022] gridss.analysis.CollectGridssMetrics done. Elapsed time: 0.02 minutes.

and preprocessing finishes without any errors in a couple of seconds. Should finding 0 mapped pairs be reported as an error instead of a warning?
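Until GRIDSS reports this itself, one possible guard is to scan the saved preprocessing log for the zero-pairs warning and fail the job if it appears. This is only a minimal sketch: the function name `check_mapped_pairs` and the idea of saving the log to a file are assumptions, and the search pattern is copied from the warning text above.

```shell
#!/bin/sh
# Hypothetical helper (not part of GRIDSS): returns non-zero if the log
# contains the "Total mapped pairs in all categories: 0.0" warning.
check_mapped_pairs() {
    log_file="$1"
    # Anchor at end of line (allowing trailing whitespace) so that only an
    # exact count of 0.0 matches.
    if grep -q 'Total mapped pairs in all categories: 0\.0[[:space:]]*$' "$log_file"; then
        echo "ERROR: no mapped pairs found according to $log_file" >&2
        return 1
    fi
    return 0
}
```

It could be called as `check_mapped_pairs gridss.log || exit 1` right after preprocessing, where `gridss.log` stands for wherever the log was written.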

DarioS commented 2 years ago

Minimum code to trigger the warnings is:

$ java -cp software/gridss.jar gridss.analysis.CollectGridssMetrics TMP_DIR=/tmp/ I=/g/data/ag5/RYADAV/R_211102_RYADAV_DNA_M001/alignment/11_21_PK6_Fibro.cram O=/tmp/tmp.11_21_PK6_Fibro.cram THRESHOLD_COVERAGE=50000 PROGRAM=CollectInsertSizeMetrics STOP_AFTER=100

However, I can't trigger the warnings when running CollectInsertSizeMetrics directly.

$ gatk CollectInsertSizeMetrics -AS --STOP_AFTER 100 -I /g/data/ag5/RYADAV/R_211102_RYADAV_DNA_M001/alignment/11_21_PK6_Clone2.cram -O /tmp/test.txt -H /tmp/histogram.txt -R $FASTA

How can the warnings and the skipped processing be avoided when running CollectGridssMetrics?

DarioS commented 2 years ago

I converted the CRAM files into BAM files using samtools view, and the GRIDSS pipeline now works.
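A conversion along these lines can serve as the workaround. It is a minimal sketch, not the exact command used: it assumes `samtools` is on the PATH and that the second argument is the reference FASTA the CRAM was aligned against. The helper names `bam_path_for` and `cram_to_bam` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: derive the BAM path from a CRAM path
# (same directory, ".cram" suffix swapped for ".bam").
bam_path_for() {
    printf '%s\n' "${1%.cram}.bam"
}

# Hypothetical helper: convert one CRAM file to BAM.
# -b writes BAM output, -T names the reference FASTA needed to decode the
# CRAM, -o names the output file.
cram_to_bam() {
    cram="$1"
    fasta="$2"
    bam=$(bam_path_for "$cram")
    samtools view -b -T "$fasta" -o "$bam" "$cram"
}
```

For example, `cram_to_bam sample.cram "$FASTA"` would write `sample.bam` next to the input, and that BAM can then be passed to GRIDSS in place of the CRAM.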

dr-ashu-geno commented 2 years ago

Hi @DarioS

You need to use a more recent version of GRIDSS. I had the same problem with a CRAM file as input in GRIDSS version 2.9.4, but with GRIDSS version 2.13 CRAM input now works. Please refer to issue #291 about this.

Best,