AdamaJava / adamajava


qsv featuring tiled aligner fails on GRCh38 #307

Closed: delocalizer closed this issue 2 years ago

delocalizer commented 2 years ago

Describe the bug

I would like to run the current qsv tool on BAMs aligned against GRCh38. At present the run gets past FindDiscordantPairClustersMT but then fails with:

21:13:58.315 [pool-95-thread-3] SEVERE org.qcmg.qsv.softclip.FindClipClustersMT - Setting exit status in clip cluster thread to 1 as exception caught: htsjdk.samtools.SAMException: Unable to find entry for contig: GL000192.1

To Reproduce

  1. Generate a GRCh38 tiled aligner .txt.gz file. e.g. /mnt/lustre/reference/genomeinfo/q3clinvar/q3tiledaligner_5k.GRCh38.txt.gz
  2. Align some test+control sequence to GRCh38
  3. Run qsv on the input bams, against the GRCh38 reference and tiled aligner files
  4. Example: /working/genomeinfo/cromwell-test/cromwell-executions/somaticDnaFastqToMaf/ce6df1cc-e145-4e5c-a0e8-96e80c8c0e36/call-qsvControlVsTest

Expected behavior

qsv completes successfully.

Additional context

GL000192.1 is a named contig from GRCh37 that is not present in GRCh38. To rule out user error, I searched every input specified in the failing example's qsv .ini (the BAMs, the tiled aligner .txt.gz and the reference FASTA), and none of them contains GL000192.1. The contig does, however, appear hard-coded in q3tiledaligner, in the grch37Positions map. I wonder whether this reference-specific map is needed at all, since the same information appears to be included in the header of the tiled aligner reference file.
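
For anyone wanting to repeat the search of the gzipped tiled aligner file, something like the following could be used. It is only a minimal sketch using the Java standard library: `ContigSearch` and `firstLineMentioning` are hypothetical names and not part of adamajava, and the check is a plain substring match per line, so it makes no assumption about the file's column or header layout.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper for illustration only; not part of adamajava. */
final class ContigSearch {

    /**
     * Scans a gzip-compressed text file line by line.
     * Returns the 1-based number of the first line containing the contig
     * name as a substring, or -1 if no line mentions it.
     */
    static long firstLineMentioning(Path gzippedText, String contig) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzippedText)),
                StandardCharsets.UTF_8))) {
            long lineNumber = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (line.contains(contig)) {
                    return lineNumber;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: ContigSearch <file.txt.gz> <contig>");
            return;
        }
        long hit = firstLineMentioning(Paths.get(args[0]), args[1]);
        System.out.println(hit < 0
                ? args[1] + " not found"
                : args[1] + " first seen on line " + hit);
    }
}
```

Run against the tiled aligner file with `GL000192.1` as the second argument, a "not found" result would match what I saw. Because it is a substring match, a hit should still be checked by eye in case a longer name happens to contain the search string.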

holmeso commented 2 years ago

Fixed. Please reopen if this is not the case.