AdamaJava / adamajava

update sam writer factory to cope with CRAM file #327

Closed ChristinaXu2017 closed 1 year ago

ChristinaXu2017 commented 1 year ago

Description

Our adamajava tools create SAM/BAM output by calling org.qcmg.picard.SAMOrBAMWriterFactory, but that factory cannot create CRAM output. This PR updates the implementation without changing the API: CRAM output is now produced when the output file name ends with .cram. The detailed changes are:
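As an illustration of selecting the output type from the file name, here is a minimal sketch. It is not the actual SAMOrBAMWriterFactory code: the class `OutputFormatSketch`, the enum `Format`, and the method `formatFor` are hypothetical names, and the case-insensitive matching and the SAM fallback are assumptions.

```java
import java.util.Locale;

// Hypothetical sketch: choose an output format from the file name alone.
// None of these names come from SAMOrBAMWriterFactory itself.
class OutputFormatSketch {

    enum Format { SAM, BAM, CRAM }

    // Assumption: matching is case-insensitive and any other
    // extension falls back to SAM.
    static Format formatFor(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".cram")) {
            return Format.CRAM;
        }
        if (lower.endsWith(".bam")) {
            return Format.BAM;
        }
        return Format.SAM;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"out.sam", "out.bam", "out.cram"}) {
            System.out.println(name + " -> " + formatFor(name));
        }
    }
}
```

Because the decision depends only on the file name, callers keep passing the same arguments, which is what lets the API stay unchanged.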

Type of change

How Has This Been Tested?

Existing unit tests have been updated and new unit tests for CRAM files have been added.

Are WDL Updates Required?

No

Checklist:

ChristinaXu2017 commented 1 year ago

Thanks for spotting it. This function only handles the renaming; the writer is closed automatically by the try block.
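A minimal sketch of that pattern, assuming the try block is a try-with-resources statement. The class `CloseThenRenameSketch` and the method `writeThenRename` are hypothetical, and a plain `BufferedWriter` stands in for the SAM writer:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: the writer is closed by try-with-resources,
// and the rename step runs only after that.
class CloseThenRenameSketch {

    static void writeThenRename(Path tmp, Path target, String content) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(tmp)) {
            writer.write(content);
        } // writer.close() is called here, even if write() throws

        // Renaming only: no writer is open at this point.
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-sketch");
        Path target = dir.resolve("final.txt");
        writeThenRename(dir.resolve("tmp.txt"), target, "record");
        System.out.println(Files.readString(target));
    }
}
```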

ChristinaXu2017 commented 1 year ago

The Java HTSJDK library names the BAM index .bai and the CRAM index .cram.bai. The samtools C package names the BAM index .bam.bai and the CRAM index .cram.crai.
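To make the two conventions concrete, here is a small sketch that derives the index name from the data file name under each scheme as described above. The class and method names are hypothetical and neither library is called; this is plain string handling.

```java
// Hypothetical sketch of the two index naming schemes described in this comment.
class IndexNameSketch {

    // HTSJDK scheme: x.bam -> x.bai, x.cram -> x.cram.bai
    static String htsjdkIndexName(String fileName) {
        if (fileName.endsWith(".bam")) {
            return fileName.substring(0, fileName.length() - ".bam".length()) + ".bai";
        }
        if (fileName.endsWith(".cram")) {
            return fileName + ".bai";
        }
        throw new IllegalArgumentException("not a BAM or CRAM file name: " + fileName);
    }

    // samtools scheme: x.bam -> x.bam.bai, x.cram -> x.cram.crai
    static String samtoolsIndexName(String fileName) {
        if (fileName.endsWith(".bam")) {
            return fileName + ".bai";
        }
        if (fileName.endsWith(".cram")) {
            return fileName + ".crai";
        }
        throw new IllegalArgumentException("not a BAM or CRAM file name: " + fileName);
    }

    public static void main(String[] args) {
        System.out.println(htsjdkIndexName("sample.bam"));    // sample.bai
        System.out.println(htsjdkIndexName("sample.cram"));   // sample.cram.bai
        System.out.println(samtoolsIndexName("sample.bam"));  // sample.bam.bai
        System.out.println(samtoolsIndexName("sample.cram")); // sample.cram.crai
    }
}
```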

It is debatable whether we should follow HTSJDK or samtools. I checked "gatk-workflows:five-dollar-genome-analysis-pipeline": it calls "picard.jar SortSam" to create the .bai during "ToBam.UnmappedBamToAlignedBam", but it then calls samtools to index the CRAM file, which creates the .cram.crai.

When I run GATK ApplyBQSR, it can pick up both .bai and .bam.bai; however, it throws an error if the index file is missing.
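A tolerant lookup of this kind could look roughly like the sketch below. This is not GATK's code: `BamIndexLookupSketch` and `findBamIndex` are hypothetical, and the order in which the two candidates are tried is an assumption.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: accept either x.bai or x.bam.bai next to x.bam,
// and fail loudly when neither exists.
class BamIndexLookupSketch {

    static Path findBamIndex(Path bam) throws FileNotFoundException {
        String name = bam.getFileName().toString();
        String stem = name.endsWith(".bam")
                ? name.substring(0, name.length() - ".bam".length())
                : name;
        // Assumption: try the HTSJDK-style name first, then the samtools-style name.
        List<Path> candidates = List.of(
                bam.resolveSibling(stem + ".bai"),
                bam.resolveSibling(name + ".bai"));
        for (Path candidate : candidates) {
            if (Files.exists(candidate)) {
                return candidate;
            }
        }
        throw new FileNotFoundException("no index found for " + bam);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index-sketch");
        Path bam = Files.createFile(dir.resolve("sample.bam"));
        Files.createFile(dir.resolve("sample.bam.bai"));
        System.out.println(findBamIndex(bam).getFileName());
    }
}
```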