AdamaJava / adamajava

feat(qmule): new classes for creating fastqs from BAMs and vice versa #337

Closed holmeso closed 1 year ago

holmeso commented 1 year ago

Description

This PR adds 2 new classes: one creates unmapped BAMs from fastq files, and the other creates fastq files from mapped BAM/CRAM files. They are effectively clones of picard's FastqToSam and SamToFastq classes.

What sets them apart from the picard classes is that they attempt to preserve any additional information in the FastqRecord header in the SAMRecord (FastqToSamWithHeaders). When writing back to fastq (SamToFastqWithHeaders), these additional headers (if present) are returned to the FastqRecord header. It should therefore be possible to accurately recreate the fastq files that were used to make the BAM/CRAM, so the original fastq files could be deleted to save disk space.

FastqToSamWithHeaders writes any additional header information into 2 user-defined tags, ZH and ZT.

When SamToFastqWithHeaders is called, the ZH tag is added back to the header and the ZT tag is added back to the read and quality.
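
To illustrate the header round trip, here is a minimal sketch. It is not the code in this PR and does not use htsjdk: `SimpleRecord`, `fromFastqHeader` and `toFastqHeader` are hypothetical names, and splitting the header at the first space is an assumption about where the "additional" information starts. Only the ZH tag is modelled.

```java
import java.util.HashMap;
import java.util.Map;

public class FastqHeaderRoundTrip {

    /** Hypothetical stand-in for a SAM record: a read name plus optional string tags. */
    static final class SimpleRecord {
        final String readName;
        final Map<String, String> tags = new HashMap<>();

        SimpleRecord(String readName) {
            this.readName = readName;
        }
    }

    /** Tag used for the extra header text (ZH, as named in the description above). */
    static final String HEADER_TAG = "ZH";

    /**
     * Splits a fastq header at the first space (an assumption of this sketch):
     * the part before it becomes the read name, the rest is kept in the ZH tag.
     */
    static SimpleRecord fromFastqHeader(String header) {
        int firstSpace = header.indexOf(' ');
        if (firstSpace < 0) {
            // No additional information, so no tag is written.
            return new SimpleRecord(header);
        }
        SimpleRecord rec = new SimpleRecord(header.substring(0, firstSpace));
        rec.tags.put(HEADER_TAG, header.substring(firstSpace + 1));
        return rec;
    }

    /** Rebuilds the fastq header, re-attaching the ZH value if it is present. */
    static String toFastqHeader(SimpleRecord rec) {
        String extra = rec.tags.get(HEADER_TAG);
        return extra == null ? rec.readName : rec.readName + " " + extra;
    }

    public static void main(String[] args) {
        String original = "read1 1:N:0:ACGT";
        SimpleRecord rec = fromFastqHeader(original);
        // The rebuilt header should match the original exactly.
        System.out.println(toFastqHeader(rec).equals(original));
    }
}
```

Because the extra text is stored verbatim and only re-attached when the tag exists, headers with and without additional information both survive the round trip in this sketch.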

Type of change

How Has This Been Tested?

New unit test classes have been included as part of this PR. The new classes have also been run as part of a modified FTUB_WGGSS workflow, which produced the same number of hard-filtered vcf records (on the GS NA12878 dataset) as the existing FTUB_WGGSS workflow. The CRAM/BAM can be converted back to a fastq file with the same records as the original.

Are WDL Updates Required?

No WDL updates are required, although the expectation is that once this is in production, the FTUB_WGGSS workflow (or a new one based on it) will be updated to call these new classes.

Checklist: