broadinstitute / picard

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
https://broadinstitute.github.io/picard/
MIT License
985 stars 369 forks source link

How you can keep RX barcode in uBAM after FastqToSam? #1657

Closed MatteoZam closed 1 year ago

MatteoZam commented 3 years ago

Hi everybody, we have got 10 bases UMI in the header of fastqfiles. It is codified by RX tag. We are using GATK to find somatic variants with low VAF. Unfortunately we found that after fastqtosam command we lost RX tag. Which is the correct procedure to keep this information also in uBAM?

We used this command contained in GATK WDL file.

picard FastqToSam \ FASTQ="MN18_700.R1.fastq.gz" \ FASTQ2="MN18_700.R2.fastq.gz" \ OUTPUT="ubam.bam" \ SAMPLE_NAME="MN18_700" \ READ_GROUP_NAME="S3771" \ LIBRARY_NAME="LowInput" \ PLATFORM_UNIT="NS500743" \ PLATFORM="illumina" \ SEQUENCING_CENTER="ICH" \ RUN_DATE="2019-06-05" \ SORT_ORDER="queryname"

but when we check for RX tags that are contained in fastq files header in uBAM file, we didn't find find them.

samtools view ubam.bam | less

NS500743:422:HMH25AFXY:1:11101:10000:7645 77 0 0 0 0 AGTTTACGGTTTTATCAATACAGTTTAATACACTTTTAAAGCCATACTTCTGGCATGATCTAGCATTTTCCAGGAAAATAAAAGAGGCACATACTGTAACTGTTGATCATGGGGAGTGATGGATGGGTTCTGCCCTCTGCTAGTGCAAAA AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEA<AEEEAAEEEEAAAEEE RG:Z:S3771 NS500743:422:HMH25AFXY:1:11101:10000:7645 141 0 0 0 0 CTTACCTGTTCCTCTAGACTTTCAGTGATATGTTTTGCACTAGCAGAGGGCAGAACCCATCCATCACTCCCCATGATCAACAGTTACAGTATGTGCCTCTTTTATTTTCCTGGAAAATGCTAGATCATGCCAGAAGTATGGCTTTAAAAG AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEE6/EEEEEEEEEEEEEE<EEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEAEEEEEEEEEEEEEEEEEE< RG:Z:S3771

zcat < MN18_700.R1.fastq.gz | grep -A 3 "NS500743:422:HMH25AFXY:1:11101:10000:7645"

@NS500743:422:HMH25AFXY:1:11101:10000:7645 RX:Z:GGTTACCGGA QX:Z:AAAAAEEEEE AGTTTACGGTTTTATCAATACAGTTTAATACACTTTTAAAGCCATACTTCTGGCATGATCTAGCATTTTCCAGGAAAATAAAAGAGGCACATACTGTAACTGTTGATCATGGGGAGTGATGGATGGGTTCTGCCCTCTGCTAGTGCAAAA + AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEA<AEEEAAEEEEAAAEEE

zcat < MN18_700.R2.fastq.gz | grep -A 3 "NS500743:422:HMH25AFXY:1:11101:10000:7645"

@NS500743:422:HMH25AFXY:1:11101:10000:7645 RX:Z:GGTTACCGGA QX:Z:AAAAAEEEEE CTTACCTGTTCCTCTAGACTTTCAGTGATATGTTTTGCACTAGCAGAGGGCAGAACCCATCCATCACTCCCCATGATCAACAGTTACAGTATGTGCCTCTTTTATTTTCCTGGAAAATGCTAGATCATGCCAGAAGTATGGCTTTAAAAG + AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEE6/EEEEEEEEEEEEEE<EEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEAEEEEEEEEEEEEEEEEEE<

Thanks for any kinds of help

Mat

matthdsm commented 3 years ago

You can use MergeBamAlignment https://github.com/broadinstitute/warp/blob/b4c11ec9a893b1e95064226f1ba7b17bfac044b9/tasks/broad/Alignment.wdl#L73