fulcrumgenomics / fgbio

Tools for working with genomic and high throughput sequencing data.
http://fulcrumgenomics.github.io/fgbio/
MIT License

Samples processed in different lanes #969

Closed shashwatsahay closed 6 months ago

shashwatsahay commented 6 months ago

Hey

I am quite new to UMI-based sequencing.

I am wondering: generally, when we get sequencing data from different lanes, we align first and merge later. Does this hold true for UMI-based sequencing as well? I.e., should I run GroupReadsByUmi, CallMolecularConsensusReads, and FilterConsensusReads on individual lanes, or on all lanes together?

nh13 commented 6 months ago

@shashwatsahay when you have the same library split across multiple lanes, you should merge them prior to grouping and consensus calling, as reads from the same source molecule may be split across lanes. Does that make sense?
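For example, the merge-then-group flow might look like this (file names are hypothetical, and `samtools` plus `fgbio` are assumed to be on the PATH; GroupReadsByUmi also expects the UMI to already be in a tag such as RX):

```shell
# Merge the per-lane BAMs for the same library first, so that reads
# originating from the same source molecule end up in one file.
# lane1.bam .. lane4.bam are hypothetical per-lane alignments.
samtools merge S7.merged.bam lane1.bam lane2.bam lane3.bam lane4.bam

# Then run UMI grouping (and consensus calling) once, on the merged BAM.
fgbio GroupReadsByUmi \
  --input S7.merged.bam \
  --output S7.grouped.bam \
  --strategy adjacency
```

Running GroupReadsByUmi per lane instead would split each molecule's read family across files and inflate the apparent number of molecules.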

shashwatsahay commented 6 months ago

Yes it does!!!

Thanks...

Sorry, but I have another question. I have the same error as mentioned in

https://github.com/fulcrumgenomics/fgbio/issues/320

I ran the pipeline mentioned in the docs and afterwards wanted to run the Picard MarkDuplicates tool, which threw an error:

Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing SAM header. @RG line missing SM tag. Line:
@RG ID:A; File /alignments/S7.cons.filtered.realigned.bam; Line number 89

My current pipeline goes something like this

  1. BWA mem -T 0
  2. AnnotateBamWithUmis
  3. SetMateInformation
  4. GroupReadsByUmi
  5. CallMolecularConsensusReads
  6. FilterConsensusReads
  7. BWA mem realign
  8. Picard MarkDuplicates

As far as I can tell, the @RG line gets added at CallMolecularConsensusReads.

Could you let me know what can be done here?

nh13 commented 6 months ago

You can use FastqToBam to import your FASTQ files into an unmapped BAM, such that the read group contains the sample (SM tag). See: https://github.com/fulcrumgenomics/fgbio/blob/main/docs/best-practice-consensus-pipeline.md
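A minimal sketch of that import step, with hypothetical file names, sample/library labels, and an assumed read structure of an 8 bp inline UMI on read 1 (adjust --read-structures to match your kit):

```shell
# Import FASTQ to an unmapped BAM, setting the sample (SM) and library (LB)
# in the @RG line so downstream tools such as MarkDuplicates find the SM tag.
# The FASTQ names, sample name, and read structure here are illustrative.
fgbio FastqToBam \
  --input S7_R1.fastq.gz S7_R2.fastq.gz \
  --read-structures 8M+T +T \
  --sample S7 \
  --library S7-lib \
  --output S7.unmapped.bam
```

The alignment step then maps this unmapped BAM (e.g. via SamToFastq piped into bwa mem and merged back with the unmapped BAM), and the sample-bearing read group is carried through grouping, consensus calling, and re-alignment.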

shashwatsahay commented 6 months ago

Thanks, it worked!