Issue 1
Currently FONDA does not distinguish between lanes of a single sample: all lanes receive an identical @RG ID: tag.
Approach
Since alignment is done on a per-lane basis for DNA-based workflows (e.g. DNACapVar_Fastq), add the lane number to the read group ID. This would align more closely with standard practice (link).
Example
_samplemanifest.txt

| parameterType | shortName | Parameter1 | Parameter2 |
| --- | --- | --- | --- |
| fastqFile | SampleA | SampleA_S1_L001_R1_001.fastq.gz | SampleA_S1_L001_R2_001.fastq.gz |
| fastqFile | SampleA | SampleA_S2_L002_R1_001.fastq.gz | SampleA_S2_L002_R2_001.fastq.gz |
The @RG ID: tag would be:

| parameterType | @RG ID |
| --- | --- |
| fastqFile | SampleA_L001 |
| fastqFile | SampleA_L002 |
I would prefer that the lane numbers be generated by iterating over the sample's lanes and appended to the sample name:
SampleA+L001
rather than extracted from the longest common substring of the sample's reads. This keeps the lane numbering consecutive and makes it easier to enforce, because it does not depend on sample name prefixes.
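The iteration described above could be sketched roughly as follows. This is only an illustration, not FONDA code: the class `LaneReadGroupIds`, the method `forSample`, and the `_L%03d` suffix format (taken from the `SampleA_L001` example in the table) are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of FONDA: the class name, method name and
// "_L%03d" suffix are assumptions made for illustration only.
final class LaneReadGroupIds {
    private LaneReadGroupIds() { }

    // Builds one read group ID per lane by counting lanes in manifest order,
    // so the IDs never depend on how the FASTQ files are named.
    static List<String> forSample(String sampleName, int laneCount) {
        List<String> ids = new ArrayList<>();
        for (int lane = 1; lane <= laneCount; lane++) {
            // Zero-pad the lane counter to three digits, e.g. 1 -> "L001".
            ids.add(String.format("%s_L%03d", sampleName, lane));
        }
        return ids;
    }
}
```

Under these assumptions, `LaneReadGroupIds.forSample("SampleA", 2)` yields `SampleA_L001` and `SampleA_L002`, one ID per `fastqFile` row of the manifest.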
Please let me know if this is clear.
Issue 2
All workflows should get the LB tag, not only amplicon sequencing workflows. The rationale is the same as for Issue 1: to align with current best practice.
https://github.com/epam/fonda/blob/4a651caa0ab4bdb4ff92516d2294331c9723f134/src/main/java/com/epam/fonda/tools/impl/BwaSort.java#L108-L110
https://github.com/epam/fonda/blob/4a651caa0ab4bdb4ff92516d2294331c9723f134/src/main/java/com/epam/fonda/tools/impl/NovoalignSort.java#L117-L119
Approach
Remove this check and use
@RG\\tID:%s\\tSM:%s\\tLB:%s\\tPL:Illumina
for all workflows.
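A minimal sketch of what the unconditional read group string could look like is shown below. It is not the actual `BwaSort` or `NovoalignSort` code: the class `ReadGroupHeader`, the method `build`, and its parameter names are hypothetical, and which value FONDA would pass as the library is left open here.

```java
// Hypothetical helper, not FONDA's actual implementation.
final class ReadGroupHeader {
    // Format string quoted from the issue. In Java source, "\\t" is a literal
    // backslash followed by 't', which is what ends up on the command line.
    private static final String RG_FORMAT = "@RG\\tID:%s\\tSM:%s\\tLB:%s\\tPL:Illumina";

    private ReadGroupHeader() { }

    // Always includes LB, with no workflow-specific check.
    // The parameter names (id, sample, library) are assumptions.
    static String build(String id, String sample, String library) {
        return String.format(RG_FORMAT, id, sample, library);
    }
}
```

For example, `ReadGroupHeader.build("SampleA_L001", "SampleA", "LibA")` produces `@RG\tID:SampleA_L001\tSM:SampleA\tLB:LibA\tPL:Illumina`, where `LibA` is a placeholder library name.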