Open abhi18av opened 1 year ago
@vrennie @TimHHH , where exactly do we need to add this sequencing platform information i.e. which summary files?
I would think a column in the summary stats file.
Yes, I agree with Tim, just a column that looks like this:
Sequencing Technology Illumina ONT ONT Illumina Illumina Illumina ...
Okay, I understand this would be added to the summary stats file
👍
However, there's one more detail worth mentioning here: currently we hard-code the sequencing technology in the bam_rg_string
https://github.com/TORCH-Consortium/MAGMA/blob/786d13dfe1988784f870499bb878f47cc945a493/workflows/validate_fastqs_wf.nf#L30
Should we not add this column to the input-samplesheet as well?
Yes, good catch @abhi18av, let's add this as a column to the samplesheet.
Yes, ideally the user provides the sequencing technology in the sample sheet and this is then used in the bam_rg_string along the lines of PL:${technology}. The documentation has to be clear that only one technology is allowed per sample sheet.
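For illustration, here is a minimal Python sketch of that idea: read the technology from the samplesheet, reject sheets that mix technologies, and pass the value into the read-group string. The column name `sequencing_technology`, the helper names, and the exact read-group layout are assumptions made for this example and are not taken from the MAGMA code; the platform mapping covers only the two technologies mentioned in this thread.

```python
import csv
import io

# Assumed mapping from samplesheet values to read-group PL tags.
# Only the two technologies mentioned in this thread are covered.
PLATFORM_TAGS = {"illumina": "ILLUMINA", "ont": "ONT"}


def read_technology(samplesheet_text, column="sequencing_technology"):
    """Return the single PL tag used by every row of the samplesheet.

    The column name is hypothetical; adjust it to the real samplesheet.
    """
    rows = list(csv.DictReader(io.StringIO(samplesheet_text)))
    technologies = {row[column].strip().lower() for row in rows}
    # Only one technology is allowed per samplesheet.
    if len(technologies) != 1:
        raise ValueError(
            f"expected exactly one technology, found: {sorted(technologies)}"
        )
    technology = technologies.pop()
    if technology not in PLATFORM_TAGS:
        raise ValueError(f"unsupported technology: {technology}")
    return PLATFORM_TAGS[technology]


def bam_rg_string(sample, technology):
    """Build a read-group string with a literal '\\t' between fields.

    The ID/SM layout is an assumption for illustration only.
    """
    return f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:{technology}"
```

With a sheet containing only Illumina samples, `read_technology` returns `ILLUMINA`, which can then be passed to `bam_rg_string` instead of a hard-coded value. A sheet mixing Illumina and ONT rows raises a `ValueError`.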
Guys, what about reflecting that on the actual sample name as well? Something like Shea2017_2021_396.SRR16089406.LNA.A1.ILMN.1.1.1
The NCBI currently lists the following platforms used for the sequences
To avoid long names, we can perhaps standardize the acronyms like ILMN / ONT / PCB / ION etc - what do you think?
I think unless the full name messes up the .csv, it's better to keep the full name.
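One way to check whether a full platform name would "mess up" the .csv is to see whether Python's standard `csv` writer has to quote it. This is only a sketch; the helper name `needs_csv_quoting` is hypothetical, and it assumes the summary files are plain comma-separated files.

```python
import csv
import io


def needs_csv_quoting(value):
    """Return True if the csv writer must quote the value to keep it intact.

    Values containing a comma, a double quote, or a line break need quoting
    under the default minimal-quoting rules; plain names do not.
    """
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow([value])
    # If the written field differs from the raw value, quoting was applied.
    return buffer.getvalue() != value + "\n"
```

A plain name such as `Illumina` passes through unchanged, while a name containing a comma would be quoted and could break tools that split lines naively on commas.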
As part of 4-APR meeting.