CRG-CNAG / CalliNGS-NF

GATK RNA-Seq Variant Calling in Nextflow
Mozilla Public License 2.0

The on-the-fly two-pass option could be used to avoid the genome regeneration step #26

Closed by kojix2 2 years ago

kojix2 commented 2 years ago

According to the STAR manual...

https://raw.githubusercontent.com/alexdobin/STAR/master/doc/STARmanual.pdf

8.3 2-pass mapping with re-generated genome.

This is the original 2-pass method which involves genome re-generation step in-between 1st and 2nd passes. Since 2.4.1a, it is recommended to use the on the fly 2-pass options as described above.

It seems to say that genome regeneration is not recommended.
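For illustration, here is a minimal sketch of what an on-the-fly 2-pass call could look like when assembled from Python. It is not taken from the CalliNGS-NF repository: the function name, file names, and default values are hypothetical, and only the STAR options themselves (`--twopassMode Basic`, `--genomeDir`, `--readFilesIn`, `--runThreadN`, `--outFileNamePrefix`) come from the STAR manual.

```python
import shlex


def star_twopass_command(genome_dir, reads, threads=4, prefix="sample_"):
    """Build a STAR command line that uses the on-the-fly 2-pass mode.

    All argument values are hypothetical placeholders for illustration.
    """
    return [
        "STAR",
        "--genomeDir", genome_dir,      # existing index, no re-generation step
        "--readFilesIn", *reads,        # one file (single-end) or two (paired-end)
        "--runThreadN", str(threads),
        "--twopassMode", "Basic",       # STAR runs both passes in a single call
        "--outFileNamePrefix", prefix,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without STAR.
    cmd = star_twopass_command("genome_index", ["reads_1.fastq", "reads_2.fastq"])
    print(shlex.join(cmd))
```

The command is returned as a list so it can be passed to `subprocess.run` directly, or joined with `shlex.join` for use in a shell script.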

8.1 Multi-sample 2-pass mapping.

For a study with multiple samples, it is recommended to collect 1st pass junctions from all samples.

  1. Run 1st mapping pass for all samples with "usual" parameters. Using annotations is recommended either at the genome generation step, or mapping step.
  2. Run 2nd mapping pass for all samples, listing SJ.out.tab files from all samples in --sjdbFileChrStartEnd /path/to/sj1.tab /path/to/sj2.tab ....
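The two steps above could be sketched roughly as follows. This is only an assumption-laden illustration: the helper names and the sample prefixes are hypothetical. It relies on STAR writing its junction table to `<prefix>SJ.out.tab`, and on `--sjdbFileChrStartEnd` accepting several files.

```python
import shlex


def first_pass_junctions(prefixes):
    """Return the SJ.out.tab path STAR writes for each first-pass output prefix."""
    return [f"{prefix}SJ.out.tab" for prefix in prefixes]


def second_pass_command(genome_dir, reads, junction_files, threads=4, prefix="pass2_"):
    """Build the 2nd-pass STAR command, listing junctions from all samples."""
    if not junction_files:
        raise ValueError("at least one SJ.out.tab file is required")
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--runThreadN", str(threads),
        # Junctions collected from every sample's first pass.
        "--sjdbFileChrStartEnd", *junction_files,
        "--outFileNamePrefix", prefix,
    ]


if __name__ == "__main__":
    # Hypothetical first-pass prefixes for two samples.
    junctions = first_pass_junctions(["sampleA_", "sampleB_"])
    cmd = second_pass_command("genome_index", ["sampleA_1.fastq"], junctions)
    print(shlex.join(cmd))
```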

Honestly, I am not sure what 2-pass mapping is, but maybe the following script can be improved by omitting the genome re-generation.

https://github.com/CRG-CNAG/CalliNGS-NF/blob/649270265eec29796816810b4ea5103d6ca9aada/modules.nf#L113-L142

pditommaso commented 2 years ago

I leave this to @lucacozzuto

lucacozzuto commented 2 years ago

Hi, the original idea was to generate an index based on the annotation, align the reads, and discover new splice sites. These are then used to generate another (improved) index, and finally that index is used to align the reads. I think it is OK to change the code, since everything can now be done in a single step.
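To make the steps that would be collapsed concrete, here is a rough sketch of the three-command sequence described above (first alignment, index re-generation with the discovered junctions, final alignment). It is a hypothetical reconstruction, not the actual script in `modules.nf`: the function name, directory names, prefixes, and the `--sjdbOverhang` value are placeholders.

```python
import shlex


def regenerated_genome_commands(genome_fasta, index_dir, reads,
                                new_index_dir="genome_pass2",
                                sjdb_overhang=100, threads=4):
    """Return the three STAR commands of the 2-pass method with genome re-generation."""
    # 1. First pass: align against the annotation-based index to discover junctions.
    first_pass = [
        "STAR", "--genomeDir", index_dir,
        "--readFilesIn", *reads,
        "--runThreadN", str(threads),
        "--outFileNamePrefix", "pass1_",
    ]
    # 2. Re-generate the index, inserting the junctions found in the first pass.
    regenerate = [
        "STAR", "--runMode", "genomeGenerate",
        "--genomeDir", new_index_dir,
        "--genomeFastaFiles", genome_fasta,
        "--sjdbFileChrStartEnd", "pass1_SJ.out.tab",
        "--sjdbOverhang", str(sjdb_overhang),  # placeholder value
        "--runThreadN", str(threads),
    ]
    # 3. Final pass: align the same reads against the improved index.
    final_pass = [
        "STAR", "--genomeDir", new_index_dir,
        "--readFilesIn", *reads,
        "--runThreadN", str(threads),
        "--outFileNamePrefix", "final_",
    ]
    return [first_pass, regenerate, final_pass]


if __name__ == "__main__":
    for command in regenerated_genome_commands("genome.fa", "genome_index", ["reads.fastq"]):
        print(shlex.join(command))
```

With the on-the-fly option, the middle command disappears and the first and last commands merge into a single STAR call with `--twopassMode Basic`.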

kojix2 commented 2 years ago

Thanks!