epam / fonda

Fonda is a framework which offers scalable and automatic analysis of multiple NGS sequencing data types
Apache License 2.0

Novoalign command reset #204

Closed syansanofi closed 3 years ago

syansanofi commented 3 years ago

Issue
Novoalign version 4 adds tuning presets: sets of parameters tested against data from specific sequencers such as NovaSeq and HiSeq. A preset is selected with --tune. The defaults used when no preset is given have been updated as well.

Approach
Remove all aligner parameters from the template and replace them with --tune. The tune parameter is optional, so by default the command should not include --tune.

https://github.com/epam/fonda/blob/9322c5f7ddf34e7c4068b788c52c8599337f4202/src/main/resources/templates/novoalign_sort_tool_template.txt#L2

should become something similar to the following:

[(${novoalignSortFields.novoalign})] -c [(${novoalignSortFields.numThreads})] -d [(${novoalignSortFields.novoindex})] -o SAM $[(${novoalignSortFields.rg})] -f [# th:if = "${fastq2 != null}"][(${fastq1})] [(${fastq2})][/][# th:unless = "${fastq2 != null}"][(${fastq1})][/] [# th:if = "${novoalignSortFields.tune != null}"]--tune [(${novoalignSortFields.tune})] [/]| [(${novoalignSortFields.samtools})] view -bS -|[(${novoalignSortFields.samtools})] sort - [(${novoalignSortFields.tmpBam})]
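To make the conditional parts of the proposed template easier to check, here is a minimal Python sketch that assembles the same command string. It is not Fonda's implementation (Fonda renders the Thymeleaf template above); the function name `build_novoalign_sort_cmd` and its arguments are hypothetical stand-ins for the `novoalignSortFields` values.

```python
from typing import Optional


def build_novoalign_sort_cmd(
    novoalign: str,
    num_threads: int,
    novoindex: str,
    rg: str,
    fastq1: str,
    samtools: str,
    tmp_bam: str,
    fastq2: Optional[str] = None,
    tune: Optional[str] = None,
) -> str:
    """Mirror the proposed template: --tune is emitted only when it is set."""
    # Paired-end runs pass both FASTQ files, single-end runs pass only fastq1.
    reads = f"{fastq1} {fastq2}" if fastq2 is not None else fastq1
    # Same role as the th:if block: nothing at all is added when tune is None.
    tune_part = f"--tune {tune} " if tune is not None else ""
    return (
        f"{novoalign} -c {num_threads} -d {novoindex} -o SAM ${rg} "
        f"-f {reads} {tune_part}"
        f"| {samtools} view -bS -|{samtools} sort - {tmp_bam}"
    )


if __name__ == "__main__":
    # Hypothetical paths and read group, for illustration only.
    print(build_novoalign_sort_cmd(
        "novoalign", 8, "hg19.nix", "'@RG'", "r1.fq", "samtools", "s1.tmp",
        fastq2="r2.fq", tune="NOVASEQ",
    ))
```

With `tune` left unset, the rendered command contains no --tune at all, which is the default behaviour requested in the approach.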

For integration purposes, the table below lists the --tune options and the values each one sets for the other parameters. The column headers are the option strings; each row is one Novoalign parameter.

Parameter Default V3-Defaults HiSeqX HiSeq NextSeq NOVASEQ BGISEQ500 MGISEQ2000 IONTorrent-1 IONTorrent-2
-g 40 40 40 40 40 40 40 40 95 100
-x 2 6 20 1 1 1 1 1 2 3
--matchreward 4 4 4 4 4 4 4 2 2
--softclip 50,30 0,0 50,30 45,30 50,30 45,30 45,30 50,30 100,50 100,25
-H 5 2 7 17 22 17 17 12 5 5
-t 0,3 16,4.5 0,2.0 0,2.5 0,2.0 0,2.5 0,3.5 0,1.5 0,3.0 0,3.0
--hlimit 8 off 9 9 8 8 8 8 7 7
-v 150 70 150 150 150 150 150 150 150 150
-r Random None Random Random Random Random Random Random Random Random
--pechimera on off on on on on on on
-u 8 0 8 8 8 8 8 8 8 8
-o SAM Native
--tag LB Off On
--trim3hp AG
-k -k -k

Source: http://www.novocraft.com/downloads/V4.00.Pre-20190805/NovoalignVersion4.pdf
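For integration, the table could be kept as a small lookup so that a configured tune value is checked before the command is built. The sketch below is only an illustration: it copies a few of the rows that list a value for every column, and it assumes that the "Default" column means "no --tune" while the remaining column headers are the accepted option strings. The names `PRESETS` and `check_tune` are hypothetical.

```python
from typing import Optional

# Column headers from the table above, in order.
COLUMNS = [
    "Default", "V3-Defaults", "HiSeqX", "HiSeq", "NextSeq",
    "NOVASEQ", "BGISEQ500", "MGISEQ2000", "IONTorrent-1", "IONTorrent-2",
]

# A few rows copied from the table; only rows with a value in every column.
ROWS = {
    "-g": "40 40 40 40 40 40 40 40 95 100",
    "-x": "2 6 20 1 1 1 1 1 2 3",
    "-H": "5 2 7 17 22 17 17 12 5 5",
    "-t": "0,3 16,4.5 0,2.0 0,2.5 0,2.0 0,2.5 0,3.5 0,1.5 0,3.0 0,3.0",
    "-v": "150 70 150 150 150 150 150 150 150 150",
}

# Preset name -> {parameter: value}, built column by column.
PRESETS = {
    column: {param: values.split()[i] for param, values in ROWS.items()}
    for i, column in enumerate(COLUMNS)
}


def check_tune(tune: Optional[str]) -> Optional[str]:
    """Return tune unchanged if it is acceptable, else raise ValueError."""
    if tune is None:
        return None  # no --tune: Novoalign falls back to its own defaults
    # Assumption: "Default" describes the untuned case, not an option string.
    if tune == "Default" or tune not in PRESETS:
        allowed = ", ".join(c for c in COLUMNS if c != "Default")
        raise ValueError(f"unknown tune value {tune!r}; expected one of: {allowed}")
    return tune


if __name__ == "__main__":
    print(check_tune("NOVASEQ"), PRESETS["NOVASEQ"])
```

Whether Novoalign matches these strings case-sensitively is not stated in the table, so the check above compares them exactly as written.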

kamyshova commented 3 years ago

@syansanofi Hi Shu, Do I understand correctly that the --amplicons argument should also be removed from the template along with other parameters?

syansanofi commented 3 years ago

Yes! We can remove it