epam / fonda

Fonda is a framework which offers scalable and automatic analysis of multiple NGS sequencing data types
Apache License 2.0

Xenome: remove merging of ambiguous and both fastqs #148

Closed syansanofi closed 4 years ago

syansanofi commented 4 years ago

Current

After filtering, xenome produces 3 relevant files (human, both, ambiguous). Currently these are merged into a single outgoing fastq. It was found that for samples with high filtering / low graft composition, the both and ambiguous reads produce inaccurate results.

Approach

Remove the both and ambiguous fastqs from these lines, so that only the human reads are written to the outgoing fastq:

cat [(${xenomeFields.convertHumanFastq1})] [(${xenomeFields.convertBothFastq1})] [(${xenomeFields.convertAmbiguousFastq1})] | gzip -c > [(${xenomeFields.humanMergedFastq1})][# th:if = "${xenomeFields.humanMergedFastq2 != null}"]
cat [(${xenomeFields.convertHumanFastq2})] [(${xenomeFields.convertBothFastq2})] [(${xenomeFields.convertAmbiguousFastq2})] | gzip -c > [(${xenomeFields.humanMergedFastq2})][/]
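As a rough illustration of the intended result, here is a minimal shell sketch of the human-only output step. The file names and variables (`human_fq1`, `merged_fq1`) are hypothetical stand-ins for the `xenomeFields` template values, and the one-read fastq is made-up test data, not xenome output.

```shell
#!/bin/sh
set -eu

# Hypothetical paths; in the template these come from xenomeFields.
workdir=$(mktemp -d)
human_fq1="$workdir/sample_convert_human_1.fastq"
merged_fq1="$workdir/sample_human_1.fastq.gz"

# Made-up single-read fastq standing in for the converted human reads.
printf '@read1\nACGT\n+\nIIII\n' > "$human_fq1"

# Only the human reads are compressed into the outgoing fastq;
# the both and ambiguous fastqs are no longer concatenated in.
gzip -c "$human_fq1" > "$merged_fq1"

# Show the resulting fastq content.
gzip -dc "$merged_fq1"
```

With a single input there is nothing left to concatenate, so `cat ... | gzip -c` could be reduced to `gzip -c` on the human fastq alone; keeping `cat` would also work.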

Also remove the awk fix lines for the both and ambiguous reads, since they are unnecessary once those files are no longer merged.

awk '{if (NR % 4 == 1) print "@"$0; else if (NR % 4 == 3) print "+"$0; else print $0 }' [(${xenomeFields.bothFastq1})] > [(${xenomeFields.convertBothFastq1})]
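For context, the awk fix prefixes the first line of every 4-line fastq record with `@` and the third line with `+`. The sketch below runs the same awk program on a made-up record, assuming the input lacks those leading characters; the file names (`raw_fq`, `fixed_fq`) are hypothetical and the input is illustrative, not real xenome output.

```shell
#!/bin/sh
set -eu

# Hypothetical paths for illustration only.
workdir=$(mktemp -d)
raw_fq="$workdir/raw_1.fastq"
fixed_fq="$workdir/convert_1.fastq"

# Assumed input: one record whose header lacks "@" and whose separator lacks "+".
printf 'read1\nACGT\n\nIIII\n' > "$raw_fq"

# Same awk program as in the template: restore "@" on line 1 and "+" on line 3
# of each 4-line record, and pass sequence and quality lines through unchanged.
awk '{if (NR % 4 == 1) print "@"$0; else if (NR % 4 == 3) print "+"$0; else print $0 }' "$raw_fq" > "$fixed_fq"

cat "$fixed_fq"
```

Since the both and ambiguous fastqs are no longer consumed downstream, running this conversion on them only produces unused files.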