Bioinformatics pipeline used in the Laboratory of Viral Metagenomics (KU Leuven) to trim and assemble paired-end Illumina reads, and classify resulting contigs.
GNU General Public License v3.0
dev branch: geNomad finds proviruses after clustering #6
A fix could be to run geNomad on all concatenated contigs before clustering. Then have the per-sample clustering script look up the geNomad output to cut host regions, instead of running geNomad for each sample individually.
A workaround might be to run the per-sample clustering script, as it is right now, on the concatenated contigs and process the output as usual in the cross-sample clustering script. However, this is expensive in both runtime and memory.
Depending on input size and computing power, it might be possible to perform clustering in one step.
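The lookup step of the proposed fix could look roughly like the sketch below. It is only an illustration, not the pipeline's code: it assumes geNomad was already run once on the concatenated contigs and that its provirus table is a TSV with `source_seq`, `start` and `end` columns (1-based, inclusive), which should be checked against the actual geNomad output. The function names and the `|provirus_start_end` naming scheme are hypothetical.

```python
import csv
import io


def load_provirus_regions(tsv_text):
    """Map each source contig name to its list of (start, end) provirus coordinates.

    Assumes a tab-separated table with 'source_seq', 'start' and 'end' columns.
    """
    regions = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        coords = (int(row["start"]), int(row["end"]))
        regions.setdefault(row["source_seq"], []).append(coords)
    return regions


def cut_host_regions(contigs, regions):
    """Keep only the provirus part of contigs that contain host flanks.

    Contigs without an entry in `regions` are passed through unchanged.
    """
    trimmed = {}
    for name, seq in contigs.items():
        if name not in regions:
            trimmed[name] = seq
            continue
        for start, end in sorted(regions[name]):
            # Coordinates are assumed 1-based and inclusive.
            trimmed[f"{name}|provirus_{start}_{end}"] = seq[start - 1:end]
    return trimmed


# Toy data standing in for one sample's contigs and the shared geNomad table.
provirus_tsv = "source_seq\tstart\tend\nsampleA_contig1\t5\t12\n"
sample_contigs = {
    "sampleA_contig1": "AAAACCCCGGGGTTTT",
    "sampleA_contig2": "ACGTACGT",
}

regions = load_provirus_regions(provirus_tsv)
trimmed = cut_host_regions(sample_contigs, regions)
for name, seq in trimmed.items():
    print(name, seq)
```

With this layout the per-sample script would only need the shared table and its own contigs, so geNomad itself runs once instead of once per sample.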
See https://github.com/apcamargo/genomad/issues/31