Matthijnssenslab / ViPER

Bioinformatics pipeline used in the Laboratory of Viral Metagenomics (KU Leuven) to trim and assemble paired-end Illumina reads, and classify resulting contigs.
GNU General Public License v3.0

dev branch: geNomad finds proviruses after clustering #6

Closed: nikolasbasler closed this issue 7 months ago

nikolasbasler commented 10 months ago

See https://github.com/apcamargo/genomad/issues/31

A fix could be to run geNomad on all concatenated contigs before clustering. Then have the per-sample clustering script look up the geNomad output to cut host regions, instead of running geNomad for each sample individually.
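The lookup step of this proposed fix could be sketched roughly as follows. This is only an illustration, not ViPER's actual code: the function names `read_provirus_regions` and `cut_host_regions` are hypothetical, the column names `source_seq`, `start` and `end` (1-based, inclusive) are an assumption about the provirus table geNomad writes, and the `|provirus_<start>_<end>` naming of trimmed sequences is likewise assumed.

```python
import csv
import io


def read_provirus_regions(handle):
    """Collect provirus coordinates per source contig from a tab-separated table.

    Assumed columns (not verified against geNomad): source_seq, start, end,
    with 1-based inclusive coordinates.
    """
    regions = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        regions.setdefault(row["source_seq"], []).append(
            (int(row["start"]), int(row["end"]))
        )
    return regions


def cut_host_regions(contigs, regions):
    """Return the contigs of one sample with host flanks removed.

    Contigs without a provirus entry are passed through unchanged. Contigs
    with entries are replaced by their provirus region(s) only.
    """
    trimmed = {}
    for name, seq in contigs.items():
        if name not in regions:
            trimmed[name] = seq
            continue
        for start, end in sorted(regions[name]):
            # Convert 1-based inclusive coordinates to a Python slice.
            trimmed[f"{name}|provirus_{start}_{end}"] = seq[start - 1:end]
    return trimmed


# Toy data standing in for a single geNomad run on all concatenated contigs.
provirus_table = "source_seq\tstart\tend\nc1\t5\t12\n"
all_regions = read_provirus_regions(io.StringIO(provirus_table))

# Contigs of one sample, looked up against the shared geNomad result.
sample_contigs = {"c1": "AAAACCCCGGGGTTTT", "c2": "ACGT"}
sample_trimmed = cut_host_regions(sample_contigs, all_regions)
```

With this layout geNomad would run only once, and each per-sample clustering run would just filter the shared table by the contig names it holds.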

A workaround might be to run the per-sample clustering script, as it is right now, on the concatenated contigs and then process the output as usual in the cross-sample clustering script. However, this is expensive in both runtime and memory.

Depending on input size and computing power, it might be possible to perform clustering in one step.

LanderDC commented 7 months ago

See #7.