dev branch: geNomad finds proviruses after clustering

See https://github.com/apcamargo/genomad/issues/31

A possible fix is to run geNomad once on the concatenated contigs of all samples before clustering. The per-sample clustering script would then look up the host regions in that geNomad output and cut them, instead of running geNomad on each sample individually.
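A minimal Python sketch of that lookup, under stated assumptions and not ViPER's actual code. It assumes geNomad's virus summary table has `seq_name` and `coordinates` columns, names proviruses `<contig>|provirus_<start>_<end>`, and gives 1-based inclusive `coordinates` as `<start>-<end>`; check this against the geNomad version in use. The helper names, the `__` sample prefix used to keep contig names unique after concatenation, and passing through contigs without a provirus call are all hypothetical choices for illustration.

```python
import csv
from collections import defaultdict


def concatenate_samples(samples):
    """Hypothetical helper: merge per-sample contigs into one dict for a single
    geNomad run, prefixing names with the sample ID so they stay unique."""
    merged = {}
    for sample, contigs in samples.items():
        for name, seq in contigs.items():
            merged[f"{sample}__{name}"] = seq  # assumed naming scheme
    return merged


def parse_provirus_coordinates(lines):
    """Map contig name -> list of (start, end) provirus regions.

    Assumes a geNomad-style virus summary TSV (header row included) where
    provirus rows are named '<contig>|provirus_<start>_<end>' and the
    'coordinates' column holds '<start>-<end>' (1-based, inclusive).
    """
    regions = defaultdict(list)
    for row in csv.DictReader(lines, delimiter="\t"):
        name = row["seq_name"]
        if "|provirus" not in name:
            continue  # whole-contig virus call, no host region to cut
        contig = name.split("|provirus")[0]
        start, end = (int(x) for x in row["coordinates"].split("-"))
        regions[contig].append((start, end))
    return dict(regions)


def cut_host_regions(sequences, regions):
    """Replace contigs that carry a provirus with the provirus region(s) only.

    Contigs without a provirus entry are passed through unchanged.
    """
    trimmed = {}
    for contig, seq in sequences.items():
        if contig not in regions:
            trimmed[contig] = seq
            continue
        for start, end in regions[contig]:
            # Convert 1-based inclusive coordinates to a Python slice.
            trimmed[f"{contig}|provirus_{start}_{end}"] = seq[start - 1:end]
    return trimmed
```

In this sketch geNomad runs once on the output of `concatenate_samples`, and each per-sample run of the clustering script calls `cut_host_regions` on its own (prefixed) contigs with the shared `regions` lookup, for example built with `parse_provirus_coordinates(open(path))` on the summary file.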

A workaround might be to run the per-sample clustering script, as it is now, on the concatenated contigs and then process its output as usual in the cross-sample clustering script. However, this is expensive in both runtime and memory.

Depending on input size and available computing power, it might be possible to perform clustering in a single step.

Matthijnssenslab / ViPER, issue #6