VirFinder w/ a more specific model for eukaryotic viruses

Currently, we are using the default model for VirFinder predictions.

However, we are particularly interested in also predicting eukaryotic viruses (not only phages) with VirFinder. I tested the prediction with a more specific model and implemented it in the Nextflow version of the pipeline: https://github.com/hoelzer/virify/issues/21

Basically, the model needs to be downloaded (or deposited somewhere):

wget https://github.com/jessieren/VirFinder/raw/master/EPV/VF.modEPV_k8.rda

Then I run a simplified version of a script from Guillermo:

run_virfinder_modEPV.Rscript VF.modEPV_k8.rda ${fasta} .
awk '{print $1"\t"$2"\t"$3"\t"$4}' ${name}*.txt > ${name}.txt

The script can be found here: https://github.com/hoelzer/virify/tree/master/bin

I only added the awk filter because the resulting txt file has more columns than the next parse step of the pipeline currently expects. Cutting it down to the first four columns avoids any problems there.
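To illustrate what the filter does, here is a minimal sketch on a made-up file. The column names and the two extra columns are assumptions for illustration only, not the actual output of the modEPV script. Unlike the command above, this variant splits explicitly on tabs (`-F'\t'`), which only matters if a field can contain spaces.

```shell
# Hypothetical prediction table with two extra columns (names and values are made up)
tmpdir=$(mktemp -d)
printf 'name\tlength\tscore\tpvalue\textra1\textra2\n' > "$tmpdir/sample_raw.txt"
printf 'contig_1\t5000\t0.95\t0.001\tx\ty\n' >> "$tmpdir/sample_raw.txt"

# Keep only the first four tab-separated columns for the parse step
awk -F'\t' -v OFS='\t' '{print $1, $2, $3, $4}' "$tmpdir/sample_raw.txt" > "$tmpdir/sample.txt"

cat "$tmpdir/sample.txt"
```

Setting `OFS` once keeps the print statement short and gives the same tab-separated result as concatenating `"\t"` by hand.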

I think what needs to be done is:

- clean up the R script
- use it instead of the current one in the CWL pipeline
  - the CWL currently uses a parallelized version of VirFinder. However, VirSorter is much slower than VirFinder, so the parallelization does not really speed things up at the moment, because the parsing step waits for the output of both tools
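The point about the parsing step can be sketched with two stand-in jobs. The function names and sleep durations below are hypothetical and only mimic a fast and a slow tool. The elapsed time is set by the slower job, no matter how quickly the faster one finishes.

```shell
# Stand-ins for the two tools; durations are made up for illustration
fast_tool() { sleep 1; echo "fast done"; }   # plays the role of VirFinder
slow_tool() { sleep 2; echo "slow done"; }   # plays the role of VirSorter

outdir=$(mktemp -d)
start=$(date +%s)
fast_tool > "$outdir/fast.out" &
slow_tool > "$outdir/slow.out" &
wait    # the parse step can only start once both background jobs have finished
end=$(date +%s)

elapsed=$((end - start))
echo "elapsed: ${elapsed}s"
```

Making `fast_tool` faster would not change `elapsed` here, which is why parallelizing VirFinder alone is unlikely to shorten the overall run.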

EBI-Metagenomics / emg-viral-pipeline

VirFinder w/ a more specific model for eukaryotic viruses #4