Currently, we are using the default model for VirFinder predictions.
However, we are also particularly interested in predicting eukaryotic viruses (not only phages) with VirFinder. I tested the prediction with a specific model and implemented it in the Nextflow version of the pipeline:
https://github.com/hoelzer/virify/issues/21
Basically, the model needs to be downloaded (or deposited somewhere):
I only introduced the awk filter because the resulting txt file has more columns than the next parse step of the pipeline currently expects, and I wanted to avoid any problems there.
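As a rough illustration of what such an awk filter does, here is a minimal sketch. It is not the exact command from the Nextflow pipeline: the file names are hypothetical, and it assumes the parse step expects only the first four tab-separated columns (name, length, score, pvalue).

```shell
# Hypothetical VirFinder result with one additional trailing column
# (file names and the extra column are made up for this sketch).
printf 'name\tlength\tscore\tpvalue\textra\n' > virfinder_raw.txt
printf 'contig_1\t1500\t0.93\t0.01\tx\n' >> virfinder_raw.txt

# Keep only the first four tab-separated columns
# (assumption: this is what the parse step expects).
awk 'BEGIN { FS = OFS = "\t" } { print $1, $2, $3, $4 }' \
    virfinder_raw.txt > virfinder_filtered.txt
```

If the parse step turns out to need a different set of columns, only the field list in the `print` statement has to change.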
I think what needs to be done is:
- clean up the R script
- use it instead of the current one in the CWL pipeline
- the CWL pipeline currently uses a parallelized version of VirFinder. However, VirSorter is much slower than VirFinder, so at the moment the parallelization gives no real speed-up: the parsing step waits for the output of both tools
The prediction itself uses a simplified version of a script from Guillermo, which can be found here: https://github.com/hoelzer/virify/tree/master/bin