EBI-Metagenomics / emg-viral-pipeline

VIRify: detection of phages and eukaryotic viruses from metagenomic and metatranscriptomic assemblies
Apache License 2.0

Issue to run stable version v0.2.0 #79

Closed gwallau closed 1 year ago

gwallau commented 1 year ago

Hello folks,

I'm getting this error while trying to run VIRify:

Preview nextflow mode 'preview' is not supported anymore -- Please use nextflow.enable.dsl=2 instead.

I'm running the command below with Nextflow version 22.04.4:

nextflow run EBI-Metagenomics/emg-viral-pipeline -r v0.2.0 --help

Note: this command works fine: nextflow run EBI-Metagenomics/emg-viral-pipeline -r master --help

Best regards, Gabriel Wallau

hoelzer commented 1 year ago

Hey @gwallau! Thanks for your interest in the pipeline! Yes, v0.2.0 is unfortunately incompatible with newer Nextflow versions, where DSL2 (domain-specific language version 2) is no longer a preview feature. Please run a newer release (e.g. v0.4.0). That should work the same way as running from the master branch.

Btw: we are also working on a new release, which will follow once we have cleaned up some open issues.

I will close this issue, but please feel free to reopen it if there is something else related to your question.

gwallau commented 1 year ago

Thanks @hoelzer! I'm already running v0.4.0.

The pipeline is really interesting. I would like to use it for taxonomic assignment of eukaryotic viruses, and I saw in the preprint that you will be working to improve the accuracy for these viruses. In the meantime, do you have any parameter tips for eukaryotic viruses?

Besides, is there a verbose mode where intermediate files (VirFinder, VirSorter) would be printed?

Best, Gabriel

hoelzer commented 1 year ago

Hey!

Regarding eukaryotic viruses: yes, ~50% of the virus-specific HMM models for the taxonomic assignment target eukaryotic viruses. However, we also saw in our tests that the models could be improved further and, e.g., updated with newly available sequences. While this is definitely on the agenda, there is no ETA for it, I think. I don't have any specific tips for tuning parameters for eukaryotic viruses, but two parameters you could play with are:

    prop = 0.1
    taxthres = 0.6

(https://github.com/EBI-Metagenomics/emg-viral-pipeline/blob/master/nextflow.config#L41)

By default, we require at least 10% of the annotated CDS per contig to be annotated by informative ViPhOGs, and at least 60% of those annotations to support the same taxonomic assignment.

Other than that, I think it's really the HMM models themselves that we need to improve further and update.
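To make the two thresholds concrete, here is a minimal Python sketch of the rule as it is described in this comment. It is not the pipeline's actual code: the function name `assign_taxon`, its arguments, and the simple majority vote are assumptions made for illustration, and the real implementation may differ (for example in how it handles taxonomic ranks).

```python
from collections import Counter


def assign_taxon(n_cds, viphog_taxa, prop=0.1, taxthres=0.6):
    """Illustrative only: apply the prop / taxthres rule to one contig.

    n_cds        -- number of annotated CDS on the contig (assumed input)
    viphog_taxa  -- one taxon label per CDS hit by an informative ViPhOG
    prop         -- minimum fraction of CDS with informative ViPhOG hits
    taxthres     -- minimum fraction of those hits agreeing on one taxon
    """
    if n_cds == 0 or not viphog_taxa:
        return None

    # Rule 1: enough of the contig's CDS must carry informative hits.
    if len(viphog_taxa) / n_cds < prop:
        return None

    # Rule 2: enough of those hits must support the same taxon.
    taxon, count = Counter(viphog_taxa).most_common(1)[0]
    if count / len(viphog_taxa) < taxthres:
        return None

    return taxon
```

Under these assumptions, lowering `prop` lets contigs with fewer informative hits receive an assignment, and lowering `taxthres` tolerates more disagreement among the hits; both make the assignment more permissive and potentially less reliable.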

> Besides, is there a verbose mode where intermediate files (VirFinder, VirSorter) would be printed?

Yes, you can find all the intermediate data files in the Nextflow working directories. By default, running the pipeline creates a folder called work (this can be changed via -w some/other/path), which contains one subdirectory per executed process. While the pipeline is running, check the prefix printed before each process, e.g. something like

a4/389rw3f process(virsorter)

and then look at the contents of the directory work/a4/389rw3f*/. There you will find all the intermediate data for that process.
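As a convenience, the lookup described above can be scripted. The following is a minimal Python sketch, assuming the default layout in which each task directory sits under work/<prefix>/<hash>; `find_task_dirs` is a hypothetical helper, not part of the pipeline or of Nextflow.

```python
from pathlib import Path


def find_task_dirs(short_hash, work_dir="work"):
    """Return task directories matching a short hash such as 'a4/389rw3f'.

    short_hash -- the prefix printed next to a process in the run log
    work_dir   -- the working directory (whatever was passed via -w)
    """
    prefix, _, rest = short_hash.partition("/")
    base = Path(work_dir) / prefix
    if not base.is_dir():
        return []
    # The log shows only the start of the hash, so match by prefix.
    return sorted(p for p in base.glob(rest + "*") if p.is_dir())


if __name__ == "__main__":
    for task_dir in find_task_dirs("a4/389rw3f"):
        print(task_dir)
        for item in sorted(task_dir.iterdir()):
            print("   ", item.name)
```

Nextflow also writes hidden files such as .command.sh and .command.log into each task directory, so use ls -a when inspecting them by hand.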

We also publish some, but not all, of the output files of VirSorter, VirFinder, ... in the results folder.

gwallau commented 1 year ago

Perfect, thanks for all info!