EBI-Metagenomics / emg-viral-pipeline

VIRify: detection of phages and eukaryotic viruses from metagenomic and metatranscriptomic assemblies
Apache License 2.0

Update ViPhOGs to remove old models that are associated with discontinued viral taxa #104

Closed: hoelzer closed this issue 1 year ago

hoelzer commented 1 year ago

This needs to be done: @guille0387 identified which ViPhOGs belong to discontinued viral taxa, such as the families Siphoviridae, Myoviridae, ...

Recalculating the models is not trivial, but for now we can simply remove these old models (there are not that many) from the ViPhOG database.

@guille0387 I think you can provide a list of the models that need to be removed. Then we can update the ViPhOG database file (currently vpHMM_database_v3.tar.gz; bump it to v4?) and the pipeline accordingly?

We should then also update the data here: https://osf.io/fbrxy/, which is linked in the manuscript.

hoelzer commented 1 year ago

We don't necessarily have to update the vpHMM_database_v3.tar.gz file; we can simply bump additional_data_vpHMMs_v3.tsv to a new v4. @guille0387 has that.

This then needs to be uploaded to the EBI FTP (@mberacochea) so that we can add and access it here:

https://github.com/EBI-Metagenomics/emg-viral-pipeline/blob/dev/nextflow/modules/metaGetDB.nf#L25

And then we switch to v4 of this metadata file in the config:

https://github.com/EBI-Metagenomics/emg-viral-pipeline/blob/dev/nextflow.config#L51

and done.

That way, we would use v3 of the HMMs but v4 of the metadata file, which removes the outdated HMMs from the taxonomy assignment step.
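To illustrate the idea of keeping the v3 HMMs while letting the v4 metadata decide which models count, here is a minimal Python sketch. It is not the pipeline's actual code: the column names (`model`, `taxon`), the model IDs, the placeholder taxon names, and both helper functions are assumptions made up for this example.

```python
import csv
import io


def load_model_ids(metadata_tsv, id_column="model"):
    """Return the set of model IDs listed in a metadata TSV (passed as text).

    The column name "model" is hypothetical; the real TSV may differ.
    """
    reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    return {row[id_column] for row in reader}


def keep_supported_hits(hits, model_ids):
    """Keep only (contig, model) hits whose model still appears in the metadata."""
    return [(contig, model) for contig, model in hits if model in model_ids]


# Hypothetical v4 metadata: the model tied to a discontinued taxon is simply absent.
# Taxon names here are placeholders, not real taxa.
metadata_v4 = "model\ttaxon\nViPhOG1\tExampleviridae\nViPhOG3\tSamplevirus\n"

# Hypothetical hits from searching against the unchanged v3 HMM set.
hits = [
    ("contig_1", "ViPhOG1"),
    ("contig_1", "ViPhOG2"),  # model no longer listed in the v4 metadata
    ("contig_2", "ViPhOG3"),
]

supported = keep_supported_hits(hits, load_model_ids(metadata_v4))
print(supported)
```

With this approach the HMM archive never has to be rebuilt: a hit against a model that is missing from the metadata is just ignored during taxonomy assignment.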

mberacochea commented 1 year ago

Alright, that sounds like a plan. Should we add a warning message for the metadata v3?

Ping me when the v4 metadata file is ready and I'll make the required changes.

hoelzer commented 1 year ago

Yes, good idea. @guille0387 we could add a warning message that v4 no longer includes the following discontinued virus taxa (according to ICTV), so they are lost:

Siphoviridae, Podoviridae, Myoviridae, Caudovirales, Allolevivirus, Autographivirinae, Buttersvirus, Chungbukvirus, Incheonvirus, Leviviridae, Levivirus, Mandarivirus, Pbi1virus, Phicbkvirus, Radnorvirus, Sitaravirus, Vidavervirus

(Pls double-check that list)
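A rough Python sketch of how such a warning could be generated while the metadata rows are filtered. The taxon names are copied from the list above (which still needs double-checking); the row layout, the model IDs, the placeholder taxon `Exampleviridae`, and the function names are hypothetical and only serve the illustration.

```python
# Taxa copied from the list in this thread; not independently verified.
DISCONTINUED_TAXA = {
    "Siphoviridae", "Podoviridae", "Myoviridae", "Caudovirales",
    "Allolevivirus", "Autographivirinae", "Buttersvirus", "Chungbukvirus",
    "Incheonvirus", "Leviviridae", "Levivirus", "Mandarivirus", "Pbi1virus",
    "Phicbkvirus", "Radnorvirus", "Sitaravirus", "Vidavervirus",
}


def split_by_taxon_status(rows, discontinued=DISCONTINUED_TAXA):
    """Split (model, taxon) rows into kept rows and rows for discontinued taxa."""
    kept, dropped = [], []
    for model, taxon in rows:
        if taxon in discontinued:
            dropped.append((model, taxon))
        else:
            kept.append((model, taxon))
    return kept, dropped


def build_warning(dropped):
    """Return a warning string naming the removed taxa, or '' if none were removed."""
    taxa = sorted({taxon for _, taxon in dropped})
    if not taxa:
        return ""
    return (
        "Metadata v4 no longer includes models for these discontinued "
        "taxa (ICTV): " + ", ".join(taxa)
    )


# Hypothetical metadata rows: (model ID, associated taxon).
rows = [
    ("ViPhOG1", "Siphoviridae"),
    ("ViPhOG2", "Exampleviridae"),  # placeholder name, not a real taxon
    ("ViPhOG3", "Levivirus"),
]

kept, dropped = split_by_taxon_status(rows)
message = build_warning(dropped)
print(message)
```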

guille0387 commented 1 year ago

Hey @mberacochea, I just sent you an email with the updated metadata file v4.

hoelzer commented 1 year ago

Should be solved for now with the merge of PR https://github.com/EBI-Metagenomics/emg-viral-pipeline/pull/103