EBI-Metagenomics / emg-viral-pipeline

VIRify: detection of phages and eukaryotic viruses from metagenomic and metatranscriptomic assemblies
Apache License 2.0

PPR-Meta scoring #35

Open hoelzer opened 4 years ago

hoelzer commented 4 years ago

Currently, every sequence that PPR-Meta reports as "phage" is treated as a virus. However, we could additionally filter on the "phage score" that the tool provides:

Header,Length,phage_score,chromosome_score,plasmid_score,Possible_source
seq8,86578,0.658026557109837,0.323770475766357,0.0182029599535136,phage
seq11,63443,0.671362450565434,0.257167359821571,0.0714701900259453,phage
seq20,41715,0.945974168353953,0.0147801588566125,0.0392456778921355,phage
seq22,38841,0.999412552439551,1.51951318980135e-05,0.000572250124745809,phage

awk 'BEGIN{FS=","}{if($6=="phage" && $3>0.7){print $0}}' 01-viruses/pprmeta/kleiner_virome_2015_pprmeta.csv

This is also done here
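
To get a feeling for how strongly the cutoff affects the number of hits, one could count the "phage" calls at a few thresholds. The following is only a rough sketch: the helper name `count_phage_hits` is made up for illustration, and the temporary file just holds the four example rows shown above. The comparison is strict (`>`), as in the awk one-liner.

```shell
# Hypothetical helper (not part of VIRify): count PPR-Meta "phage" calls
# whose phage_score is strictly greater than a minimum value.
#   $1: PPR-Meta CSV file, $2: minimum phage_score
count_phage_hits() {
    awk -F',' -v min="$2" '
        NR > 1 && $6 == "phage" && $3 + 0 > min + 0 { n++ }  # skip header, force numeric compare
        END { print n + 0 }                                   # print 0 if nothing matched
    ' "$1"
}

# The four example rows from the PPR-Meta output above, in a temporary file.
csv=$(mktemp)
cat > "$csv" <<'EOF'
Header,Length,phage_score,chromosome_score,plasmid_score,Possible_source
seq8,86578,0.658026557109837,0.323770475766357,0.0182029599535136,phage
seq11,63443,0.671362450565434,0.257167359821571,0.0714701900259453,phage
seq20,41715,0.945974168353953,0.0147801588566125,0.0392456778921355,phage
seq22,38841,0.999412552439551,1.51951318980135e-05,0.000572250124745809,phage
EOF

# Report how many phage calls survive each candidate threshold.
for t in 0 0.5 0.7 0.9; do
    printf 'min_score=%s\tphage_hits=%s\n' "$t" "$(count_phage_hits "$csv" "$t")"
done
rm -f "$csv"
```

With these four rows, a cutoff of 0.7 keeps seq20 and seq22 and drops seq8 and seq11.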

hoelzer commented 4 years ago

I checked this for the Kleiner and Neto data sets. Kleiner is not affected at all (because we combine the PPR-Meta results with the VF results anyway), but there are some changes for Neto:

[Two screenshots: 2020-10-27 12-02-14 and 2020-10-27 12-02-20]

For Neto, we reduce the number of unclassified contigs from 105 to 92. However, we also lose some Imitervirales annotations.

hoelzer commented 2 years ago

I would still do this: implement a parameter for the PPR-Meta filtering instead of just taking every hit that is labelled "phage" into account. A good default seems to be >0.7.
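
As a minimal sketch of what such a parameter could look like, the awk filter from the first comment can take the cutoff as a variable with 0.7 as fallback. The function name `filter_pprmeta` and the way the threshold is passed are assumptions for illustration, not how the pipeline implements it.

```shell
# Hypothetical wrapper (name and interface are illustrative, not VIRify code).
#   $1: PPR-Meta CSV file
#   $2: optional minimum phage_score, defaults to 0.7 (strict ">" comparison)
filter_pprmeta() {
    awk -F',' -v min="${2:-0.7}" '
        NR == 1 { print; next }            # always keep the header line
        $6 == "phage" && $3 + 0 > min + 0  # print rows called "phage" above the cutoff
    ' "$1"
}
```

A call such as `filter_pprmeta 01-viruses/pprmeta/kleiner_virome_2015_pprmeta.csv 0.8` would then apply a stricter cutoff, while omitting the second argument falls back to 0.7. Unlike the original one-liner, this variant also passes the header line through, so the output stays a valid CSV.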