andynet / pheri

Prediction of the phage hosts
MIT License

Interpreting the PHERI results #1

Closed ShailNair closed 3 years ago

ShailNair commented 3 years ago

Hi

Thanks for the PHERI tool. I tested it on my phage genome, but I am confused about the output. My sample.res.tsv file looks like this:

sample_id infects host score
NODE_1_length_44247_cov_7769 True Pseudomonas 1.0
NODE_1_length_44247_cov_7769 False Xanthomonas 1.0
NODE_1_length_44247_cov_7769 False Synechococcus 1.0
NODE_1_length_44247_cov_7769 False Stenotrophomonas 1.0
NODE_1_length_44247_cov_7769 False Ruegeria 1.0
NODE_1_length_44247_cov_7769 False Rhizobium 1.0
NODE_1_length_44247_cov_7769 False Cutibacterium 1.0
NODE_1_length_44247_cov_7769 False Corynebacterium 1.0
NODE_1_length_44247_cov_7769 False Clostridioides 1.0
NODE_1_length_44247_cov_7769 False Caulobacter 1.0
NODE_1_length_44247_cov_7769 False Brucella 1.0
NODE_1_length_44247_cov_7769 False Aeromonas 1.0
NODE_1_length_44247_cov_7769 False Yersinia 0.9998
NODE_1_length_44247_cov_7769 False Staphylococcus 0.9998

What does "infects" mean? What do True and False indicate here?

Similarly, how is the score assigned? For example, the first host has infects = True with a score of 1.0, but the second host has infects = False and the score is still 1.0.

Can we add a custom host genome to check if the phage can infect it?

andynet commented 3 years ago

Hi @razz1618,

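your results suggest that NODE_1 comes from a phage which can infect only Pseudomonas. The infects column is the model's prediction for each genus: True means the tool predicts that the phage infects that genus, False means it predicts that it does not. All other genera are False, so the tool thinks your sequence should not infect them.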

The score column measures how confident the model is in its prediction, on a scale from 0 to 1. In your case it is very confident that the sequence infects Pseudomonas, and it is also very confident that it does not infect Xanthomonas, Synechococcus, Stenotrophomonas...
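To make the reading of the two columns concrete, here is a minimal Python sketch that keeps only the rows predicted as True. It is not part of PHERI: the function name `predicted_hosts` and the `min_score` parameter are hypothetical, and it assumes the file is tab-separated with the header `sample_id`, `infects`, `host`, `score`, as in the output pasted above.

```python
import csv
import io


def predicted_hosts(tsv_text, min_score=0.0):
    """Group predicted hosts by sample.

    Returns {sample_id: [(host, score), ...]} for rows where the
    infects column is True and the confidence is at least min_score.
    Assumes tab-separated text with the header shown in the thread.
    """
    hosts = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        score = float(row["score"])
        # "infects" is the prediction itself; "score" is only the
        # confidence in that prediction, whether it is True or False.
        if row["infects"].strip().lower() == "true" and score >= min_score:
            hosts.setdefault(row["sample_id"], []).append((row["host"], score))
    return hosts


# A few rows copied from the output above (tabs assumed as separators).
example = (
    "sample_id\tinfects\thost\tscore\n"
    "NODE_1_length_44247_cov_7769\tTrue\tPseudomonas\t1.0\n"
    "NODE_1_length_44247_cov_7769\tFalse\tXanthomonas\t1.0\n"
    "NODE_1_length_44247_cov_7769\tFalse\tYersinia\t0.9998\n"
)

print(predicted_hosts(example))
# {'NODE_1_length_44247_cov_7769': [('Pseudomonas', 1.0)]}
```

A row with infects = False and a score of 1.0 is dropped by this filter: the high score there means the model is confident about the negative prediction, not that the genus is a likely host.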

Currently, it is not possible to add a custom host genome. However, the models are trained on a fairly exhaustive data set (n=7064) of annotated phages from NCBI, and most genera with reasonable sample sizes are included.