BioMedicalBigDataMiningLab / PHIAF

How to predict new virus sequences using PHIAF #1

Open KennthShang opened 2 years ago

KennthShang commented 2 years ago

Hi,

Thanks for providing PHIAF.

However, it seems your tool cannot predict the hosts of new viruses. Could a more detailed guide be provided for users who want to use PHIAF for this?

Best, Jiayu

mengluli-web commented 2 years ago

Hi,

In the README, we briefly describe this process. For a new host or phage, update data/data_pos_neg.txt and download the DNA and protein sequences from the NCBI database. Use code/compute_dna_features.py and code/compute_protein_features.py to compute the features derived from the DNA and protein sequences, then save the resulting feature files in data/XXX_features. Finally, change the input (training and test data) in main.py to predict the new phage/host. Before running main.py, pass the training data to generate_data.py to augment the dataset.
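As a rough illustration of the "compute features, then save them as a feature file" step, here is a minimal sketch. It is not the code in compute_dna_features.py: the k-mer frequency feature, the function names `kmer_frequencies` and `save_features`, the one-value-per-line file layout, and the accession-based file name are all assumptions made for the example. Check the actual scripts for the features and file format PHIAF expects.

```python
import itertools
import os


def kmer_frequencies(seq, k=3):
    """Return normalized k-mer frequencies over A/C/G/T, in lexicographic k-mer order.

    Hypothetical stand-in for a DNA feature; PHIAF's real features may differ.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with N or other ambiguous bases
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def save_features(accession, features, out_dir):
    """Write one feature value per line to <out_dir>/<accession>.txt (assumed layout)."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, accession + ".txt")
    with open(path, "w") as fh:
        fh.write("\n".join(f"{v:.6f}" for v in features))
    return path
```

For example, `save_features("my_phage", kmer_frequencies(dna_sequence), "my_features")` would write a 64-value file for a sequence held in `dna_sequence`. Both the accession and the output directory here are placeholders.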

Best, Menglu Li

KennthShang commented 2 years ago

However, we want to make predictions for phages about which nothing is known, so we cannot update your file with any labels. Is it possible to use your model to predict directly from sequences?