KennthShang / CHERRY

Host prediction for phages
GNU General Public License v3.0

An error when predicting viral hosts with my own bacterial genomes #1

Closed Rainingfeeling closed 2 years ago

Rainingfeeling commented 2 years ago

Hi Kennth, I tried to use my own genomes to predict viral hosts, but an error occurred, as follows.

Command:
python run_Speed_up.py --contigs all_viral_combined_MMseq_out2_rep_seq.fasta --mode prokaryote --t 0.98 --len 1500

Building a new DB, current time: 06/29/2022 18:44:58
New DB name: /home/jyzhang/softwares/CHERRY/new_blast_db/bin214
New DB title: new_prokaryote/bin214.fa
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 60 sequences in 0.14666 seconds.
Running blastn...
Traceback (most recent call last):
  File "edge_virus_prokaryote.py", line 151, in <module>
    with open(blast_tab_out+file) as file_in:
FileNotFoundError: [Errno 2] No such file or directory: 'blast_tab/bin117.tab'
phage_host Error for file contig_0

I have already put my genomes in the "new_prokaryote/" folder and added the corresponding taxonomies to the dataset/prokaryote.csv file. When I used my own viral contigs to predict hosts, the error above occurred. To solve it, I modified line 151 of edge_virus_prokaryote.py, changing "with open(blast_tab_out+file)" to "with open(new_blast_tab_out+file)". Then it worked. Is this modification correct?

Besides, I did not find any information in the result folder about CRISPR spacers from my own genomes in the new_prokaryote/ folder. So I also wonder whether Cherry identifies CRISPR spacers in my own bacterial genomes in new_prokaryote/, and whether it predicts viral hosts based on those spacers. I look forward to your reply.

Jiayu Zhang

KennthShang commented 2 years ago

Hi,

Thanks for debugging this. Your modification should be correct.
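For readers hitting the same FileNotFoundError, here is a minimal Python sketch of a more defensive variant of that one-line change. It is not code from the CHERRY repository: the helper name resolve_blast_tab is hypothetical, the directory value "new_blast_tab/" is an assumption (only 'blast_tab/' appears in the error message above), and falling back to the second directory is a design choice of this sketch rather than something the thread confirms.

```python
import os


def resolve_blast_tab(file_name,
                      new_blast_tab_out="new_blast_tab/",  # assumed directory name
                      blast_tab_out="blast_tab/"):         # directory seen in the error
    """Return the path of a BLAST tabular result file.

    The directory for user-supplied genomes is checked first, then the
    directory for the bundled database. Raises FileNotFoundError with both
    locations listed if the file is in neither.
    """
    candidates = [os.path.join(directory, file_name)
                  for directory in (new_blast_tab_out, blast_tab_out)]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(
        "BLAST result %r not found in any of: %s"
        % (file_name, ", ".join(candidates))
    )
```

With such a helper, the failing line could read "with open(resolve_blast_tab(file)) as file_in:", and a missing file would report every location that was searched.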

BTW, Cherry does not identify CRISPRs from users' genomes; it uses a well-defined CRISPR database to create the edges. So the CRISPRs used for prediction are not from your bins. This is because we have not found a method that can predict CRISPRs from raw genomes with high precision. We would be glad to add such a function if you have any suggestions.

Thanks again.

Best, Jiayu

Rainingfeeling commented 2 years ago

Thank you for your reply. Do you mean that, when I use my own viral contigs and bacterial genomes, Cherry predicts viral hosts mainly from the gene content, sequence similarity, and k-mer frequency of the viruses and bacterial genomes, apart from the well-defined CRISPR database?

KennthShang commented 2 years ago

It will still use the well-defined CRISPR database, but those CRISPRs are not extracted from the bacterial genomes you provided. The only exception is if the database happens to contain CRISPRs from your genomes (but we have no way of knowing that).

Rainingfeeling commented 2 years ago

OK, thank you

KennthShang commented 2 years ago

Hi, I suddenly noticed a problem with your usage.

You mentioned that you want to predict the host using your own genomes.

In that case, you should place your bins in the prokaryote folder rather than new_prokaryote, and use virus mode. The program will then output host predictions only for the viruses in your all_viral_combined_MMseq_out2_rep_seq.fasta file.

However, if you use prokaryote mode, the program will output the candidate viruses that infect the bins you provided, which may include other viruses from our database.
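To make the difference between the two runs concrete, here is a small Python sketch that only assembles the command line for each mode. The flags --contigs, --mode, --t and --len are taken from the command quoted at the top of this issue; the literal value "virus" for --mode is an assumption based on the wording "use virus mode", and build_cherry_command is a hypothetical helper, not part of CHERRY.

```python
def build_cherry_command(contigs, mode="virus", threshold=0.98, min_len=1500):
    """Assemble the argument list for run_Speed_up.py without running it.

    mode="prokaryote" reproduces the command from the original report;
    mode="virus" is the assumed spelling of the mode suggested above.
    """
    return [
        "python", "run_Speed_up.py",
        "--contigs", contigs,
        "--mode", mode,
        "--t", str(threshold),
        "--len", str(min_len),
    ]


if __name__ == "__main__":
    fasta = "all_viral_combined_MMseq_out2_rep_seq.fasta"
    # Suggested usage: bins placed in the prokaryote folder, virus mode.
    print(" ".join(build_cherry_command(fasta, mode="virus")))
    # Usage from the original report: bins in new_prokaryote, prokaryote mode.
    print(" ".join(build_cherry_command(fasta, mode="prokaryote")))
```

The resulting list can be passed to subprocess.run from the CHERRY directory once the bins have been moved to the folder that matches the chosen mode.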

Best, Jiayu

Rainingfeeling commented 2 years ago

Thanks for your reminder. I will try it again.