WeiliWw / VirHostMatcher-Net

VirHostMatcher-Net: A network-based computational tool for predicting virus-host interactions.

Valid Fasta File Error? #4

Closed morgvevans closed 3 years ago

morgvevans commented 4 years ago

Hello! I am trying to use this tool and am running into a weird error. Here's the log file:

Loading packages...
Intermediate results will be stored in /fs/project/PAS1331/cyanobacteria/VirHostMatch/VirHostMatcherNet_Out/intermediate
----Calculation of s2 is split into three parts----
----Start calculating s2 part I... ----
The query file 3832.fasta contains invalid chars, please make sure it is a valid fasta file.
The query file 6604.fasta contains invalid chars, please make sure it is a valid fasta file.
The query file 11677.fasta contains invalid chars, please make sure it is a valid fasta file.
The query file 1461.fasta contains invalid chars, please make sure it is a valid fasta file.
The query file 8506.fasta contains invalid chars, please make sure it is a valid fasta file.
The query file 6215.fasta contains invalid chars, please make sure it is a valid fasta file.
The query file 7910.fasta contains invalid chars, please make sure it is a valid fasta file.
The query file 7047.fasta contains invalid chars, please make sure it is a valid fasta file.
The query file 6630.fasta contains invalid chars, please make sure it is a valid fasta file.
The query file 11674.fasta contains invalid chars, please make sure it is a valid fasta file.
The query file 12750.fasta contains invalid chars, please make sure it is a valid fasta file.
The query file 10969.fasta contains invalid chars, please make sure it is a valid fasta file.
The query file 9434.fasta contains invalid chars, please make sure it is a valid fasta file.
The query file 7373.fasta contains invalid chars, please make sure it is a valid fasta file.
The query file 8355.fasta contains invalid chars, please make sure it is a valid fasta file.
The query file 4207.fasta contains invalid chars, please make sure it is a valid fasta file.
The query file 10563.fasta contains invalid chars, please make sure it is a valid fasta file.
The query file 11651.fasta contains invalid chars, please make sure it is a valid fasta file.
The query file 5511.fasta contains invalid chars, please make sure it is a valid fasta file.
Program terminated. Please check error info above.

I have looked at the fasta files and removed any weird characters from the titles of the fasta files (e.g. spaces, =, etc.), but that didn't seem to make a difference.

Here is my command:

conda activate virhostmatcher
cd /fs/project/PAS1331/VirHostMatcher-Net

python VirHostMatcher-Net.py -q /fs/project/PAS1331/cyanobacteria/VirHostMatch/Viruses/ -t 48 -l /fs/project/PAS1331/cyanobacteria/VirHostMatch/Host_GenBank_list.txt -i /fs/project/PAS1331/cyanobacteria/VirHostMatch/VirHostMatcherNet_Out/intermediate/ -o /fs/project/PAS1331/cyanobacteria/VirHostMatch/VirHostMatcherNet_Out -n 5

I have also tried with and without python at the beginning, with the same result. I have attached one of my fasta files for your reference. I am pretty sure these are valid because I have used them in a ton of other analyses with no issues, but maybe I am missing something? (I also tried renaming the files to 'viral_ID.fasta', but no luck.)

1064.zip

THANK YOU!

WeiliWw commented 4 years ago

Hi,

Thanks for your interest in our tool! The error is due to invalid chars in the fasta file. It is not related to the file name or to the headers in the fasta file (i.e. lines starting with '>'). We enforce a sanity check to make sure that every nucleotide in each file is one of the chars "atcgwsmkrybdhvn", in either lower or upper case.

It seems like the file 1064.fasta you attached works fine in the program. Can you examine one of the files that report the error?
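For anyone hitting the same message, a quick way to find the offending characters is to scan the sequence lines yourself. The following is only a minimal sketch, not the check implemented in VirHostMatcher-Net: it assumes the allowed alphabet is exactly the one quoted above, and the function names `find_invalid_chars` and `report_directory` are made up for illustration. It opens files without newline translation so that stray carriage returns show up in the report as well.

```python
from collections import Counter
from pathlib import Path

# Allowed nucleotide codes as quoted in the comment above (case-insensitive).
ALLOWED = set("atcgwsmkrybdhvn")


def find_invalid_chars(fasta_path):
    """Count characters on sequence lines that are not in ALLOWED.

    Hypothetical helper for illustration; header lines (starting with '>')
    are skipped, matching the description that headers are not checked.
    """
    bad = Counter()
    # newline="" keeps line endings untranslated, so a stray "\r" is reported.
    with open(fasta_path, newline="") as handle:
        for line in handle:
            if line.startswith(">"):
                continue
            for ch in line.rstrip("\n"):
                if ch.lower() not in ALLOWED:
                    bad[ch] += 1
    return bad


def report_directory(query_dir, pattern="*.fasta"):
    """Print the offending characters for every matching file that has any."""
    for path in sorted(Path(query_dir).glob(pattern)):
        bad = find_invalid_chars(path)
        if bad:
            print(path.name, dict(bad))
```

Calling `report_directory("/path/to/Viruses")` on the query folder lists only the files that contain something outside the allowed set, along with the characters involved (for example gaps, `*`, or Windows line endings).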

morgvevans commented 4 years ago

Hi, I was able to resolve this issue but now I am having another problem. Here is my command:

python VirHostMatcher-Net.py -q /fs/project/PAS1331/virome/ASSEMBLY/ALL_READS_VIBRANTOUT/individual_contigs/ --short-contig -o output_06262020 -n 3 -t 8

I get the following output:

Loading packages...
Intermediate results will be stored in /fs/project/PAS1331/VirHostMatcher-Net/intermediate_res
----Calculation of s2 is split into two parts----
----Start calculating s2 part I... ----
----Finished calculating s2 part I----
----Start calculating s2 part II... ----
----Finished calculating s2* part II----
----Start calculating network neighborhood feature values...----
----Finished Calculating network neighborhood feature values----
----Fitting models in WIsH...----
----WIsH calculation finished.----
----Calculating crispr feature values for >k141_612785_length_6894_cov_251.1451.fasta ----
Traceback (most recent call last):
  File "VirHostMatcher-Net.py", line 46, in <module>
    predictor = HostPredictor(query_virus_dir, args.short_contig, intermediate_dir, genome_list, args.num_Threads[0])
  File "/fs/project/PAS1331/VirHostMatcher-Net/predictor.py", line 47, in __init__
    self._crispr_signals = src.crispr.crispr_calculator(query_virus_dir, intermediate_dir, numThreads)
  File "/fs/project/PAS1331/VirHostMatcher-Net/src/crispr.py", line 81, in crispr_calculator
    ind, df = crisprSingle(item, query_virus_dir, crispr_output_dir, numThreads)
  File "/fs/project/PAS1331/VirHostMatcher-Net/src/crispr.py", line 43, in crisprSingle
    crispr_call()
  File "/users/PAS1331/osu7930/miniconda3/envs/virhostmatcher/lib/python3.6/site-packages/Bio/Application/__init__.py", line 531, in __call__
    stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 1 from 'blastn -out /fs/project/PAS1331/VirHostMatcher-Net/intermediate_res/CRISPR/>k141_612785_length_6894_cov_251.crispr -outfmt "\'6 qacc sacc evalue\'" -query /fs/project/PAS1331/virome/ASSEMBLY/ALL_READS_VIBRANTOUT/individual_contigs/>k141_612785_length_6894_cov_251.1451.fasta -db /fs/project/PAS1331/VirHostMatcher-Net/data/crispr_db_prefix/allCRISPRs -evalue 1 -word_size 7 -num_threads 8 -gapopen 10 -gapextend 2 -task blastn-short -penalty -1 -dust no -perc_identity 90', message "BLAST query/options error: ''6' is not a valid output format"


Thank you!

WeiliWw commented 4 years ago

Hi, this is related to a version issue in Biopython. The Git repo has recently been updated to fix the issue. Please update your local copy of VirHostMatcher-Net and retry.
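The error message above shows blastn receiving the format string with an extra layer of quotes (`''6'`). As an illustration only, and not the fix that was committed to the repository, one way to sidestep quoting problems is to pass the command as an argument list to `subprocess.run`, so that `-outfmt` gets its value as a single unquoted element. The helper names `build_blastn_args` and `run_blastn` are hypothetical; the blastn options are copied from the command shown in the traceback.

```python
import subprocess


def build_blastn_args(query, db, out, num_threads=1):
    """Build a blastn argument list mirroring the options in the error above.

    Hypothetical helper for illustration only.
    """
    return [
        "blastn",
        "-query", str(query),
        "-db", str(db),
        "-out", str(out),
        # One list element with no embedded quotes: blastn receives the
        # format string exactly as written here.
        "-outfmt", "6 qacc sacc evalue",
        "-evalue", "1",
        "-word_size", "7",
        "-num_threads", str(num_threads),
        "-gapopen", "10",
        "-gapextend", "2",
        "-task", "blastn-short",
        "-penalty", "-1",
        "-dust", "no",
        "-perc_identity", "90",
    ]


def run_blastn(query, db, out, num_threads=1):
    """Run blastn without a shell; raises CalledProcessError on failure."""
    subprocess.run(build_blastn_args(query, db, out, num_threads), check=True)
```

Because no shell is involved, a file name that starts with `>` (as in the log above) is also passed through literally rather than being treated as a redirection.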