Closed miczuppi closed 2 years ago
Can you paste the entire output message and an example of the data you used?
Entire output message
----Calculating crispr feature values for combined.fasta ----
Traceback (most recent call last):
File "/mnt/projects/miniconda2/envs/virhostmatchernet/VirHostMatcher-Net/VirHostMatcher-Net.py", line 56, in <module>
predictor = HostPredictor(query_virus_dir, args.short_contig, intermediate_dir, genome_list, args.num_Threads)
File "/mnt/projects/miniconda2/envs/virhostmatchernet/VirHostMatcher-Net/predictor.py", line 47, in __init__
self._crispr_signals = src.crispr.crispr_calculator(query_virus_dir, intermediate_dir, numThreads)
File "/mnt/projects/miniconda2/envs/virhostmatchernet/VirHostMatcher-Net/src/crispr.py", line 82, in crispr_calculator
ind, df = crisprSingle(item, query_virus_dir, crispr_output_dir, numThreads)
File "/mnt/projects/miniconda2/envs/virhostmatchernet/VirHostMatcher-Net/src/crispr.py", line 51, in crisprSingle
query_res = pd.read_table(output_file,header = None)
File "/mnt/projects/miniconda2/envs/virhostmatchernet/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/mnt/projects/miniconda2/envs/virhostmatchernet/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 683, in read_table
return _read(filepath_or_buffer, kwds)
File "/mnt/projects/miniconda2/envs/virhostmatchernet/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 488, in _read
return parser.read(nrows)
File "/mnt/projects/miniconda2/envs/virhostmatchernet/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1047, in read
index, columns, col_dict = self._engine.read(nrows)
File "/mnt/projects/miniconda2/envs/virhostmatchernet/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 223, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas/_libs/parsers.pyx", line 801, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas/_libs/parsers.pyx", line 857, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 1925, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 230132, saw 4
I have been running VirHostMatcher-Net on a multifasta file:
DF12_D2_k141_1654_1 1-920/2415 CATTATAGGCTGAAAAATCTGTATTGCCCAGACTTGCATTTACCAGACCCGTCCAGACTTCCGAACCGGTTGTTTTTAACAAATAACATATGTAAGTATAAACTATATACACTTAAATTTAAAGTCCAAAAATTTTAATTTCACTGTATACATATATAAATTATCACTATAAAAAAACAAATCCGTATACAAAAAAGAGTATCACTGTTAGATAATTTAAAACAGTTTTACTCTTTTTTATATTTCAAAATATAATTTTTAATCTGATTTCTAAGTCTGTTCCATTTATCCGGGTTGTTCACATAATAAAGCGGGCAGAGTTTACCTGTAACATCATAATGTCTGATAATGTCAGAATTTTCAATTTGGTATTTCGCACACAACCATCCCGCAAGCTTTATAACAGAGTTGTACGTTTTTTTTGAAAATTTACCTGAACTGTCCGGATGACAGCATTCGATAGATATTGTGTCTGAATTTCTATTATTTGAAGCATATGAAATCTCATTTAACGGAATACATTGTATAATAGTTCCGTCCAATCCGATTATAAAATGACTGCTCACTTTATTTGCCGTATCATCAGATAAGTTTTTTCTGCTCTCAAAATAATTTCTGTTTGCCATCGCATCGGTTCCCGGATTTGCCGTATAATGTATAACTATACCTTTTATTTTTCTAAGCTTTATTCCCGGTCTTGAATTTTTATTTACAGTAAGCAATGCCTTTTTCACATTTGGTTTTGGAACGACATACTGTTCATAATTTATTCTGCTGCTCTTGGTGCCGATTTTTTTGGTAATTGCAGATTTTACAAGCATAAATACGACTGTCATTATAATGCATATAACAACAGCCGTACCCCACATTTTTAATACTTTCAATCTGCGGCGACGCTTCGCTTTTGAAAGTTTCTTCAT DF12_D2_k141_1703_1 1516-1958/1958 GGGTGCGCATCCGGGATGTGGTAGCGCCGGTGTTCTGGCCGGTGCACCGCGCCATTGCCCGCGGCACAGTTCAGGAACTGGTGGCCAAGGGCGGGCGCGGCAGCGGCAAATCCAGCTATATTTCCATTGAGCTTGTTTTGCAGCTGCTGCGCCACCCCGCCTGCCACGCGGTGGTGCTGCGCAAGATCGGCGGCACGCTGCGCACCAGTGTGTATGCGCAGATCCAGTGGGCCATTGGGGCGCTGGGGCTGGCAAAGCAGTTCCGCTGCACCGTCAGCCCCATGGAGTGTACTTATCTGCCCACAGGGCAGAAGATCCTCTTTTTTGGCACCGACGACCCCGGCAAGCTGAAAAGCATCAAGGTGCCATTTGGAGCCATCGGCCTGGCCTGGTTCGAGGAGCTGGACCAGTTCGACGGCCCCGAGGAGGTGCGCAACGTCGAG DF12_D2_k141_2905_1 1-1607/2573 
ACTTTTGTAAGAGATGCACATCCCTTAAATGCATATTTTCCAACTTCCGTCACACTGTACGGAACGGATACATAGGTGATCTTTGTGTTTCCTCTCAGTGCCCCTTCTGCGATCGCCGTTACATTATACGTTATGCCGTTGATCTCCACCTGAGATGGGATGCTCACGGAAGTGATCTCCTCATCCAGCACGCCCGCGTAGGAAACTGCTTTGCTTCTGGTGCTTACAAGATACTTGCCGCCCGTTTTGCTGTCCATTAGAACCGTTCCGGCAGTCGGTGCCACATAGCTGACCGGCAGCATGGTGGTCTGTTTCAGATACGGATAACCGCCGTTTTCTGCTGCATTCAAAGCCCAGATGCCATCGAAATCAAAATCTTTGAAATAGCTCTGTGTCTTGATCTGCACATCATTGAGTGCCGTTGCTGTTCCGGTAATGCAGTTTCCTGTTTCAAATGCATAGACCGGATTCATGCTGTAATAGTAGCTGTTTGAAATGCTGCAGCCGGATGTCTGTGTGCTGCCTGCTGCCATTGTGGACGTGCCGACCATGCCGGATGCCACAGAATAATAAAACTGAATACCGACACATTTGTTGATCACGACCTGACCGTTCCCTGCTGTGATCTTCGCTACGATATTGGCCCCTTTTTCCAATACGCCTGCGTTGTAGCAGTTTGCGATCTCAATGTTTCTTGCCGCTGCCAGATTT
...
Thanks for the information! I suppose you did add '>' to all the headers like 'DF12_D2_k141_1654_1 1-920/2415' in the multifasta file? Otherwise, there should be an error early on. If so, please further check the following items:
I feel there might be a header formatting issue/bug in the input file. Could you help locate the line that triggers the error: line 230132 in file $INTERMEDIATE_DIR/CRISPR/combined.crispr, where $INTERMEDIATE_DIR is what you specified in the option -i. Likely line 230132 will have four fields and we will know what is wrong there.
Generally, we use VirHostMatcher-Net on a group of separate fasta files, because its final report corresponds to each fasta file rather than to the contigs within it. You may get meaningful results from combined.fasta only if you believe the contigs in combined.fasta are from the same virus.
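For anyone hitting the same ParserError, the check on $INTERMEDIATE_DIR/CRISPR/combined.crispr described above can be sketched in a few lines of Python. This is only a sketch under assumptions: the function name `find_malformed_lines` is made up for illustration, and the file is assumed to be tab-separated with three fields per row (tab is the default separator of `pd.read_table`, and "Expected 3 fields" comes from the error message).

```python
from typing import Iterator, Tuple


def find_malformed_lines(
    path: str, expected_fields: int = 3, sep: str = "\t"
) -> Iterator[Tuple[int, int, str]]:
    """Yield (line_number, field_count, text) for rows with an unexpected field count.

    Hypothetical helper, not part of VirHostMatcher-Net. Assumes a
    tab-separated file with `expected_fields` columns per row.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line_number, line in enumerate(handle, start=1):
            text = line.rstrip("\n")
            if not text:
                continue  # ignore blank lines
            field_count = len(text.split(sep))
            if field_count != expected_fields:
                yield line_number, field_count, text


# Example use (the path is whatever was passed to -i, plus CRISPR/combined.crispr):
# for number, count, text in find_malformed_lines("intermediate/CRISPR/combined.crispr"):
#     print(f"line {number}: {count} fields: {text}")
```

Scanning the whole file rather than jumping straight to line 230132 also shows whether more than one row is affected.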
Thanks for the quick reply. All the headers begin with ">". You were right, these are lines 230131-230133:
DF12_D2_k141_76125 2.640022|GCF_001698755.1| 0.40
DF12_D2_k141_76125 41DF17_D3_k141_93818||full 4.402867|GCF_007674265.1| 0.12
DF17_D3_k141_93818||full 92.1729172|GCF_000300715.1| 0.12
Line 230132 appears to contain an extra field, which caused the error. I am currently running VirHostMatcher-Net on separate fasta files. Should the same problem occur again, I will know how to solve it. Thank you very much for the quick and helpful support.
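Splitting a multifasta such as combined.fasta into one file per contig, as suggested earlier in the thread, could look roughly like the sketch below. The function name `split_multifasta`, the `.fasta` extension and the filename sanitising are assumptions made for illustration; the thread does not say how the files were actually split. Record IDs are assumed to be unique, otherwise later records would overwrite earlier files.

```python
import os
import re
from typing import List


def split_multifasta(fasta_path: str, out_dir: str) -> List[str]:
    """Write each record of a multifasta file to its own file in out_dir.

    Hypothetical helper. Files are named after the first whitespace-delimited
    token of each header, with characters unsafe in filenames (such as '|')
    replaced by '_'. Assumes record IDs are unique.
    """
    os.makedirs(out_dir, exist_ok=True)
    written: List[str] = []
    out = None
    try:
        with open(fasta_path, "r") as handle:
            for line in handle:
                if line.startswith(">"):
                    if out is not None:
                        out.close()
                    tokens = line[1:].split()
                    record_id = tokens[0] if tokens else "record_%d" % (len(written) + 1)
                    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", record_id)
                    path = os.path.join(out_dir, safe_name + ".fasta")
                    out = open(path, "w")
                    written.append(path)
                if out is not None:
                    out.write(line)  # header and sequence lines are copied unchanged
    finally:
        if out is not None:
            out.close()
    return written
```

Lines that appear before the first ">" header are ignored, so an input whose headers lack ">" would produce no output files at all.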
The *.crispr file is the direct output from blastn, so I guess there might be some writing issue in blastn when dealing with large files (maybe due to multi-threading...). Anyway, feel free to reopen this thread if there is a new issue.
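If the corruption really is an occasional garbled row in the blastn output, one possible workaround is to copy the *.crispr file while dropping rows that do not have exactly three tab-separated fields. The sketch below is an assumption-laden illustration, not something the thread confirms: `write_clean_copy` is a hypothetical name, dropped rows mean lost hits, and it is not established here whether VirHostMatcher-Net would reuse a cleaned file on a re-run.

```python
from typing import Tuple


def write_clean_copy(
    src_path: str, dst_path: str, expected_fields: int = 3, sep: str = "\t"
) -> Tuple[int, int]:
    """Copy src_path to dst_path, keeping only rows with expected_fields columns.

    Hypothetical helper. Returns (kept, dropped) so the caller can see how
    many rows were discarded. Blank lines are skipped and not counted.
    """
    kept = 0
    dropped = 0
    with open(src_path, "r", encoding="utf-8", errors="replace") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue
            if len(line.rstrip("\n").split(sep)) == expected_fields:
                dst.write(line)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

Checking the returned `dropped` count before trusting the cleaned file is worthwhile: a handful of bad rows suggests a one-off write glitch, while many suggest a problem with the input headers.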
Hi, I have been getting this error
Could you help me fix this?