Hi,
I am trying to analyse a FASTQ file with a custom gpkg, but I get an error for one specific read. The read seems to be split at some point and is then not found in the original data later on, or something like that, I guess. Do you have a suggestion as to what is happening here?
I have also attached the sequence, the gpkg and the output so that you can reproduce the issue.
Traceback (most recent call last):
  File "/usr/local/bin/graftM", line 409, in <module>
    Run(args).main()
  File "/usr/local/lib/python2.7/dist-packages/graftm/run.py", line 588, in main
    self.graft()
  File "/usr/local/lib/python2.7/dist-packages/graftm/run.py", line 377, in graft
    diamond_db
  File "/usr/local/lib/python2.7/dist-packages/graftm/timeit.py", line 10, in timed
    result = method(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/graftm/sequence_searcher.py", line 822, in aa_db_search
    hit_reads_orfs_fasta)
  File "/usr/local/lib/python2.7/dist-packages/graftm/sequence_searcher.py", line 951, in search_and_extract_orfs_matching_protein_database
    SequenceSearchResult.QUERY_TO_FIELD])
  File "/usr/local/lib/python2.7/dist-packages/graftm/sequence_searcher.py", line 608, in _extract_orfs
    entry=sequence_frame_info_dict[record.id]
KeyError: 'SRR3656745.9234541.2_split_1'
graftM_error_report.tar.gz