280,000 done... finishing at Monday 8/23 at 8:28 AM
290,000 done... finishing at Monday 8/23 at 8:28 AM
Done writing /Users/mark/Drive/Sentinel/sequencing/gisaid/210818/msa_0818_updates.fa.xz
Finding mutations and writing to /Users/mark/Drive/Sentinel/sequencing/gisaid/210818/msa_0818_updates_muts.tsv.xz
Loading alignment file at /Users/mark/Drive/Sentinel/sequencing/gisaid/210818/msa_0818_updates.fa.xz
Identifying insertions...
Traceback (most recent call last):
File "/Library/WebServer/Documents/sabeti/sequencing/bjorn/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'codon_num'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Library/WebServer/Documents/sabeti/sequencing/bjorn/venv/lib/python3.9/site-packages/pandas/core/frame.py", line 3746, in _set_item_mgr
loc = self._info_axis.get_loc(key)
File "/Library/WebServer/Documents/sabeti/sequencing/bjorn/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 'codon_num'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Library/WebServer/Documents/sabeti/sequencing/ncbi/gisaid_msa_update.py", line 362, in <module>
find_mutations()
File "/Library/WebServer/Documents/sabeti/sequencing/ncbi/gisaid_msa_update.py", line 310, in find_mutations
run_bjorn(new_msa_path, new_muts_path)
File "/Library/WebServer/Documents/sabeti/sequencing/ncbi/gisaid_msa_update.py", line 298, in run_bjorn
process_mutations(input_path, GISAID_REF, output_path)
File "/Library/WebServer/Documents/sabeti/sequencing/bjorn/src/msa_2_mutations_2.py", line 42, in process_mutations
inserts, _ = bm.identify_insertions_per_sample(msa_data,
File "/Library/WebServer/Documents/sabeti/sequencing/bjorn/src/mutations.py", line 426, in identify_insertions_per_sample
seqsdf['codon_num'] = seqsdf.apply(compute_codon_num, args=(gene2pos,), axis=1)
File "/Library/WebServer/Documents/sabeti/sequencing/bjorn/venv/lib/python3.9/site-packages/pandas/core/frame.py", line 3599, in __setitem__
self._set_item_frame_value(key, value)
File "/Library/WebServer/Documents/sabeti/sequencing/bjorn/venv/lib/python3.9/site-packages/pandas/core/frame.py", line 3737, in _set_item_frame_value
self._set_item_mgr(key, arraylike)
File "/Library/WebServer/Documents/sabeti/sequencing/bjorn/venv/lib/python3.9/site-packages/pandas/core/frame.py", line 3749, in _set_item_mgr
self._mgr.insert(len(self._info_axis), key, value)
File "/Library/WebServer/Documents/sabeti/sequencing/bjorn/venv/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 1158, in insert
block = new_block(values=value, ndim=self.ndim, placement=slice(loc, loc + 1))
File "/Library/WebServer/Documents/sabeti/sequencing/bjorn/venv/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 1922, in new_block
check_ndim(values, placement, ndim)
File "/Library/WebServer/Documents/sabeti/sequencing/bjorn/venv/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 1964, in check_ndim
raise ValueError(
ValueError: Wrong number of items passed 8, placement implies 1
Got the stack trace above while running an update. After a bit of poking around, it looks like seqsdf has zero rows. There is already a check with a similar intent (an early exit when identify_insertion_positions returns a falsy value), but it was not catching this case. Adding a second check that the dataframe has at least one row seems to prevent the exception.
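For illustration, here is a minimal sketch of that kind of guard. It is not the actual bjorn code: compute_codon_num below is a hypothetical stand-in, the gene2pos layout and the column names are assumptions, and add_codon_num is a made-up wrapper around the assignment on line 426 of mutations.py. The likely reason the guard is needed is that DataFrame.apply with axis=1 on a zero-row frame can hand back an empty frame that keeps all of the original columns instead of a single Series, which would match the "8 items passed, placement implies 1" message.

```python
import pandas as pd


def compute_codon_num(row, gene2pos):
    # Hypothetical stand-in for bjorn's helper: 1-based codon index
    # of a position within its gene. The gene2pos layout is assumed.
    start = gene2pos[row["gene"]]["start"]
    return (row["pos"] - start) // 3 + 1


def add_codon_num(seqsdf, gene2pos):
    # Made-up wrapper around the failing assignment.
    out = seqsdf.copy()
    if out.empty:
        # Zero rows: skip apply() entirely and add an empty column,
        # so downstream code still sees a 'codon_num' column.
        out["codon_num"] = pd.Series(dtype="int64")
        return out
    out["codon_num"] = out.apply(compute_codon_num, args=(gene2pos,), axis=1)
    return out


# Illustrative inputs only; the coordinates are not taken from the pipeline.
gene2pos = {"S": {"start": 21563}}
empty = pd.DataFrame(columns=["gene", "pos"])
filled = pd.DataFrame({"gene": ["S"], "pos": [21566]})

print(add_codon_num(empty, gene2pos).shape)
print(add_codon_num(filled, gene2pos)["codon_num"].tolist())
```

Returning early from identify_insertions_per_sample when the frame is empty, the same way the existing falsy check does, would work as well; the sketch keeps the column so that callers do not need a special case.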