awfderry / COLLAPSE

Representation learning for protein functional site analysis
MIT License

KeyError: '7xas' #12

Open BinhongLiu opened 1 year ago

BinhongLiu commented 1 year ago

Hi, I tested with my own .pdb file today, but a similar error appeared. It seems to be caused by a mapping mismatch between the embedding database and the pdb_metadata.csv file.

Command 1:
python search_site.py bai/4is3.pdb B K161 data/datasets/pdb_embeddings.pkl --cutoff 1e-3 --verbose --num_iter 3

Then I chose a different central residue, and a similar error appeared, this time for a different PDB ID.
Command 2:
python search_site.py bai/4is3.pdb B G94 data/datasets/pdb_embeddings.pkl --cutoff 1e-3 --verbose --num_iter 3

Neither 7xas nor 5esy, the PDB IDs that triggered the two errors, appears in the pdb_metadata.csv file.
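To list every hit that is missing from the metadata, rather than stopping at the first KeyError, a quick check might look like the sketch below. It assumes, as line 99 of search_site.py suggests, that result IDs look like '103l_A' and that the metadata is indexed by the four-character PDB ID. The helper name missing_pdb_ids and the toy data are hypothetical.

```python
import pandas as pd


def missing_pdb_ids(result_ids, pdb_meta):
    """Return the 4-character PDB IDs in result_ids that are absent from pdb_meta's index.

    Hypothetical helper. Assumes result IDs look like '103l_A' and that
    pdb_meta is indexed by the first four characters (as in pdb_meta.loc[x[:4], cols]).
    """
    codes = pd.Series(result_ids).str[:4]
    return sorted(set(codes) - set(pdb_meta.index))


# Toy stand-ins for pdb_metadata.csv and the search results (hypothetical values)
pdb_meta = pd.DataFrame({"title": ["example A", "example B"]}, index=["103l", "1a0h"])
results = pd.DataFrame({"PDB": ["103l_A", "1a0h_B", "7xas_A", "5esy_C"]})

print(missing_pdb_ids(results["PDB"], pdb_meta))  # ['5esy', '7xas']
```

With real data, pdb_meta would be the DataFrame the script builds from pdb_metadata.csv, and results would be the search output just before line 99 runs.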

Meanwhile, I tested the script on the demo data again, and no error appeared:
python search_site.py data/examples/1a0h.pdb B H363 data/datasets/pdb_embeddings.pkl --cutoff 1e-3 --verbose --num_iter 3

I'm a little confused about why this error happens when I only change the .pdb structure or the central residue.

Log for command 1:

search_site.py:43: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  pdb_meta = pdb_meta.append(pd.Series(data=['N/A'] * pdb_meta.shape[1], index=pdb_meta.columns, name=query_pdb))
['103l_A' '103l_A' '103l_A' '103l_A' '103l_A']
59445
Database size: 873863
Iteration 1: 458 new results
Iteration 2: 1023 new results
Iteration 3: 41687 new results
Traceback (most recent call last):
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '7xas'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "search_site.py", line 126, in <module>
    main(args)
  File "search_site.py", line 99, in main
    results[cols] = results['PDB'].apply(lambda x: pdb_meta.loc[x[:4], cols])
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/series.py", line 4771, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/apply.py", line 1123, in apply
    return self.apply_standard()
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/apply.py", line 1174, in apply_standard
    mapped = lib.map_infer(
  File "pandas/_libs/lib.pyx", line 2924, in pandas._libs.lib.map_infer
  File "search_site.py", line 99, in <lambda>
    results[cols] = results['PDB'].apply(lambda x: pdb_meta.loc[x[:4], cols])
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 1067, in __getitem__
    return self._getitem_tuple(key)
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 1247, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 967, in _getitem_lowerdim
    section = self._getitem_axis(key, axis=i)
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 1312, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 1260, in _get_label
    return self.obj.xs(label, axis=axis)
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/generic.py", line 4056, in xs
    loc = index.get_loc(key)
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: '7xas'

Log for command 2:

search_site.py:43: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  pdb_meta = pdb_meta.append(pd.Series(data=['N/A'] * pdb_meta.shape[1], index=pdb_meta.columns, name=query_pdb))
['103l_A' '103l_A' '103l_A' '103l_A' '103l_A']
59445
Database size: 1100118
Iteration 1: 435 new results
Iteration 2: 2979 new results
Iteration 3: 51569 new results
Traceback (most recent call last):
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '5esy'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "search_site.py", line 126, in <module>
    main(args)
  File "search_site.py", line 99, in main
    results[cols] = results['PDB'].apply(lambda x: pdb_meta.loc[x[:4], cols])
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/series.py", line 4771, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/apply.py", line 1123, in apply
    return self.apply_standard()
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/apply.py", line 1174, in apply_standard
    mapped = lib.map_infer(
  File "pandas/_libs/lib.pyx", line 2924, in pandas._libs.lib.map_infer
  File "search_site.py", line 99, in <lambda>
    results[cols] = results['PDB'].apply(lambda x: pdb_meta.loc[x[:4], cols])
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 1067, in __getitem__
    return self._getitem_tuple(key)
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 1247, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 967, in _getitem_lowerdim
    section = self._getitem_axis(key, axis=i)
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 1312, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 1260, in _get_label
    return self.obj.xs(label, axis=axis)
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/generic.py", line 4056, in xs
    loc = index.get_loc(key)
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: '5esy'
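The FutureWarning at the top of both logs is unrelated to the KeyError. It points at line 43 of search_site.py, and DataFrame.append was removed entirely in pandas 2.0. A sketch of an equivalent using pandas.concat might look like the following, assuming the metadata is a DataFrame indexed by PDB ID. The function name add_placeholder_row and the toy columns are hypothetical.

```python
import pandas as pd


def add_placeholder_row(pdb_meta: pd.DataFrame, query_pdb: str) -> pd.DataFrame:
    """Append an all-'N/A' row labelled query_pdb without using DataFrame.append.

    Hypothetical helper mirroring line 43 of search_site.py.
    """
    row = pd.Series(['N/A'] * pdb_meta.shape[1], index=pdb_meta.columns, name=query_pdb)
    # to_frame() gives a one-column frame named query_pdb; .T turns it into a single row
    return pd.concat([pdb_meta, row.to_frame().T])


# Toy metadata (hypothetical columns and values)
pdb_meta = pd.DataFrame({"title": ["example A"], "organism": ["example B"]}, index=["103l"])
pdb_meta = add_placeholder_row(pdb_meta, "4is3")
print(pdb_meta.loc["4is3"].tolist())  # ['N/A', 'N/A']
```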

BinhongLiu commented 1 year ago

It seems that not every residue can be chosen as the central residue, right? Sorry, I'm not very experienced in this field.

awfderry commented 1 year ago

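Hi @BinhongLiu, it seems this error appears because some deprecated PDB entries (such as 7xas) are in the embedding database but not in the PDB metadata. Whether the error is triggered depends on which entries the search returns, so it can appear for one query structure or central residue and not for another. This has now been fixed so that PDB IDs missing from the metadata no longer cause an error.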
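A minimal sketch of such a guard for the lookup on line 99 of search_site.py is shown below. It assumes the metadata has a unique index of four-character PDB IDs, as the original pdb_meta.loc[x[:4], cols] call implies. It uses DataFrame.reindex, which returns NaN rows for missing labels instead of raising KeyError, and fills them with 'N/A', the same placeholder line 43 uses for the query. The helper attach_metadata and the toy data are hypothetical, and the upstream fix may be implemented differently (for example, by dropping such rows).

```python
import pandas as pd


def attach_metadata(results: pd.DataFrame, pdb_meta: pd.DataFrame, cols, fill="N/A"):
    """Return a copy of results with metadata columns added.

    Hypothetical helper. PDB IDs absent from pdb_meta get `fill` instead of
    raising KeyError. Assumes pdb_meta has a unique index of 4-character PDB IDs.
    """
    codes = results["PDB"].str[:4]
    # reindex yields all-NaN rows for labels missing from the index, where .loc would raise
    meta = pdb_meta.reindex(codes)[cols].fillna(fill)
    out = results.copy()
    for col in cols:
        # positional assignment: meta rows are in the same order as results rows
        out[col] = meta[col].to_numpy()
    return out


# Toy data (hypothetical): '7xas' plays the role of a deprecated entry
pdb_meta = pd.DataFrame(
    {"title": ["example A", "example B"], "organism": ["org A", "org B"]},
    index=["103l", "1a0h"],
)
results = pd.DataFrame({"PDB": ["103l_A", "7xas_A", "1a0h_B"]})
out = attach_metadata(results, pdb_meta, ["title", "organism"])
print(out["title"].tolist())  # ['example A', 'N/A', 'example B']
```

Assigning by position with to_numpy() avoids pandas index alignment, which would otherwise try to match the reindexed PDB labels against the integer row index of results.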

awfderry commented 1 year ago

Also, it seems like in these examples you are getting a very large number of results by iteration 3 (over 40,000 new results), which is likely to be very slow to run and will give low specificity. Unless that is what you're looking for, I would generally suggest running with a stricter (lower) cutoff, e.g. 1e-4 instead of 1e-3, or fewer iterations to improve performance.