Closed shrimonmuke0202 closed 2 years ago
I believe this is related to #41. Will have a fix this week.
@shrimonmuke0202 Hiya, sorry for the delay. This should be fixed now. Ping me if you run into any problems!
After reinstalling graphein, the source code of graphein.protein.graphs does contain the lines

`if granularity == "centroids": atoms = convert_structure_to_centroids(atoms)`
`# atoms = atoms`

but the error below is still raised:

```
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_1933/1307404750.py in

~/.local/lib/python3.8/site-packages/graphein/protein/graphs.py in construct_graph(config, pdb_path, pdb_code, chain_selection, df_processing_funcs, edge_construction_funcs, edge_annotation_funcs, node_annotation_funcs, graph_annotation_funcs)
    611
    612     # Compute graph edges
--> 613     g = compute_edges(
    614         g,
    615         funcs=config.edge_construction_functions,

~/.local/lib/python3.8/site-packages/graphein/protein/graphs.py in compute_edges(G, funcs, get_contacts_config)
    505
    506     for func in funcs:
--> 507         func(G)
    508
    509     return G

~/.local/lib/python3.8/site-packages/graphein/protein/edges/atomic.py in add_atomic_edges(G)
     86     """
     87     TOLERANCE = 0.56  # 0.4 0.45, 0.56 This is the distance tolerance
---> 88     dist_mat = compute_distmat(G.graph["pdb_df"])
     89
     90     # We assign bond states to the dataframe, and then map these to covalent radii

~/.local/lib/python3.8/site-packages/graphein/protein/edges/distance.py in compute_distmat(pdb_df)
     55     )
     56     eucl_dists = pd.DataFrame(squareform(eucl_dists))
---> 57     eucl_dists.index = pdb_df.index
     58     eucl_dists.columns = pdb_df.index
     59

~/.local/lib/python3.8/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
   5498         try:
   5499             object.__getattribute__(self, name)
-> 5500             return object.__setattr__(self, name, value)
   5501         except AttributeError:
   5502             pass

~/.local/lib/python3.8/site-packages/pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()

~/.local/lib/python3.8/site-packages/pandas/core/generic.py in _set_axis(self, axis, labels)
    764     def _set_axis(self, axis: int, labels: Index) -> None:
    765         labels = ensure_index(labels)
--> 766         self._mgr.set_axis(axis, labels)
    767         self._clear_item_cache()
    768

~/.local/lib/python3.8/site-packages/pandas/core/internals/managers.py in set_axis(self, axis, new_labels)
    214     def set_axis(self, axis: int, new_labels: Index) -> None:
    215         # Caller is responsible for ensuring we have an Index object.
--> 216         self._validate_set_axis(axis, new_labels)
    217         self.axes[axis] = new_labels
    218

~/.local/lib/python3.8/site-packages/pandas/core/internals/base.py in _validate_set_axis(self, axis, new_labels)
     55
     56         elif new_len != old_len:
---> 57             raise ValueError(
     58                 f"Length mismatch: Expected axis has {old_len} elements, new "
     59                 f"values have {new_len} elements"

ValueError: Length mismatch: Expected axis has 1 elements, new values have 0 elements
```
Hi @shrimonmuke0202, I've made the changes available in the version on PyPI. Could you try reinstalling via pip and see if that helps?
This still throws the same error, especially for disulphide bonds. Take UniProt ID Q15843, for instance. Any insight into the reason for this error? @a-r-j
Hiya @diamondspark, I'm struggling to reproduce the error (https://gist.github.com/a-r-j/5a5bfae2dd12dced5328fcc8f1caa52e).
Could you share a minimal example & indicate which PDB you're using for Q15843? I just grabbed one from UniProt. Keen to get this fixed, so any help is much appreciated!
Have you checked if you're using Graphein 1.0.9? You may have to use the `--no-cache-dir` flag, e.g.:
`pip install graphein==1.0.9 --no-cache-dir`
@a-r-j Thank you for the prompt response. I'm using 1.0.0 and will try out 1.0.9. Here's some sample code that breaks:
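```python
# Imports were not in the original snippet; module paths are assumed from Graphein's layout.
from graphein.protein.config import ProteinGraphConfig
from graphein.protein.edges.distance import (
    add_aromatic_sulphur_interactions, add_cation_pi_interactions,
    add_disulfide_interactions, add_hydrogen_bond_interactions,
    add_hydrophobic_interactions, add_ionic_interactions, add_peptide_bonds,
)
from graphein.protein.graphs import construct_graph
from graphein.protein.utils import download_alphafold_structure

protein_path = download_alphafold_structure('Q15843')[0]
print(protein_path)
g = construct_graph(pdb_path=protein_path)  # default config
new_edge_funcs = {"edge_construction_functions": [add_peptide_bonds,
                                                  add_hydrogen_bond_interactions,
                                                  add_disulfide_interactions,
                                                  add_ionic_interactions,
                                                  add_aromatic_sulphur_interactions,
                                                  add_hydrophobic_interactions,
                                                  add_cation_pi_interactions]
                  }
config = ProteinGraphConfig(**new_edge_funcs)
# g = construct_graph(config=config, pdb_code="3eiy")
g = construct_graph(pdb_path=protein_path, config=config)  # this call raises the error
```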
Thanks @diamondspark I can reproduce the problem. Will check it out & hopefully fix it either this weekend or next week.
@diamondspark Thanks again for pointing out the problem. I've fixed this in v1.0.10 (`pip install graphein==1.0.10`).
It turns out the problem was caused by the protein in your example not containing any cysteine residues (so we cannot create the distance matrix of sulfurs used to compute disulfide bonds). I've added a check for a minimum of two CYS residues and it seems to work in my testing. Please do reopen this issue if it's not fixed for you, and do let me know if you have any other troubles :)
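For illustration only, the kind of guard described above could look like the sketch below. This is not Graphein's actual code: plain dicts stand in for rows of the PDB dataframe, and the function name, the dict keys (`residue_name`, `atom_name`, `node_id`, `coords`) and the 2.2 Å cutoff are all assumptions made for the example.

```python
from itertools import combinations
from math import dist


def disulfide_candidates(atoms, max_dist=2.2):
    """Return pairs of node ids whose CYS sulfur atoms lie within max_dist.

    `atoms` is a list of dicts with hypothetical keys: residue_name,
    atom_name, node_id and coords (an (x, y, z) tuple in angstroms).
    """
    sulfurs = [
        a for a in atoms
        if a["residue_name"] == "CYS" and a["atom_name"] == "SG"
    ]
    # Guard: with fewer than two cysteines there is nothing to pair, so
    # return early instead of building an empty distance matrix.
    if len(sulfurs) < 2:
        return []
    return [
        (a["node_id"], b["node_id"])
        for a, b in combinations(sulfurs, 2)
        if dist(a["coords"], b["coords"]) <= max_dist
    ]
```

The early return is the important part: a structure with zero or one cysteine yields an empty result rather than reaching the distance computation at all.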
@a-r-j Is v1.0.10 released? I ran `pip install graphein==1.0.10`, yet `graphein.__version__` returns 1.0.0.
Hi @diamondspark Yep, you can see it here: https://pypi.org/project/graphein/
You may want to try uninstalling & reinstalling, or using `--upgrade`.
If you check the version reported by `pip list` or similar, is it still 1.0.0?
Alternatively, which Python version are you using? Earlier versions supported <3.8, but this is no longer the case and Graphein requires 3.8+.
Perhaps it is installed: `pip list` shows 1.0.10, but `graphein.__version__` still shows 1.0.0. Maybe you haven't updated the code to reflect the version?
Yep, good spot. Thanks for pointing that out - will take care of this. If the output of `pip list` shows 1.0.10, you should be good to go! Do let me know if the bug is otherwise fixed.
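For anyone hitting the same mismatch, a minimal sketch of how the two version sources can be compared from Python is shown below. The helper names are made up for this example; `importlib.metadata` (standard library from Python 3.8) reads the installed distribution's metadata, which is what `pip list` reports, while `__version__` is just a string hard-coded in the package source.

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Version recorded in the installed distribution metadata, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def module_version(module_name):
    """The module's own __version__ string, or None if it cannot be read."""
    try:
        module = import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)
```

Calling `installed_version("graphein")` and `module_version("graphein")` side by side shows whether the two disagree. If only the `__version__` string is stale, as appears to be the case in this thread, the metadata value is the one that reflects the installed release.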
There still seems to be an issue with the distmat calculation for some proteins. I think a similar check is needed for hydrogen bonds as well. Consider these PDB IDs: 1qaw, 1swr, 3ivc, 2qbw, 3e85, 3wzn, 2r9x.
Example: `g = construct_graph(config=config, pdb_code='3wzn')`
When I execute the above code, an error occurs. The error is given below. How can I get rid of this type of error?