a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License

Add DSSP as node_metadata_functions returns error #239

Closed universvm closed 1 year ago

universvm commented 1 year ago

Describe the bug: I'm unable to add DSSP features to atoms.

To Reproduce: Simply use some parameters like these:

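# Import paths are assumed from Graphein's package layout.
import graphein.protein as gp
from graphein.protein.edges.distance import add_peptide_bonds
from graphein.protein.features.nodes.amino_acid import meiler_embedding
from graphein.protein.features.nodes.dssp import add_dssp_feature

params = {
    "edge_construction_functions": [
        add_peptide_bonds,
    ],
    # Listing add_dssp_feature here is what triggers the error reported below.
    "node_metadata_functions": [
        meiler_embedding,
        add_dssp_feature,
    ],
    "dssp_config": gp.DSSPConfig(),
}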

This also appears to be the right way, at least according to: https://graphein.ai/notebooks/pscdb_baselines.html?highlight=add_dssp_feature#Transformation-from-Raw-Structure-to-ML-ready-Datasets-Construction-with-Graphein

My current workaround is to build the graph and then use:

g2 = add_dssp_feature(g, feature="phi")

But I'd need to do this for all the features.
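A minimal sketch of how the workaround could be repeated for several features in one call. The helper `add_features` and the stand-in `fake_add_feature` are hypothetical and not part of Graphein. The only thing taken from this thread is the call shape `add_dssp_feature(g, feature="phi")`, which returns the graph. Feature names other than "phi" are assumptions.

```python
from typing import Any, Callable, Iterable


def add_features(
    graph: Any,
    add_feature: Callable[..., Any],
    features: Iterable[str],
) -> Any:
    """Call add_feature(graph, feature=name) once per name, threading the graph through."""
    for name in features:
        graph = add_feature(graph, feature=name)
    return graph


# Stand-in so the sketch runs without Graphein or DSSP installed.
def fake_add_feature(graph: dict, feature: str) -> dict:
    return {**graph, feature: f"<{feature} values>"}


result = add_features({}, fake_add_feature, ["phi", "psi"])
```

With Graphein installed, the same helper would presumably be called as `add_features(g, add_dssp_feature, [...])`, assuming each requested feature name is one that `add_dssp_feature` accepts.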

Expected behavior: Adding DSSP as node features.

Screenshots: Error message when using add_dssp_feature:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In [15], line 1
----> 1 g = construct_graph(config=config, pdb_path="test/test_data/1qys.pdb")

File ~/.conda/envs/graphtcr/lib/python3.8/site-packages/graphein/protein/graphs.py:728, in construct_graph(config, name, pdb_path, uniprot_id, pdb_code, chain_selection, model_index, df_processing_funcs, edge_construction_funcs, edge_annotation_funcs, node_annotation_funcs, graph_annotation_funcs)
    726 # Annotate additional node metadata
    727 if config.node_metadata_functions is not None:
--> 728     g = annotate_node_metadata(g, config.node_metadata_functions)
    729 progress.advance(task3)
    730 task4 = progress.add_task("Constructing edges...", total=1)

File ~/.conda/envs/graphtcr/lib/python3.8/site-packages/graphein/utils/utils.py:108, in annotate_node_metadata(G, funcs)
    106 for func in funcs:
    107     for n, d in G.nodes(data=True):
--> 108         func(n, d)
    109 return G

File ~/.conda/envs/graphtcr/lib/python3.8/site-packages/graphein/protein/features/nodes/dssp.py:85, in add_dssp_df(G, dssp_config)
     70 def add_dssp_df(
     71     G: nx.Graph,
     72     dssp_config: Optional[DSSPConfig],
     73 ) -> nx.Graph:
     74     """
     75     Construct DSSP dataframe and add as graph level variable to protein graph.
     76 
   (...)
     82     :rtype: nx.Graph
     83     """
---> 85     config = G.graph["config"]
     86     pdb_code = G.graph["pdb_code"]
     87     pdb_path = G.graph["pdb_path"]

AttributeError: 'str' object has no attribute 'graph'

a-r-j commented 1 year ago

Hi @universvm the add_dssp_function is really used internally. Could you try with some of the functions here? I expect this will resolve your issue.

universvm commented 1 year ago

That's what I mentioned in the post. I'm doing this after the creation of the graph:

g2 = add_dssp_feature(g, feature="phi")

It would be ideal if it could be passed as part of the parameters when constructing the graph. The documentation also seems to use it this way, although it is commented out.

It's not a problem, just a bit slow.

a-r-j commented 1 year ago

Ah, I spotted the issue. The docs don’t help much: node metadata functions are of the form func(node: str, node_data: Dict[str, Any]). IIRC the DSSP funcs should be used as a graph_metadata_function, since we require access to the whole protein in order to compute them, even though we add them as node features. This is admittedly super confusing (and I should do something about this). It’s probably smart for us to move over to using PyDSSP anyway.
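The distinction between the two function shapes can be sketched without Graphein. In the sketch below, `ToyGraph`, `annotate_graph_metadata`, `residue_flag` and `whole_graph_feature` are hypothetical stand-ins. Only the `func(n, d)` loop is taken from the traceback above. Passing a graph-level function through the node-level loop reproduces the same kind of AttributeError.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class ToyGraph:
    """Hypothetical stand-in for a graph: one graph-level dict plus per-node dicts."""

    def __init__(self, **graph_attrs: Any) -> None:
        self.graph: Dict[str, Any] = dict(graph_attrs)
        self.node_data: Dict[str, Dict[str, Any]] = {}

    def add_node(self, name: str, **attrs: Any) -> None:
        self.node_data[name] = dict(attrs)

    def nodes(self) -> List[Tuple[str, Dict[str, Any]]]:
        return list(self.node_data.items())


def annotate_node_metadata(G: ToyGraph, funcs: List[Callable]) -> ToyGraph:
    # Same loop shape as in the traceback: each function gets (node, node_data).
    for func in funcs:
        for n, d in G.nodes():
            func(n, d)
    return G


def annotate_graph_metadata(G: ToyGraph, funcs: List[Callable]) -> ToyGraph:
    # Assumed graph-level counterpart: each function gets the whole graph.
    for func in funcs:
        func(G)
    return G


def residue_flag(n: str, d: Dict[str, Any]) -> None:
    # Node-level function: only sees one node and its data.
    d["is_gly"] = d["residue_name"] == "GLY"


def whole_graph_feature(G: ToyGraph, config: Optional[Any] = None) -> ToyGraph:
    # Graph-level function: reads graph-wide state, writes per-node features.
    offset = G.graph["offset"]
    for i, (_, d) in enumerate(G.nodes()):
        d["index"] = i + offset
    return G


g = ToyGraph(offset=10)
g.add_node("A:GLY:1", residue_name="GLY")
g.add_node("A:ALA:2", residue_name="ALA")

annotate_node_metadata(g, [residue_flag])          # correct slot
annotate_graph_metadata(g, [whole_graph_feature])  # correct slot

# Wrong slot: the node name (a str) is bound to G, so G.graph fails.
try:
    annotate_node_metadata(g, [whole_graph_feature])
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```

In Graphein terms, this corresponds to listing the DSSP functions under the graph-level metadata option rather than `node_metadata_functions`, as the maintainer suggests; the exact function names to use are not given in this thread.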

universvm commented 1 year ago

Yeah, makes sense! I think PyDSSP may simplify things a bit, especially during installation.

a-r-j commented 1 year ago

Will close for now. Feel free to ping me if this doesn’t work!