a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License
1.02k stars 129 forks source link

protein_df_processing_functions arg to ProteinGraphConfig does not process pdb dataframe #216

Closed alexanderbonnet closed 1 year ago

alexanderbonnet commented 1 year ago

Describe the bug Currently, passing a List of protein_df_processing_functions to the ProteinGraphConfig does not have any effect when building the protein graph.

To Reproduce

import pandas as pd
from graphein.protein.config import ProteinGraphConfig
from graphein.protein.graphs import construct_graph

def return_empty_df(df: pd.DataFrame) -> pd.DataFrame:
    return pd.DataFrame()

if __name__ == "__main__":
    params_to_change = {'protein_df_processing_functions': [return_empty_df]}

    config = ProteinGraphConfig(**params_to_change)
    config.dict()

    g1 = construct_graph(config=config, pdb_code="3eiy")
    g2 = construct_graph(pdb_code="3eiy")

    print(len(g1))

    assert len(g1) == len(g2)

Expected behavior The function should pre-process the pandas dataframe before building the protein graph.

Desktop (please complete the following information):

Additional context It looks like the code is missing in the construct_graph function here -> https://github.com/a-r-j/graphein/blob/28ae687af50cb69f423868c4f49dce968eacea5f/graphein/protein/graphs.py#L592 The protein_df_processing_functions List is assigned but never used.

Thanks for your work, and let me know if this is something you would like me to take on.

a-r-j commented 1 year ago

Nice catch! Thanks for the report @alexanderbonnet!

I think the API has drifted a little. Ideally we should separate the df_processing_funcs into atom_df_processing_funcs and hetatom_df_processing_funcs as the downstream function has entrypoints for both: https://github.com/a-r-j/graphein/blob/28ae687af50cb69f423868c4f49dce968eacea5f/graphein/protein/graphs.py#L238

A quick scan of the logic: https://github.com/a-r-j/graphein/blob/28ae687af50cb69f423868c4f49dce968eacea5f/graphein/protein/graphs.py#L299

Suggests it might not be so robust. Would be very grateful for a PR :)

a-r-j commented 1 year ago

Now resolved in 1.6.0

pip install graphein=1.6.0