a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License

update conversion #218

Closed: manonreau closed this pull request 1 year ago

manonreau commented 2 years ago

Reference Issues/PRs

Fixes #217

What does this implement/fix? Explain your changes

When a NetworkX graph is converted to a PyTorch Geometric (PyG) object, the edge features are now returned as a list of lists instead of a list of strings.
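To illustrate the target shape, here is a minimal plain-Python sketch. It assumes, as in Graphein, that each edge stores its interaction types in a `kind` attribute holding a set of strings. The `Edge` alias, the `kinds_as_lists` helper and the example edges are hypothetical; this is not Graphein's actual conversion code.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical edge list: (source node, target node, attribute dict).
# As in Graphein, each edge's "kind" attribute is a set of interaction names.
Edge = Tuple[str, str, Dict[str, Set[str]]]


def kinds_as_lists(edges: List[Edge]) -> List[List[str]]:
    """Return one list of kind names per edge.

    Sets have no inherent order, so the names are sorted here to make the
    output deterministic; this ordering is an assumption of the sketch.
    """
    return [sorted(attrs["kind"]) for _, _, attrs in edges]


# Made-up example edges; node IDs imitate Graphein's "chain:residue:number" style.
edges: List[Edge] = [
    ("A:MET:1", "A:LYS:2", {"kind": {"peptide_bond", "distance_threshold"}}),
    ("A:CYS:5", "A:CYS:40", {"kind": {"disulfide"}}),
]

kinds = kinds_as_lists(edges)
print(kinds)  # [['distance_threshold', 'peptide_bond'], ['disulfide']]

# Per-edge membership checks work directly on the lists, with no string parsing.
print("peptide_bond" in kinds[0])  # True
```

Keeping one list per edge means an edge with several interaction types stays a single entry whose members can be inspected or encoded individually.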

What testing did you do to verify the changes in this PR?

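import pickle

from graphein.ml.conversion import GraphFormatConvertor


def graph2pkl(g, fname):
    """
    Convert a Graphein NetworkX graph to a PyTorch Geometric object
    and save it as a .pkl file

    Args:
        g (networkx.Graph): protein graph built with Graphein
        fname (str): output file name, without the .pkl extension

    Returns:
        torch_geometric.data.Data: the converted graph
    """

    # Graphein attributes to keep in the PyG object
    d = ["config",
         "coords",
         "edge_index",
         "element_symbol",
         "kind",
         "node_id",
         "node_type",
         "residue_name",
         "residue_number"]

    # Convert networkx graph to pytorch geometric object
    format_convertor = GraphFormatConvertor('nx', 'pyg',
                                            verbose=None,
                                            columns=d)
    g = format_convertor(g)

    # Save the converted graph to <fname>.pkl
    with open(f"{fname}.pkl", "wb") as f:
        pickle.dump(g, f)
    return g


# G: a protein graph previously constructed with Graphein (not shown here)
g = graph2pkl(G, 'test')
print(g)
print(g.kind)  # with this PR: one list of edge kinds per edge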
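The manual check above (inspecting g.kind) could be turned into an automated assertion. Below is a minimal sketch; `check_edge_kinds` is a hypothetical helper, and it assumes the converted object exposes `kind` with one entry per NetworkX edge. The inputs in the example are made-up stand-ins rather than a real converted graph.

```python
from typing import Sequence


def check_edge_kinds(kind: Sequence, num_edges: int) -> None:
    """Assert that edge kinds come as one list of strings per edge (hypothetical helper)."""
    assert len(kind) == num_edges, "expected one kind entry per edge"
    for entry in kind:
        # Each entry must be a list, not a single flattened string.
        assert isinstance(entry, list), f"expected list, got {type(entry).__name__}"
        assert all(isinstance(k, str) for k in entry), "kind names must be strings"


# Stand-in data; with a real graph one could pass g.kind and G.number_of_edges().
check_edge_kinds([["peptide_bond"], ["disulfide", "distance_threshold"]], num_edges=2)
```

A string-per-edge result such as ["peptide_bond"] fails the isinstance check, which is exactly the regression this PR addresses.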


sonarcloud[bot] commented 2 years ago

SonarCloud Quality Gate failed.

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
1 Code Smell (rating B)

No coverage information
0.0% duplication

a-r-j commented 2 years ago

Thanks for the PR @manonreau!! I'll check this out tomorrow.

Do you think you'd be able to add an appropriate unit test?

a-r-j commented 1 year ago

Hi @manonreau could you provide the code for g = format_node_edge_features(g) so I can write a test & get this merged in? Thanks!!

a-r-j commented 1 year ago

Changes added in #220

manonreau commented 1 year ago


Hi @a-r-j, thank you very much for considering my PRs. I just removed the g = format_node_edge_features(g) call, since that function only added feature descriptors. It does not change anything in the structure of the graph object.

You should be able to write a test now.

a-r-j commented 1 year ago

@manonreau I see. Would you be willing to share it anyway? It could be useful :)

And thanks for the contributions!!

manonreau commented 1 year ago

Sure, here it is:

import numpy as np


def onehot(idx, size):
    """One-hot encoder

    idx may be a single index or a list of indices; with a list, every
    listed position is set to 1 (a multi-hot vector).
    """
    vec = np.zeros(size, dtype=np.float32)
    # Set the entry (or entries) at idx to 1
    vec[idx] = 1
    return vec


def format_node_edge_features(g):
    """Format the node and edge features computed with Graphein

    Args:
        g (torch_geometric.data.Data): graph converted from NetworkX

    Returns:
        torch_geometric.data.Data: updated graph
    """

    # Index of each of the 20 standard amino acids
    residue_names = {'CYS': 0, 'HIS': 1, 'ASN': 2, 'GLN': 3, 'SER': 4, 'THR': 5, 'TYR': 6, 'TRP': 7,
                     'ALA': 8, 'PHE': 9, 'GLY': 10, 'ILE': 11, 'VAL': 12, 'MET': 13, 'PRO': 14, 'LEU': 15,
                     'GLU': 16, 'ASP': 17, 'LYS': 18, 'ARG': 19}

    # Index of each Graphein edge type
    edge_type_encoding = {
        'peptide_bond': 0, 'aromatic': 1, 'disulfide': 2, 'ionic': 3,
        'aromatic_sulphur': 4, 'cation_pi': 5, 'distance_threshold': 6, 'hbond': 7}

    # Node features: one-hot encoding of the residue name
    # (non-standard residues are not in the mapping and raise a KeyError)
    resname_onehot = []
    for res in g.residue_name:
        resname_onehot.append(onehot(residue_names[res], len(residue_names)))

    g["residue"] = resname_onehot

    # Edge features: an edge can have several kinds, so this is a multi-hot encoding
    edge_onehot = []
    for kinds in g.kind:
        edge_onehot.append(onehot([edge_type_encoding[x] for x in kinds], len(edge_type_encoding)))

    g["edge_attr"] = edge_onehot

    return g
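Because an edge can carry several kinds at once, the edge encoding in format_node_edge_features is really multi-hot rather than one-hot. A standalone NumPy sketch of that step, reusing the same edge-type mapping (the `multi_hot` name and `EDGE_TYPE_ENCODING` constant are hypothetical, and the edge kinds are made up):

```python
import numpy as np

# Same edge-type mapping as edge_type_encoding in format_node_edge_features.
EDGE_TYPE_ENCODING = {
    "peptide_bond": 0, "aromatic": 1, "disulfide": 2, "ionic": 3,
    "aromatic_sulphur": 4, "cation_pi": 5, "distance_threshold": 6, "hbond": 7,
}


def multi_hot(kinds, encoding=EDGE_TYPE_ENCODING):
    """Return a float32 vector with a 1 at the index of every kind present."""
    vec = np.zeros(len(encoding), dtype=np.float32)
    vec[[encoding[k] for k in kinds]] = 1.0
    return vec


# One row per edge, stacked into a [num_edges, num_edge_types] matrix.
edge_kinds = [["peptide_bond", "distance_threshold"], ["disulfide"]]
edge_attr = np.stack([multi_hot(k) for k in edge_kinds])
print(edge_attr.shape)  # (2, 8)
```

Stacking the rows gives the [num_edges, num_edge_features] layout that PyTorch Geometric conventionally uses for edge_attr (once converted to a tensor), whereas the snippet above stores a Python list of arrays.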

I later noticed that one-hot encoding is already provided by Graphein :)