a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License
994 stars 125 forks

convert_nx_to_pyg doubles edge_index but not kind #392

Open l-Dr-MR-l opened 2 months ago

l-Dr-MR-l commented 2 months ago

Describe the bug Using GraphFormatConvertor to convert a graph from nx to pyg doubles the number of edges in edge_index (each undirected nx edge becomes two directed pyg edges, one per direction). However, not all edge features are doubled with it. "kind" in particular, which indicates the bond type, is not doubled (there seems to be some filtering code specifically targeting this feature). One other feature I've tested, bond_length, does correctly double with the edge_index. I am wondering if this is intended? It seems to me that this may introduce faults in matching the correct feature to the correct edge, since the doubled edges are interleaved in edge_index rather than having the duplicate edges appended at the end of the matrix, e.g.:

[[0, 1, 1, 2],
 [1, 0, 2, 1]]

and not

[[0, 1, 1, 2],
 [1, 2, 0, 1]]

The kind feature tensor would then be pointing to a different edge, since it assumes the same order of edges as pre-conversion? Though there may be some matching methods I'm unaware of to deal with this later?
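To make the ordering concern concrete, here is a minimal pure-Python sketch. Plain lists stand in for the edge_index tensor, the two helper functions and the "kind" labels are hypothetical, and nothing here calls graphein itself. It only illustrates how a per-undirected-edge feature indexed by column can land on the wrong edge when the reverse edges are interleaved.

```python
# Toy graph: two undirected edges, one "kind" label per undirected edge.
# The labels are made up for illustration.
undirected_edges = [(0, 1), (1, 2)]
kind = ["peptide_bond", "covalent"]


def interleaved(edges):
    """Each reverse edge directly follows its forward edge."""
    pairs = []
    for u, v in edges:
        pairs += [(u, v), (v, u)]
    return [[u for u, _ in pairs], [v for _, v in pairs]]


def appended(edges):
    """All forward edges first, then all reverse edges."""
    pairs = list(edges) + [(v, u) for u, v in edges]
    return [[u for u, _ in pairs], [v for _, v in pairs]]


print(interleaved(undirected_edges))  # [[0, 1, 1, 2], [1, 0, 2, 1]]
print(appended(undirected_edges))     # [[0, 1, 1, 2], [1, 2, 0, 1]]

# Indexing the undoubled feature by column position goes wrong in the
# interleaved layout: column 1 is the edge (1, 0), i.e. the reverse of (0, 1),
# yet kind[1] is the label of the undirected edge (1, 2).
edge_index = interleaved(undirected_edges)
column = 1
edge_at_column = (edge_index[0][column], edge_index[1][column])
naive_kind = kind[column]
print(edge_at_column, naive_kind)  # (1, 0) covalent
```

In the appended layout the same positional lookup stays correct for the first half of the columns, which is what the alternative suggested under "Expected behavior" relies on.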

To Reproduce I'm constructing a protein graph using the following ProteinGraphConfig:

self.graphein_config = ProteinGraphConfig(
    granularity='atom',
    deprotonate=True,
    edge_construction_functions=[
        add_peptide_bonds,
        add_atomic_edges,  # Covalent bonds
        add_ring_status,
        add_bond_order,
        add_disulfide_interactions,
    ],
    node_metadata_functions=[
        amino_acid_one_hot,
        expasy_protein_scale,
        hydrogen_bond_acceptor,
        hydrogen_bond_donor,
    ],
)
self.columns = ["edge_index", "chain_id", "residue_name", "residue_number", "atom_type",
                "element_symbol", "coords", "kind", "bond_length"]

And then converting it using

convertor = GraphFormatConvertor(src_format="nx", dst_format="pyg", verbose="all_info", columns=self.columns)

and checking the numbers:

print(len(g.edges()))
converted_graph = convertor(g)
print(converted_graph)
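A small check along the following lines may help spot which edge features were left undoubled. It is only a sketch: mismatched_edge_attrs is a hypothetical helper, not part of graphein or PyG, and it works on plain Python sequences so that it runs without either library. With a real converted graph one would presumably pass converted_graph.edge_index.shape[1] as the edge count and the per-edge attributes as the dictionary.

```python
def mismatched_edge_attrs(num_directed_edges, edge_attrs):
    """Return {name: length} for attributes whose length differs from the edge count.

    num_directed_edges: number of columns in edge_index after conversion.
    edge_attrs: mapping from attribute name to a sequence of per-edge values.
    """
    return {
        name: len(values)
        for name, values in edge_attrs.items()
        if len(values) != num_directed_edges
    }


# Toy numbers mirroring the report: 2 undirected edges become 4 directed ones,
# bond_length is doubled but kind is not. All values are made up.
report = mismatched_edge_attrs(
    4,
    {
        "bond_length": [1.33, 1.33, 1.52, 1.52],
        "kind": ["peptide_bond", "covalent"],
    },
)
print(report)  # {'kind': 2}
```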

Expected behavior Either all features double to match the edge_index, or the edge_index (and other features) is not doubled and it is up to the user to add directed edges in the other direction (or the converter offers some options to customise this). Alternatively, leave the current behaviour as is (edge_index doubled, kind not) and instead add the reverse-direction edges at the end of the edge_index tensor, so we can simply apply the 'kind' tensor to both halves.
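As a possible user-side workaround in the meantime, the feature can be matched to edges by endpoint rather than by position, which does not depend on how the converter orders the doubled edges. The sketch below is pure Python with a hypothetical helper, expand_edge_attr. It assumes a simple graph (no parallel edges) and that the undoubled values are in the same order as the original undirected edge list, neither of which is confirmed by the library.

```python
def expand_edge_attr(undirected_edges, values, edge_index):
    """Give every directed edge the value of its underlying undirected edge.

    Assumes a simple graph and that values[i] belongs to undirected_edges[i].
    """
    # Key by the unordered endpoint pair so (u, v) and (v, u) share one entry.
    lookup = {frozenset(edge): value for edge, value in zip(undirected_edges, values)}
    sources, targets = edge_index
    return [lookup[frozenset((u, v))] for u, v in zip(sources, targets)]


# Made-up labels, one per undirected edge.
undirected_edges = [(0, 1), (1, 2)]
kind = ["peptide_bond", "covalent"]

# Interleaved layout, as described in the report.
print(expand_edge_attr(undirected_edges, kind, [[0, 1, 1, 2], [1, 0, 2, 1]]))
# ['peptide_bond', 'peptide_bond', 'covalent', 'covalent']

# Appended layout: the result is simply the feature repeated for both halves.
expanded = expand_edge_attr(undirected_edges, kind, [[0, 1, 1, 2], [1, 2, 0, 1]])
print(expanded == kind + kind)  # True
```

The second call shows why appending the reverse edges at the end would make the fix trivial: the undoubled feature could be concatenated with itself without any lookup.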

Desktop (please complete the following information):