akensert / molgraph

Graph neural networks for molecular machine learning. Implemented and compatible with TensorFlow and Keras.
https://molgraph.readthedocs.io/en/latest/
MIT License

Adding the Edge features in GIN (GINE) #5

Closed thegodone closed 1 year ago

thegodone commented 1 year ago

Can you add edge features to GIN, to get the so-called GINE variant, based on this paper: https://arxiv.org/pdf/1905.12265.pdf? See equation 2.1 on page 3 and, more precisely, equation A.1 (without the last ReLU) on page 16. A nice implementation is available here: https://github.com/yuyangw/MolCLR/blob/3d3bc1912be27b0c97435fd9134f4d4c73d4c5ab/models/ginet_molclr.py

akensert commented 1 year ago

There are too many GNN variants out there.. :D

That said, I will try to implement a GIN-E variant (based on the paper you reference) in the near future. One difference is that I think I will keep the node_feature_update = (1 + eps) * prev_node_feature + node_feature_aggregated part of the GIN model, which seems to be omitted in the GIN-E implementation you reference. Not sure if this is the right choice; thoughts?
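
For reference, a minimal sketch of the GINE-style node update with the (1 + eps) term kept (illustrative only, not the molgraph API; it assumes node_feature [num_nodes, dim], edge_feature [num_edges, dim] already projected to the node feature dimension, and edge_src/edge_dst index vectors for each directed edge (u -> v)):

import tensorflow as tf

def gine_update(node_feature, edge_feature, edge_src, edge_dst, mlp, eps=0.0):
    # message for each edge (u -> v): relu(h_u + e_uv)
    message = tf.nn.relu(tf.gather(node_feature, edge_src) + edge_feature)
    # sum messages onto their target nodes
    aggregated = tf.math.unsorted_segment_sum(
        message, edge_dst, num_segments=tf.shape(node_feature)[0])
    # GIN-style update, keeping the (1 + eps) * previous node state term
    return mlp((1 + eps) * node_feature + aggregated)

With eps set to zero, the first term reduces to the previous node state.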

After that I will look into AttentiveFP. I've seen it mentioned and studied in several papers now, so I will try to implement that one too in the near future.

As for the DMPNN/DGIN conv layers that you requested previously: I found it difficult to design them the way we want them to be, namely, so that we can just replace e.g. MPNNConv with DMPNNConv in a Keras sequential model. The main reason is that these models start with an "initial projection" (matrix transform) before the message passing steps, and end with a "final projection" after the last message passing step. And for DGIN, node aggregation steps are performed subsequently. I guess additional arguments could be added to these layers (e.g. initialize_edge_state=True on the first layer, and perform_node_readout=True on the last layer), but I wasn't satisfied with this implementation/design. Instead, I'm planning to implement a new layer, EdgeConv, which can be used in this way:

(Note that EdgeConv is significantly different from the other GNN layers, as it passes messages based on edges instead of nodes. Also note that the node_feature and edge_feature fields of the outputted graph tensor are left unchanged, while an edge_state field is initialized and updated instead.)

DMPNN:

dmpnn_model = tf.keras.Sequential([
  EdgeConv(units=128),
  EdgeConv(units=128),
  NodeReadout(target='edge_state'),
  Readout()
])

DGIN:

dgin_model = tf.keras.Sequential([
  EdgeConv(units=128),
  EdgeConv(units=128),
  NodeReadout(target='edge_state'),
  GINConv(),                         # node aggregation steps
  GINConv(),                         # node aggregation steps
  Readout(),
])

This implementation will differ in some ways from the DMPNN and DGIN model implementations. Those models use the initial node states as residuals in the aggregation/propagation steps, while the implementation above will use the immediately previous node states as residuals instead (both for the EdgeConv part and for the node aggregation part (GINConv)). However, I think what is most important here is the EdgeConv.
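
For concreteness, a minimal sketch of a D-MPNN-style directed edge-state update (illustrative only, not the molgraph implementation; reverse_index and the other names are assumptions). Each directed edge (u -> v) aggregates the states of the edges entering u, excluding its own reverse edge (v -> u), and here the initial edge states act as the residual:

import tensorflow as tf

def dmpnn_edge_update(edge_state, edge_state_init, edge_src, edge_dst,
                      reverse_index, dense):
    num_nodes = tf.reduce_max(edge_dst) + 1
    # sum the states of all edges entering each node
    incoming = tf.math.unsorted_segment_sum(edge_state, edge_dst, num_nodes)
    # for edge (u -> v): incoming edges (w -> u), minus its reverse edge (v -> u)
    message = tf.gather(incoming, edge_src) - tf.gather(edge_state, reverse_index)
    # residual on the initial edge state
    return tf.nn.relu(edge_state_init + dense(message))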

Let me know what you think.

Finally, I'm also adding a feature to the MPNNConv (as well as the EdgeConv) layer to allow weight tying (see the MPNNConv documentation later, when it's implemented), which is used both in the original MPNN implementation and in DMPNN and DGIN (as far as I remember/understand). In other words, the different aggregation steps of these models share the same kernel/weight matrix.
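
In Keras terms, weight tying simply means instantiating a single layer once and calling it at every step, rather than creating a new layer per step. A minimal sketch (illustrative only; the neighbor aggregation itself is omitted):

import tensorflow as tf

def tied_steps(node_state, steps=4, units=128):
    # initial projection so the tied kernel sees a fixed input dimension
    node_state = tf.keras.layers.Dense(units)(node_state)
    dense = tf.keras.layers.Dense(units, activation='relu')  # one shared kernel
    for _ in range(steps):
        # the same `dense` (same weight matrix) is reused at every step;
        # a real conv layer would aggregate neighbor messages before this transform
        node_state = dense(node_state)
    return node_state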

thegodone commented 1 year ago

For GIN/E: it was shown that generally epsilon is set to zero! But you can certainly do node_feature_update = (1 + eps) * prev_node_feature + node_feature_aggregated, with eps set to zero by default. BTW: we see again here that there is an "initial embedding layer for both edge and node features" (see my remark later).

EdgeConv is a good idea: are you able to make it directed?

Finally, in almost all recent graph models you have an initial embedding layer for both edge and node features, to make sure the hidden dimension is the same for the rest of the graph operations (concat, add, etc.).

Sharing the same matrix at each convolution may be an option, yes, but this is less important. I like the edge_state idea.

You will see that in AttentiveFP there are two types of convolution: one over the nodes and another over the molecule itself (the super node). This is used in several recent models.

Can you write a complete example of using ESOL data with the current code you provide (model.DMPNN or DGIN)? I have issues getting it to work.

akensert commented 1 year ago

About DMPNN/DGIN/EdgeConv: I'm trying to figure out how to deal with subgraphs without edges (right now it raises an error). MolecularGraphEncoder(..., self_loops=True) resolves the issue; however, I would like it to work without self loops too. Perhaps do a conditional where only subgraphs with edges pass through the message passing/updates, and the subgraphs without edges simply skip it and get assigned a zero tensor?
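
A minimal sketch of that kind of guard (illustrative only, not molgraph code; it conditions on the whole edge set rather than per subgraph, and assumes message_fn maps edge features to the node feature dimension):

import tensorflow as tf

def aggregate_or_zeros(node_feature, edge_feature, edge_dst, message_fn):
    def message_passing():
        message = message_fn(edge_feature)
        return tf.math.unsorted_segment_sum(
            message, edge_dst, num_segments=tf.shape(node_feature)[0])
    def skip():
        # no edges: assign a zero tensor shaped like the node features
        return tf.zeros_like(node_feature)
    return tf.cond(tf.shape(edge_dst)[0] > 0, message_passing, skip)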

I've also found a bug unrelated to DMPNN/DGIN/EdgeConv. I'm trying to fix that first. In brief: in relatively rare cases, where the global disjoint graph has the same number of nodes and edges, some functionality such as graph_tensor.separate() will behave unexpectedly.

Let's see how much I can manage (in addition to fixing bugs). This is quite time consuming unfortunately :( EdgeConv is almost ready so I should be able to push that one soon.

akensert commented 1 year ago

@thegodone I've now added EdgeConv/NodeReadout implementations. I've also fixed some important bugs. Please update molgraph to the latest version.

akensert commented 1 year ago

@thegodone As far as I can see, the edge features are not updated at each layer? They are used, but not updated and passed on to the next layer (asking just in case). I also have a question: do you think GINConv should use edge features by default or not? Right now I've implemented GINConv to not use edge features by default (you need to explicitly pass use_edge_features=True).

And related to the 'embeddings': they will not be part of the GIN layer; rather, this is something that can be performed prior to the GIN layer, something like below. It should be similar, except that in the paper they have an embedding lookup for each feature type separately (e.g. one for atom type and one for atom hybridization, etc.). Not sure how important this really is; I actually think it is pretty much equivalent to having an embedding lookup for (e.g.) atom type and hybridization combined. Note that the tokenizer (of the chemistry module) produces tokens based on many feature types combined (something like "Sym:C;Hyb-SP2").

gin_model = tf.keras.Sequential([
   molgraph.layers.FeatureProjection(feature='node_feature', ...),
   molgraph.layers.FeatureProjection(feature='edge_feature', ...),
   # now gin layers
   molgraph.layers.GINConv(..., use_edge_features=True),
   molgraph.layers.GINConv(..., use_edge_features=True)
])

Or alternatively, if you use tokenized atom and bond features, FeatureProjection can be replaced by EmbeddingLookup.

EDIT: Though, as node features will be added to the edge features (see here), edge features need to be of the same dimension. I'm thinking of applying a linear projection here to get them to the same dimension, so the user does not need to explicitly perform a FeatureProjection.
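
A minimal sketch of what such an internal projection could look like (illustrative only, not the molgraph internals):

import tensorflow as tf

node_dim = 128  # assumed node feature dimension
# project edge features to the node feature dimension so they can be added
# to the gathered node features inside the layer
edge_projection = tf.keras.layers.Dense(node_dim, use_bias=False)

# inside the layer, roughly:
#   edge_feature = edge_projection(edge_feature)
#   message = tf.nn.relu(tf.gather(node_feature, edge_src) + edge_feature)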