ORNL / HydraGNN

Distributed PyTorch implementation of multi-headed graph convolutional neural networks
BSD 3-Clause "New" or "Revised" License
68 stars 29 forks source link

MACE #287

Closed RylieWeaver closed 1 month ago

RylieWeaver commented 1 month ago

An implementation of the MACE Model in HYDRA.

There are 2 key aspects of the MACE Architecture: (1) Message passing and interaction blocks are equivariant to the O(3) (rotation/inversion) group and invariant to the T(3) (translation) group. (2) Predictions are made in an n-body expansion, where n is the number of layers. This is done by creating multi-body interactions, then decoding them. Layer 1 will decode 1-body interactions, layer 2 will decode w-body interactions, and so on. So, for a 3-layer model predicting energy, there are 3 outputs for energy, one at each layer, and they are summed at the end. This requires some adjustment to the behavior from Base.py. Specifically, there is a multihead decoder adapted from HYDRA after each convolutional layer.

RylieWeaver commented 1 month ago

@allaffa As you can see, the MACE repo had a large collection of utils files. I'd like your opinion on how to handle these. On one hand, leaving too much can create clutter. On the other hand, taking things out could reduce customization options / induce unforeseen errors later on.

I think there is a middle-ground, where I could take out ~50% of the utils, focusing on dataset/distributed utils. In terms of impact, I estimate that it would only make a difference to a curious MACE user that uses an unusual argument, since it's hard to account for all these when cleaning out a big repo. In terms of time, it would take a day or two so still be ready by Monday to rebase.

RylieWeaver commented 1 month ago

@allaffa Additionally, despite having torch in their name, I have place torch-ema and torchmetrics in requirements.txt. I did this because the index-url for requirements-torch.txt was incompatible with those packages. Is there a more preferable way to do this, or this is fine?

RylieWeaver commented 1 month ago

@allaffa

There is something important to note wrt blocks.py in MACE:

Many of these blocks need to be changed in order to adapt to HYDRA (usually shaping and variable passing). Importantly, many of these "blocks" are similar, with small tweaks. For example:

    RealAgnosticAttResidualInteractionBlock,
    RealAgnosticInteractionBlock,
    RealAgnosticResidualInteractionBlock,
    ResidualElementDependentInteractionBlock,

The differences between these blocks are a residual connection or not, and similarly small adjustments. Given the similarities and in order to make it more tractable, I chose to adjust and implement one of these blocks, which I discern to be generally applicable and effective. For example, I chose RealAgnosticAttResidualInteractionBlock out of those 4.

TLDR: This should be generally effective to choose one block and move forward, however is important to note down to memory in case an impassioned MACE user comes around with a very specific choice of architecture.

allaffa commented 1 month ago

@allaffa

There is something important to note wrt blocks.py in MACE:

Many of these blocks need to be changed in order to adapt to HYDRA (usually shaping and variable passing). Importantly, many of these "blocks" are similar, with small tweaks. For example:

    RealAgnosticAttResidualInteractionBlock,
    RealAgnosticInteractionBlock,
    RealAgnosticResidualInteractionBlock,
    ResidualElementDependentInteractionBlock,

The differences between these blocks are a residual connection or not, and similarly small adjustments. Given the similarities and in order to make it more tractable, I chose to adjust and implement one of these blocks, which I discern to be generally applicable and effective. For example, I chose RealAgnosticAttResidualInteractionBlock out of those 4.

TLDR: This should be generally effective to choose one block and move forward, however is important to note down to memory in case an impassioned MACE user comes around with a very specific choice of architecture.

@RylieWeaver I agree with your choice and thank you for letting me know.

RylieWeaver commented 1 month ago

@allaffa Here are the main updates on MACE:

Progress: (1) It is rebased to the recent utils PR (2) Almost all the low-hanging fruit to delete has been taken out

Next Steps: (1) Investigate unnecessary or broadly applicable functions. For example, scatter_add might be importable from torch_scatter, and cg.py might have things used in Nequip. (2) Fix data.x use. Although the tests are passing, the one-hot encoding I apply could cause errors in general cases. This will be changed, although it's worth noting it's highly recommended that node features like atom type are one-hot-encoded when input to MACE. I'll put this in a comment and RaiseWarning somewhere

Notes: (1) I've changed a lot of the test/json files just so the tests run faster, but I'll revert these before the final draft. (2) Re: It's highly recommended that node_features like atom type are one-hot encoded when input into MACE

RylieWeaver commented 1 month ago

@allaffa

Getting closer. :)

Yep :). Here’s a summary update for posterity, which I’ll also cover in the meeting:


  1. Replacing torch_scatter operations with an import worked fine. The original implementation avoided it for dependency reasons, but we’re good as long as we can install torch_scatter.

  2. I moved irreps_tools.py to the utils/model directory for broader applicability and created operations.py for a general function to get edge_vectors_and_lengths. We might revisit the naming or location.

  3. I couldn’t find anything similar in Nequip to generalize further. For example, grep searches for "wigner," "U_matrix," and "Contraction" returned nothing. Nequip's InteractionBlock is also different.

  4. In compile.py, I left all compilation-related functions to ensure speed and accuracy, even though some can be commented out without issue.

RylieWeaver commented 1 month ago

@allaffa At this point, I have made all changes which I can think of. So, I think the steps left are:

(1) Review by you, making any changes requested (2) Revert the changes I made to json and test_***.py files for quicker testing (3) Last review by you and Pei (4) Merge

RylieWeaver commented 1 month ago

@allaffa

There is an important update to note here which I've realized. It is that _MACE only natively supports atomic_numbers as nodeattributes. I am following up with the author, but I believe it is unfeasible or non-trivial or to change this.

At the current version that I have pushed now, it is a requirement that only raw atomic numbers are passed as node_attributes. I do the one-hot encoding automatically.

I also include tests in process_node_attributes function of MACEStack.py which will warn or throw an error if the data looks off.

RylieWeaver commented 1 month ago

Ready for next round of review

allaffa commented 1 month ago

@allaffa

Getting closer. :)

Yep :). Here’s a summary update for posterity, which I’ll also cover in the meeting:

  1. Replacing torch_scatter operations with an import worked fine. The original implementation avoided it for dependency reasons, but we’re good as long as we can install torch_scatter.
  2. I moved irreps_tools.py to the utils/model directory for broader applicability and created operations.py for a general function to get edge_vectors_and_lengths. We might revisit the naming or location.
  3. I couldn’t find anything similar in Nequip to generalize further. For example, grep searches for "wigner," "U_matrix," and "Contraction" returned nothing. Nequip's InteractionBlock is also different.
  4. In compile.py, I left all compilation-related functions to ensure speed and accuracy, even though some can be commented out without issue.

@RylieWeaver Since the functionalities incompile.py can be commented out without issue and are provided only for the sake of speed-up, I think it would be beneficial to add a short description about this at the beginning of the file.

RylieWeaver commented 1 month ago

@allaffa

Ready for next round. All tests should be running now.

allaffa commented 1 month ago

@RylieWeaver Conflicts left to resolve

RylieWeaver commented 1 month ago

@RylieWeaver Conflicts left to resolve

Oh, I think it involves the typo fix just now. The struggle of multiple PRs, lol

RylieWeaver commented 1 month ago

I left just a question about the torch seed set to 97. For the rest, it looks good to me.

Shall we have Pei take a look for a final check with some fresh eyes?