choderalab / espaloma

Extensible Surrogate Potential of Ab initio Learned and Optimized by Message-passing Algorithm 🍹 https://arxiv.org/abs/2010.01196
https://docs.espaloma.org/en/latest/

Issues trying to reproduce atom typing recovery experiment #202

rohithmohan opened this issue 6 months ago (status: Open)

rohithmohan commented 6 months ago

I'm trying to reproduce the atom typing recovery experiment from the docs and ran into some issues. I'm including the steps I've tried below, but I also had a couple of general questions.

Steps I've tried so far

First, to set up the environment, I used mamba create -n espaloma-032 -c conda-forge espaloma=0.3.2, as suggested in https://github.com/choderalab/espaloma/issues/195#issuecomment-1776752844

The URL for the ZINC dataset was not working, so I replaced that chunk of code with the suggestion in https://github.com/choderalab/espaloma/issues/120
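For anyone following along, the snippet below is a rough sketch of that kind of replacement: build the graphs from a locally downloaded copy of the ZINC subset instead of pulling from the dead URL. It is not the exact code from issue #120; the filename is a placeholder, and the GraphDataset/split usage is an assumption based on the other experiment docs.

# Sketch only: load a locally downloaded ZINC subset and build an espaloma dataset.
# "zinc_subset.sdf" is a placeholder path; see issue #120 for the actual suggestion.
import espaloma as esp
from rdkit import Chem

supplier = Chem.SDMolSupplier("zinc_subset.sdf", removeHs=False)

graphs = []
for mol in supplier:
    if mol is None:
        # RDKit returns None for entries it cannot sanitize
        # (these show up as the "Explicit valence ..." errors below)
        continue
    graphs.append(esp.Graph(Chem.MolToSmiles(mol)))

# GraphDataset and split([8, 1, 1]) follow the pattern used in the other experiment docs
ds = esp.data.dataset.GraphDataset(graphs)
ds_tr, ds_vl, ds_te = ds.split([8, 1, 1])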

Following along with the code after that, I ran into the following warnings and error:

/opt/mambaforge/envs/espaloma-032/lib/python3.11/site-packages/h5py/__init__.py:36: UserWarning: h5py is running against HDF5 1.14.3 when it was built against 1.14.2, this may cause problems
  _warn(("h5py is running against HDF5 {0} when it was built against {1}, "

/opt/mambaforge/envs/espaloma-032/lib/python3.11/site-packages/dgl/heterograph.py:92: DGLWarning: Recommend creating graphs by `dgl.graph(data)` instead of `dgl.DGLGraph(data)`.
  dgl_warning(
[02:28:17] Explicit valence for atom # 9 N, 5, is greater than permitted
[02:28:17] ERROR: Could not sanitize molecule ending on line 142174
[02:28:17] ERROR: Explicit valence for atom # 9 N, 5, is greater than permitted
<few more warnings like above not included here>

AttributeError                            Traceback (most recent call last)
Cell In[8], line 8
      6 for g in ds_tr:
      7     optimizer.zero_grad()
----> 8     net(g.heterograph)
      9     loss = loss_fn(g.heterograph)
     10     loss.backward()

AttributeError: 'DGLGraph' object has no attribute 'heterograph'

At this point I tried referring to the docs for some of the other experiments and modified the following chunks of code:

if torch.cuda.is_available():
    net = net.cuda()
----------------------------
for idx_epoch in range(3000):
    train_iterator = tqdm(ds_tr, desc=f'Epoch {idx_epoch+1}/{3000}', unit='batch')
    for g in train_iterator:
        optimizer.zero_grad()
        if torch.cuda.is_available():
            g = g.to("cuda:0")
        g = net(g)
        loss = loss_fn(g)
        loss.requires_grad = True
        loss.backward()
        optimizer.step()

        train_iterator.set_postfix(loss=loss.item())
    loss_tr.append(loss.item())

With this I was able to get the model to train, but the training loss looks off, so I'm probably doing something wrong. Does anyone have any ideas or suggestions?

[Attached image: training_loss_plot]
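For reference, a plot like the one attached can be generated from the loss_tr list collected in the loop above with a quick matplotlib snippet (not from the docs, just for illustration):

# Plot the per-epoch training loss collected in loss_tr above
import matplotlib.pyplot as plt

plt.plot(range(1, len(loss_tr) + 1), loss_tr)
plt.xlabel("epoch")
plt.ylabel("training loss")
plt.savefig("training_loss_plot.png")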

rohithmohan commented 6 months ago

Figured out the issue; I should have realized that I wouldn't need to specify loss.requires_grad = True.

In case others run into the same issue: the problem was resolved by using loss_fn = esp.metrics.TypingCrossEntropy() instead of the loss_fn = esp.metrics.TypingAccuracy() that the docs suggest. I got much better training/validation loss curves after that. There were some other deviations from the docs, but that was the main one throwing me off.
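Concretely, the only change relative to my earlier snippet was the loss definition (a minimal sketch; the non-differentiability explanation is my best guess at why the requires_grad workaround seemed necessary):

# Cross entropy over the predicted atom types is a differentiable training loss.
# TypingAccuracy is not differentiable, which is presumably why the loop only
# "worked" after forcing loss.requires_grad = True, and why the loss curve looked off.
loss_fn = esp.metrics.TypingCrossEntropy()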

mikemhenry commented 5 months ago

Glad you figured this out! Is there anything we can do to make this more clear in our documentation?

rohithmohan commented 5 months ago

Thanks for following up! It might be helpful to update the atom typing recovery docs with some of the changes I mentioned.

Specifically, changing loss_fn = esp.metrics.TypingAccuracy() to loss_fn = esp.metrics.TypingCrossEntropy()

And modifying the last code block on the page to something like:

# define optimizer
optimizer = torch.optim.Adam(net.parameters(), lr=1e-5)

# Uncomment below to use the GPU for training
# if torch.cuda.is_available():
#     net = net.cuda()

# train the model
for _ in range(3000):
    for g in ds_tr:
        optimizer.zero_grad()
        # Uncomment below to use the GPU for training
        # if torch.cuda.is_available():
        #     g = g.to("cuda:0")
        g = net(g)
        loss = loss_fn(g)
        loss.backward()
        optimizer.step()
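It might also be worth showing how the accuracy metric from the current docs could still be used for evaluation after training. Something like the sketch below, though I'm assuming TypingAccuracy takes the net's output graph the same way TypingCrossEntropy does, and that a ds_vl validation split exists as in the other experiments:

# Sketch: evaluate typing accuracy on a validation split after training
# (assumes ds_vl exists and TypingAccuracy is callable on net(g) like the loss)
import torch

accuracy_fn = esp.metrics.TypingAccuracy()

with torch.no_grad():
    accuracies = [accuracy_fn(net(g)).item() for g in ds_vl]

print("mean validation typing accuracy:", sum(accuracies) / len(accuracies))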

Happy to submit a Pull Request if it's appropriate!