divelab / DIG

A library for graph deep learning research
https://diveintographs.readthedocs.io/
GNU General Public License v3.0

QM9 and SphereNet example error #77

Closed yuanqidu closed 2 years ago

yuanqidu commented 2 years ago

Great work!

When I copied the code from the README file and ran it with the QM9 dataset provided by DIG, I got the following error while creating a SphereNet model.

Traceback (most recent call last):
  File "main_qm9.py", line 19, in <module>
    model = SphereNet(energy_and_force=False, cutoff=5.0, num_layers=4,
  File "/opt/conda/lib/python3.8/site-packages/dig/threedgraph/method/spherenet/spherenet.py", line 265, in __init__
    self.emb = emb(num_spherical, num_radial, self.cutoff, envelope_exponent)
  File "/opt/conda/lib/python3.8/site-packages/dig/threedgraph/method/spherenet/spherenet.py", line 23, in __init__
    self.dist_emb = dist_emb(num_radial, cutoff, envelope_exponent)
  File "/opt/conda/lib/python3.8/site-packages/dig/threedgraph/method/spherenet/features.py", line 178, in __init__
    self.reset_parameters()
  File "/opt/conda/lib/python3.8/site-packages/dig/threedgraph/method/spherenet/features.py", line 181, in reset_parameters
    torch.arange(1, self.freq.numel() + 1, out=self.freq).mul_(PI)
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
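
The RuntimeError at the bottom of this traceback can be reproduced without DIG. The following is a minimal sketch, not code from the library: `freq` and `num_radial = 6` are stand-in names and values chosen for illustration. It shows that autograd rejects an in-place operation on a leaf tensor that requires grad, and that the same kind of update goes through inside `torch.no_grad()`. The `no_grad` part is a general PyTorch idiom and an assumption here, not a fix proposed in this thread.

```python
import math

import torch

num_radial = 6  # assumed value, for illustration only
freq = torch.nn.Parameter(torch.zeros(num_radial))  # stand-in for the freq parameter

# An in-place op on a leaf tensor that requires grad raises RuntimeError.
error_message = None
try:
    freq.add_(1.0)
except RuntimeError as err:
    error_message = str(err)

# With gradient tracking disabled, the in-place initialization is allowed.
with torch.no_grad():
    freq.copy_(torch.arange(1, num_radial + 1, dtype=torch.float32) * math.pi)

print(error_message)
print(freq)
```

After the `no_grad` block, `freq` is still a leaf parameter with `requires_grad=True`, so it can be trained as before.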

limei0307 commented 2 years ago

Hi @yuanqidu,

Thanks for your interest in our work.

I didn't have this issue. Could you try replacing lines 176 to 181

        self.freq = torch.nn.Parameter(torch.Tensor(num_radial))

        self.reset_parameters()

    def reset_parameters(self):
        torch.arange(1, self.freq.numel() + 1, out=self.freq).mul_(PI)

with

        self.freq = torch.nn.Parameter(
            data=torch.tensor(
                np.pi * np.arange(1, num_radial + 1, dtype=np.float32)
            ),
            requires_grad=True,
        )

Does it solve your problem?
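
As a quick sanity check on the replacement above, the sketch below builds the parameter the same way and compares it with the values the original `reset_parameters` was meant to produce (the integers 1 to `num_radial`, each multiplied by pi). `num_radial = 6` is an assumed value, and nothing here touches DIG itself.

```python
import numpy as np
import torch

num_radial = 6  # assumed value, for illustration only

# Same construction as the suggested replacement: values are set at creation time,
# so no in-place write to the parameter is needed afterwards.
freq = torch.nn.Parameter(
    data=torch.tensor(np.pi * np.arange(1, num_radial + 1, dtype=np.float32)),
    requires_grad=True,
)

# Values the original in-place initialization was aiming for: k * pi, k = 1..num_radial.
expected = torch.arange(1, num_radial + 1, dtype=torch.float32) * np.pi

print(freq.dtype, freq.requires_grad)
print(torch.allclose(freq.detach(), expected))
```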

Thanks

zoexu119 commented 2 years ago

Hi @yuanqidu,

I also ran into this issue and believe it's due to the PyTorch version. You can also try replacing line 181 with

self.freq.data = torch.arange(1, self.freq.numel() + 1).float().mul_(PI)
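
To make the difference concrete: this one-liner builds a fresh tensor, so `mul_` acts on that temporary rather than on the parameter, and then rebinds the parameter's values through `.data`, which autograd does not track. A minimal sketch with an assumed `num_radial = 6` and a stand-in `freq`, using `math.pi` in place of the module's `PI` constant:

```python
import math

import torch

num_radial = 6  # assumed value, for illustration only
freq = torch.nn.Parameter(torch.zeros(num_radial))  # stand-in for self.freq

# mul_ runs in place on the temporary arange tensor, which does not require grad.
freq.data = torch.arange(1, freq.numel() + 1).float().mul_(math.pi)

print(type(freq).__name__, freq.requires_grad)
print(freq)
```

Note that the `out=self.freq` argument has to go for this to help. If it stays, `torch.arange` hands back the parameter itself and `mul_` is again applied to a leaf tensor that requires grad, so the original error would presumably persist.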

Takaogahara commented 2 years ago

Hello,

I had this issue, and replacing lines 176 and 181 with the above suggestion worked for me. My torch and DIG versions:

torch==1.10.2+cu113
torch-geometric==2.0.3
dive-into-graphs==0.1.2

vinayak2019 commented 2 years ago

> Hi @yuanqidu,
>
> I also ran into this issue and believe it's due to the PyTorch version. You can also try replacing line 181 with
>
> self.freq.data = torch.arange(1, self.freq.numel() + 1).float().mul_(PI)

This did not work for me. Neither did the first solution. There is a dependency in spherenet.py, line 29, which still causes an error:

class emb(torch.nn.Module):
    def __init__(self, num_spherical, num_radial, cutoff, envelope_exponent):
        super(emb, self).__init__()
        self.dist_emb = dist_emb(num_radial, cutoff, envelope_exponent)
        self.angle_emb = angle_emb(num_spherical, num_radial, cutoff, envelope_exponent)
        self.torsion_emb = torsion_emb(num_spherical, num_radial, cutoff, envelope_exponent)
        self.reset_parameters()

    def reset_parameters(self):
        self.dist_emb.reset_parameters()

limei0307 commented 2 years ago

Hi @vinayak2019,

Could you please provide more detail about the error? Thanks. Besides, please install 'sympy' via 'pip install sympy'.

Thanks.

vinayak2019 commented 2 years ago

When I replace lines 176 to 181 with the suggestion above, I get the following error.

  File "sphere.py", line 20, in <module>
    model = SphereNet(energy_and_force=False, cutoff=5.0, num_layers=4,
  File "/home/vbh226/.local/lib/python3.8/site-packages/dig/threedgraph/method/spherenet/spherenet.py", line 266, in __init__
    self.emb = emb(num_spherical, num_radial, self.cutoff, envelope_exponent)
  File "/home/vbh226/.local/lib/python3.8/site-packages/dig/threedgraph/method/spherenet/spherenet.py", line 26, in __init__
    self.reset_parameters()
  File "/home/vbh226/.local/lib/python3.8/site-packages/dig/threedgraph/method/spherenet/spherenet.py", line 29, in reset_parameters
    self.dist_emb.reset_parameters()
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1185, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'dist_emb' object has no attribute 'reset_parameters'

In this case I had deleted the following lines in features.py:

        self.reset_parameters()

    def reset_parameters(self):
        torch.arange(1, self.freq.numel() + 1, out=self.freq).mul_(PI)

The error is expected as reset_parameters is no longer defined.
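
One way to keep both pieces consistent is to leave `reset_parameters` defined on the child module and have it re-initialize the parameter without autograd tracking, so the parent's call keeps working. The classes below are hypothetical stand-ins (`DistEmbSketch`, `EmbSketch`), not DIG's `dist_emb` and `emb`, and the `torch.no_grad()` approach is an assumption, not the maintainers' fix.

```python
import math

import torch


class DistEmbSketch(torch.nn.Module):
    """Hypothetical stand-in for dist_emb, reduced to the freq parameter."""

    def __init__(self, num_radial):
        super().__init__()
        self.freq = torch.nn.Parameter(torch.empty(num_radial))
        self.reset_parameters()

    def reset_parameters(self):
        # Fill freq with k * pi (k = 1..num_radial) without recording the write in autograd.
        values = torch.arange(1, self.freq.numel() + 1, dtype=self.freq.dtype) * math.pi
        with torch.no_grad():
            self.freq.copy_(values)


class EmbSketch(torch.nn.Module):
    """Hypothetical stand-in for emb: it only forwards the reset to its child."""

    def __init__(self, num_radial):
        super().__init__()
        self.dist_emb = DistEmbSketch(num_radial)
        self.reset_parameters()

    def reset_parameters(self):
        self.dist_emb.reset_parameters()


model = EmbSketch(num_radial=6)  # 6 is an assumed value
print(model.dist_emb.freq)
```

Because the child still exposes `reset_parameters`, the parent does not need to change, and calling it repeatedly simply rewrites the same values.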

When I add those lines back to the file while still modifying self.freq, I get the following error.

Traceback (most recent call last):
  File "sphere.py", line 20, in <module>
    model = SphereNet(energy_and_force=False, cutoff=5.0, num_layers=4,
  File "/home/vbh226/.local/lib/python3.8/site-packages/dig/threedgraph/method/spherenet/spherenet.py", line 266, in __init__
    self.emb = emb(num_spherical, num_radial, self.cutoff, envelope_exponent)
  File "/home/vbh226/.local/lib/python3.8/site-packages/dig/threedgraph/method/spherenet/spherenet.py", line 23, in __init__
    self.dist_emb = dist_emb(num_radial, cutoff, envelope_exponent)
  File "/home/vbh226/.local/lib/python3.8/site-packages/dig/threedgraph/method/spherenet/features.py", line 181, in __init__
    self.reset_parameters()
  File "/home/vbh226/.local/lib/python3.8/site-packages/dig/threedgraph/method/spherenet/features.py", line 184, in reset_parameters
    torch.arange(1, self.freq.numel() + 1, out=self.freq).mul_(PI)
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

If I now set self.freq.data = torch.arange(1, self.freq.numel() + 1).float().mul_(PI), I still get the leaf Variable error. The only way I could get it working was by doing this:

class dist_emb(torch.nn.Module):
    def __init__(self, num_radial, cutoff=5.0, envelope_exponent=5):
        super(dist_emb, self).__init__()
        self.cutoff = cutoff
        self.envelope = Envelope(envelope_exponent)
        self.freq = torch.nn.Parameter(
            data=torch.tensor(
                np.pi * np.arange(1, num_radial + 1, dtype=np.float32)
            ),
            requires_grad=True,
        )
        self.reset_parameters()

    def reset_parameters(self):
        # self.freq.data = torch.arange(1, self.freq.numel() + 1, out=self.freq).mul_(PI)
        pass

I don't know how correct that is.

I am using torch==1.11.0 and dive-into-graphs cloned from GitHub (the pip-installed dive-into-graphs has a bug: the mask is missing on line 53).

limei0307 commented 2 years ago

Hi @vinayak2019,

I think for the first solution, you can just remove the "self.dist_emb.reset_parameters()" call in spherenet.py (lines 21 to 24), since that module no longer needs reset_parameters.

For the second solution, could you provide the detailed error output?

Yes, you can just clone the code from GitHub since we updated the code after the latest pip install version (0.1.2).
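
To check which build is actually installed, the distribution's version can be read with the standard library. This is a small sketch: `installed_version` is a hypothetical helper name, not part of DIG. A source checkout installed with `pip install .` may report the same version string as the PyPI release, so the version alone may not tell the two apart.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("dive-into-graphs"))
```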

Thanks.

vinayak2019 commented 2 years ago

Thanks, @limei0307

The second problem is with the PyPI-installed version. I hit it with a different dataset, not QM9. The error is the following.

Traceback (most recent call last):
  File "sphere.py", line 32, in <module>
    run3d.run(device, train_dataset, valid_dataset, test_dataset, model, loss_func, evaluation,
  File "/home/vbh226/.local/lib/python3.8/site-packages/dig/threedgraph/method/run.py", line 71, in run
    train_mae = self.train(model, optimizer, train_loader, energy_and_force, p, loss_func, device)
  File "/home/vbh226/.local/lib/python3.8/site-packages/dig/threedgraph/method/run.py", line 124, in train
    out = model(batch_data)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/vbh226/.local/lib/python3.8/site-packages/dig/threedgraph/method/spherenet/spherenet.py", line 293, in forward
    dist, angle, torsion, i, j, idx_kj, idx_ji = xyz_to_dat(pos, edge_index, num_nodes, use_torsion=True)
  File "/home/vbh226/.local/lib/python3.8/site-packages/dig/threedgraph/utils/geometric_computing.py", line 54, in xyz_to_dat
    idx_i_t = idx_i.repeat_interleave(num_triplets_t)
RuntimeError: repeats must have the same size as input along dim

I traced the error to lines 52-53:

    repeat = num_triplets - 1
    num_triplets_t = num_triplets.repeat_interleave(repeat)

The package was installed with pip install dive-into-graphs. When I looked on GitHub, I found that the code was different:

    repeat = num_triplets
    num_triplets_t = num_triplets.repeat_interleave(repeat)[mask]

So I cloned the repository and ran pip install . from it. Now this works, together with the fix in features.py we discussed above.
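
For context on the "repeats must have the same size as input" error: when `repeats` is a tensor with more than one element, `Tensor.repeat_interleave` expects one repeat count per input element, so an index tensor and a counts tensor whose lengths have drifted apart fail in this way. The values below are made up for illustration and are not taken from `xyz_to_dat`.

```python
import torch

idx = torch.tensor([10, 11, 12])   # made-up index values
counts = torch.tensor([2, 0, 1])   # one repeat count per element of idx

# Matching sizes: element k of idx is repeated counts[k] times.
repeated = idx.repeat_interleave(counts)

# Mismatched sizes: two counts for three elements raises RuntimeError.
mismatch_error = None
try:
    idx.repeat_interleave(torch.tensor([2, 1]))
except RuntimeError as err:
    mismatch_error = str(err)

# Boolean masking shortens a tensor, which is one way lengths can stop matching.
mask = torch.tensor([True, False, True])
filtered_counts = counts[mask]

print(repeated)
print(filtered_counts.shape)
print(mismatch_error)
```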

limei0307 commented 2 years ago

Hi @vinayak2019, Ok. I will close this issue.