FabianFuchsML / se3-transformer-public

code for the SE3 Transformers paper: https://arxiv.org/abs/2006.10503

Implementing custom SE(3) transformer #21

Closed zachary-mcdargh closed 3 years ago

zachary-mcdargh commented 3 years ago

Hello,

I am trying to implement a version of the SE(3) transformer that, given a cloud of N points, predicts 3N vectors, three associated with each point. The points themselves have no vector or scalar features for my purposes, so I have used the "dummy" features method from the toy model experiment. I attempted to modify the NBody experiment code to suit this problem, but have run into some difficulties. When I run the model forward, I receive the following error message:

mat1 and mat2 shapes cannot be multiplied (48620x1 and 2x32)

I believe this has something to do with my input layer. I know that 48620 is the number of edges in my data, but I'm not sure where the dimensions 2x32 are coming from, so I'm not sure how to fix the problem. Any help resolving this issue, or even just tracing back the dimensions of that matrix, would be greatly appreciated.

FabianFuchsML commented 3 years ago

Hi Zach,

You made the right choice by using the n-body experiment as a base; it's definitely the one that needs the fewest adaptations to get what you want. It's quite hard to answer your question without knowing where the error is thrown. Could you find out (potentially with break points) where exactly that error arises? Is it the first time anything is done with your edges? I think you can then easily trace back in the code what the '2' and the '32' refer to. I'm guessing the '1' is the edge dimension, which might just be the distances between pairs of points?

zachary-mcdargh commented 3 years ago

Yes, the error is thrown the first time model(g) is called in the train function. As you said, the 1 is the edge dimension, and the only edge data is the pairwise distances.

Here is the traceback:

Traceback (most recent call last):
  File "backmap_run.py", line 224, in <module>
    main(FLAGS, UNPARSED_ARGV)
  File "backmap_run.py", line 209, in main
    train_epoch(epoch, model, task_loss, train_loader, optimizer, scheduler, FLAGS)
  File "backmap_run.py", line 81, in train_epoch
    pred = model(g)
  File "/Users/zacharymcdargh/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/zacharymcdargh/Documents/se3-transformer-public/experiments/backmap/backmap_models.py", line 62, in forward
    h_enc = layer(h_enc, G=G, r=r, basis=basis)
  File "/Users/zacharymcdargh/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/zacharymcdargh/Documents/se3-transformer-public/equivariant_attention/modules.py", line 796, in forward
    v = self.GMAB['v'](features, G=G, **kwargs)
  File "/Users/zacharymcdargh/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/zacharymcdargh/Documents/se3-transformer-public/equivariant_attention/modules.py", line 648, in forward
    G.edata[etype] = self.kernel_unary[etype](feat, basis)
  File "/Users/zacharymcdargh/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/zacharymcdargh/Documents/se3-transformer-public/equivariant_attention/modules.py", line 307, in forward
    R = self.rp(feat)
  File "/Users/zacharymcdargh/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/zacharymcdargh/Documents/se3-transformer-public/equivariant_attention/modules.py", line 269, in forward
    y = self.net(x)
  File "/Users/zacharymcdargh/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/zacharymcdargh/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/Users/zacharymcdargh/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/zacharymcdargh/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 94, in forward
    return F.linear(input, self.weight, self.bias)
  File "/Users/zacharymcdargh/opt/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 1753, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (48620x1 and 2x32)

I tried using break points, but wasn't able to easily determine where the '2' and '32' values come from. I tried fiddling with some of the parameters, like n_head, but only saw a change in the matrix dimensions when changing the edge_dim parameter (which I assume should be 1) in the constructor of the SE3Transformer class. I suppose that I have somewhere told the program to expect two edge dimensions instead of one, but I can't quite parse where that would be.
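For anyone hitting the same message, the shape arithmetic behind it can be reproduced outside of PyTorch (numpy is used here purely for illustration): a linear layer whose weight is 2x32 expects 2 input columns, while the edge tensor here supplies only 1 (the distance).

```python
import numpy as np

# The edge features as they reach the radial network: one column
# (the pairwise distance) for each of the 48620 edges.
feat = np.zeros((48620, 1))

# A linear layer built for 2 input features holds a 2x32 weight,
# so the product needs the inner dimensions (1 vs 2) to agree.
weight = np.zeros((2, 32))

try:
    feat @ weight
except ValueError as err:
    # numpy's analogue of "mat1 and mat2 shapes cannot be multiplied"
    print("shape mismatch:", err)

# With a matching 2-column input, the product goes through fine
# and yields one 32-dimensional output row per edge.
out = np.zeros((48620, 2)) @ weight
print(out.shape)
```

So the '2' is the input width the layer was constructed with, and the '32' is its output width; the fix has to change what the layer expects, not the data.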

FabianFuchsML commented 3 years ago

I'd advise checking how the variable edge_dim is further used, especially within equivariant_attention/modules.py. There is an ambiguity there because the distance r is added as an edge feature, but might not be counted as an edge_dim. So you might need to set edge_dim to 0 in the model config, but I am not 100% sure.
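A minimal sketch of the suspected counting issue (the helper name is hypothetical, not from the repo): if the radial network's input width is assembled as the declared edge_dim plus one column for the distance r, then edge_dim=1 yields an expected width of 2 even though the data only carries the distance column.

```python
def radial_input_width(edge_dim: int) -> int:
    """Hypothetical mirror of how the radial MLP's input width
    might be assembled: the declared edge features plus the
    distance r, which is always appended as an extra feature."""
    return edge_dim + 1

# With edge_dim=1 the first linear layer is built 2 columns wide,
# but the actual edge tensor only has the distance column -> mismatch.
print(radial_input_width(1))  # 2: expects a feature that isn't there
print(radial_input_width(0))  # 1: matches a distance-only edge tensor
```

Under this counting, edge_dim should describe only the features beyond the distance, which is why edge_dim=0 is the right setting when distances are the only edge data.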

zachary-mcdargh commented 3 years ago

Ah I see. I've set edge_dim=0, and everything is running smoothly. Thanks for your help!