FabianFuchsML / se3-transformer-public

code for the SE3 Transformers paper: https://arxiv.org/abs/2006.10503

ScanObjectNN input question #13

Closed: milesial closed this issue 3 years ago

milesial commented 3 years ago

Hello and thank you for your work,

I am trying to reproduce your results on the ScanObjectNN dataset, but I am confused about how you described the inputs in your paper.

You state:

The input to the Tensorfield network and the SE(3) Transformer are relative x-y-z positions of each point w.r.t. their neighbours. To guarantee equivariance, these inputs are provided as fields of degree 1. For the '+z' versions, however, we deliberately break the SE(3) equivariance by providing additional and relative z-position as two additional scalar fields (i.e. degree 0), as well as relative x-y positions as a degree 1 field (where the z-component is set to 0).

Let's say we have a KNN graph with k neighbors per node. Some questions:
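For concreteness, by a k-NN graph I mean something along these lines (the helper below is my own sketch, not code from this repo):

```python
import torch

def knn_graph(pos, k):
    """Build a k-NN graph: pos is (N, 3), returns (2, N*k) edge indices."""
    dist = torch.cdist(pos, pos)                # (N, N) pairwise distances
    dist.fill_diagonal_(float("inf"))           # exclude self-loops
    nbr = dist.topk(k, largest=False).indices   # (N, k) nearest neighbours
    dst = torch.arange(pos.size(0)).repeat_interleave(k)
    return torch.stack([nbr.reshape(-1), dst])  # (source, target) per edge
```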

Thanks.

FabianFuchsML commented 3 years ago

Hi Milesial,

Did you mean "providing absolute and relative z-position as additional scalar fields"?

Yes, that's a typo, it should say absolute and relative.

Regarding the rest of your questions: relative positions are fed in as edge features, whereas absolute positions are fed in as node features. Type-1 edge features are a bit fiddly, because you can't feed them into the radial networks. The cleanest way, in my opinion, is to concatenate them to the feature vectors. E.g., if you have a node i with a feature vector f_i, and that feature vector is transformed into a key k_ij = W_ij f_i, then the relative position (a type-1 edge feature) needs to be concatenated at exactly this point, so that f_i becomes f_ij and k_ij = W_ij f_ij. The same needs to be done for queries and values.
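As a rough illustration of what I mean (all names and the tensor layout here are made up for the sketch, not the API of this repo), with type-1 features of C channels stored as (N, C, 3) tensors:

```python
import torch

def lift_to_edges(f1_nodes, pos, edge_index):
    """Concatenate the relative position (a type-1 edge feature) onto the
    type-1 node features, turning per-node f_i into per-edge f_ij."""
    src, dst = edge_index
    rel_pos = (pos[src] - pos[dst]).unsqueeze(1)       # (E, 1, 3) edge feature
    f_ij = torch.cat([f1_nodes[src], rel_pos], dim=1)  # (E, C + 1, 3)
    return f_ij  # keys/values are then projected from f_ij instead of f_i
```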

I hope this helps and happy Easter!

milesial commented 3 years ago

Thank you. Type-1 edge features were indeed my concern. So far, by treating these relative positions as node features (with k channels), I have only reached 75% accuracy; I will try integrating proper type-1 edge features to see whether your 85% can be reproduced.

You said this also has to be done for queries, but this is not the case, right? Queries are node-specific and come from a linear layer. For keys and values I understand, as they are the output of a partial convolution.

Happy Easter :)

FabianFuchsML commented 3 years ago

The closing was by accident.

You said this also has to be done for queries, but this is not the case, right? Queries are node-specific and come from a linear layer.

Yes, you are correct, relative positions are not concatenated to the query vectors. We did feed the absolute z position of each node into the self-interaction layer after the attention. In hindsight, this is a bit of a weird/arbitrary design choice, but I would recommend providing the absolute z position per node in some way.
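Roughly like this, as a sketch (the module and argument names are made up, and the exact placement differs from the code in this repo):

```python
import torch
import torch.nn as nn

class SelfInteractionWithZ(nn.Module):
    """Per-node linear layer after attention, with the absolute z position
    appended as an extra type-0 (scalar) channel."""

    def __init__(self, c_in, c_out):
        super().__init__()
        self.linear = nn.Linear(c_in + 1, c_out)  # +1 channel for absolute z

    def forward(self, attn_out0, pos):
        z = pos[:, 2:3]                           # (N, 1) absolute z per node
        h = torch.cat([attn_out0, z], dim=-1)     # (N, c_in + 1)
        return self.linear(h)
```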

To guarantee equivariance, these inputs are provided as fields of degree 1. For the '+z' versions, however, we deliberately break the SE(3) equivariance by providing additional and relative z-position as two additional scalar fields (i.e. degree 0), as well as relative x-y positions as a degree 1 field (where the z-component is set to 0).

To be precise, even though I am not sure it makes a difference: what I meant when writing this is "2 relative x-y positions as degree 1 fields, where the z-component in one of them is set to 0"
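Putting the '+z' inputs together, a rough sketch (again, names and tensor layout are assumptions for illustration, not the actual ScanObjectNN code):

```python
import torch

def plus_z_inputs(pos, edge_index):
    """'+z' variant: two degree-1 edge fields (full relative position, and
    the same with z zeroed), relative z as a degree-0 edge field, and
    absolute z as a degree-0 node field."""
    src, dst = edge_index
    rel = pos[src] - pos[dst]                   # (E, 3) relative positions
    rel_xy = rel.clone()
    rel_xy[:, 2] = 0.0                          # zero out the z-component
    edge_type1 = torch.stack([rel, rel_xy], 1)  # (E, 2, 3) degree-1 fields
    edge_type0 = rel[:, 2:3]                    # (E, 1) relative z (scalar)
    node_type0 = pos[:, 2:3]                    # (N, 1) absolute z (scalar)
    return node_type0, edge_type0, edge_type1
```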

Also, bear in mind that the code in this repository is a re-implementation, not the original code. It is close to the code we used for the QM9 experiments in the paper, but actually quite different from the ScanObjectNN code used for the paper (for IP reasons). What I would recommend is to incorporate the important concepts in a way that does not make the code too messy. I assume that, at some level of detail, the choices you make have an impact on the optimal hyper-parameter settings, but not on performance in general.

milesial commented 3 years ago

Thank you very much. I was able to reproduce 84% accuracy, with type-1 edge features concatenated to the node features before applying the convolution kernels. Also, in the h5 files of ScanObjectNN, the axes seem to be ordered XZY or YZX rather than XYZ, so anyone trying to reproduce this should be careful of that.
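For reference, loading looked something like this (the file name and dataset keys are from the ScanObjectNN release as I remember them, and the exact permutation should be double-checked by visualizing a few objects):

```python
import h5py
import numpy as np

with h5py.File("training_objectdataset.h5", "r") as f:
    points = np.asarray(f["data"])    # (num_objects, num_points, 3)
    labels = np.asarray(f["label"])   # (num_objects,)

points = points[..., [0, 2, 1]]       # e.g. swap axes 1 and 2 if stored as XZY
```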

I'm closing this for now, will reopen later if I run into additional issues.