Closed: frankaging closed this issue 2 years ago
Many thanks for your feedback. Actually, r2d2.py is the code used in our first paper (accepted by ACL 2021), while r2d2_cuda.py is used in the paper accepted by EMNLP. Since r2d2.py is not used in our latest work, we didn't run a regression test on it, which may have caused some discrepancies. If you are interested in the first work, I've made a tag for it: https://github.com/alipay/StructuredLM_RTDT/tree/r2d2, which is the original code for the first paper. The current branch only supports the CUDA version (r2d2_cuda.py). Since r2d2.py is legacy code, we'll consider either fixing the discrepancy or removing it directly. The numbers in the papers were produced with the correct version. If you have a CUDA environment, I suggest you use the latest version (Fast-R2D2), which is almost 30 times faster than R2D2 and has better downstream task performance.
Thanks for your quick response! Closing the issue, as this problem is not found in the r2d2 tag.
I've checked in a bug-fixed version of r2d2 in the latest branch. We will soon release a Fast-R2D2 model pretrained on wiki103; hope that will be helpful to your work :)
Hi Authors!
Thanks for sharing this repo. I enjoyed reading your paper, and I am working on a related project. As I was going through the code, I found one potential issue with the current setup. I will (1) explain the issue and (2) provide a simple test case that I ran on my end. Please help with verifying.
**Issue:**
The `BinaryEncoder` module is set up with `batch_first=True`; however, it seems like we are passing in Q, K, V matrices whose first dimension is not the batch dimension.

**Code Analysis:**
In `r2d2.py`, the encoder is called as in the sketch below. We can see that `input_embedding` definitely has `batch_size` as its first dimension, since it is concatenated with the embeddings from the `nn.Embedding` module. But before we call `self.tree_encoder`, `.transpose(0, 1)` makes `batch_size` the second dimension of the input instead. Specifically, the first dimension in this case is always 4.
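Here is a minimal sketch of what I believe the call pattern looks like. This is not the exact repo code; the `tree_encoder` / `input_embedding` names and shapes follow my reading of `r2d2.py`, and I use a plain `nn.TransformerEncoder` as a stand-in for `BinaryEncoder`:

```python
import torch
import torch.nn as nn

batch_size, dim = 16, 768
num_positions = 4  # 2 special prefix tokens + left and right children embeddings

# stand-in for BinaryEncoder: built with batch_first=True,
# so it expects inputs shaped (batch, seq, dim)
layer = nn.TransformerEncoderLayer(d_model=dim, nhead=12, batch_first=True)
tree_encoder = nn.TransformerEncoder(layer, num_layers=3)

# (batch_size, 4, dim): batch dimension first, as produced by the concat
# with the nn.Embedding outputs
input_embedding = torch.randn(batch_size, num_positions, dim)

# the suspect call: after .transpose(0, 1) the tensor is (4, batch_size, dim),
# so with batch_first=True the encoder treats the 4 positions as the batch
# and attends across the batch_size candidates instead of across the 4 tokens
output = tree_encoder(input_embedding.transpose(0, 1))
print(output.shape)  # torch.Size([4, 16, 768])
```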
**Testing Done:**
I simply added some logging inside `TreeEncoderLayer`, as sketched below, and the printed shapes confirm that the first dimension of the input to self-attention is always 4 rather than the batch size.
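A rough illustration of the logging I added (the real `TreeEncoderLayer` in the repo has its own forward logic; this only shows where the shape check goes):

```python
import torch.nn as nn

class TreeEncoderLayerWithLogging(nn.TransformerEncoderLayer):
    """Hypothetical wrapper mirroring where I inserted the print statement."""

    def forward(self, src, *args, **kwargs):
        # With the transpose shown above, this prints (4, candidate_count, dim)
        # instead of the (candidate_count, 4, dim) expected for batch_first=True.
        print("TreeEncoderLayer input shape:", src.shape)
        return super().forward(src, *args, **kwargs)
```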
**Summary:**
It seems like in `r2d2.py` the self-attention is not over those 4 tokens (2 special prefix tokens plus the left and right children embeddings), but over the full collection of candidates together with their children.

I saw this issue is not present in `r2d2_cuda.py`, based on the corresponding call there, which is great. I have not checked the rest of the code for `r2d2_cuda.py`, though. With this, I am wondering whether the numbers from either of your papers need to be updated because of this potential issue? Either way, I am not blocked by it, and I was inspired quite a lot by your codebase. Thanks!
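For reference, a minimal sketch of the two ways the shape mismatch could be resolved, assuming the encoder layer follows PyTorch's `batch_first` convention (this is not a patch against the actual repo code):

```python
import torch
import torch.nn as nn

dim, nhead = 768, 12
input_embedding = torch.randn(16, 4, dim)  # (batch_size, 4, dim)

# Option 1: keep batch_first=True and drop the transpose
enc_bf = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=nhead, batch_first=True),
    num_layers=1,
)
out1 = enc_bf(input_embedding)  # attention runs over the 4 tokens per candidate

# Option 2: keep the transpose and build the encoder with batch_first=False (the default)
enc_sf = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=nhead, batch_first=False),
    num_layers=1,
)
out2 = enc_sf(input_embedding.transpose(0, 1)).transpose(0, 1)

print(out1.shape, out2.shape)  # both torch.Size([16, 4, 768])
```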