google-research / long-range-arena

Long Range Arena for Benchmarking Efficient Transformers
Apache License 2.0

Serious bugs in the ListOps task #20

Closed: cifkao closed this issue 3 years ago

cifkao commented 3 years ago

I have discovered two serious bugs in the ListOps task, which unfortunately mean that the task is completely broken as far as I can tell.

  1. The input pipeline uses the tfds.deprecated.text.Tokenizer(), which ignores non-alphanumeric characters by default. This means an input like [MAX 4 3 [MIN 2 3 ] 1 0 [MEDIAN 1 5 8 9 2]] will actually get encoded as MAX 4 3 MIN 2 3 1 0 MEDIAN 1 5 8 9 2, making the task impossible to solve.
  2. The token counting in the data generation script is incorrect. With --max-length=2000, the generated sequences are clearly much longer than 2000 tokens (if encoded correctly), and will only get truncated to this length during training, leading again to an impossible task.
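To illustrate both points, here is a minimal sketch in plain Python. It does not call the tfds tokenizer; `alnum_only_tokens` is a hypothetical stand-in that only emulates the "drop non-alphanumeric characters" behavior, and the two inputs are made-up examples with every bracket separated by spaces. It shows that two expressions with different answers collapse to the same token sequence once brackets are dropped, and that the token count changes depending on whether brackets are counted.

```python
import re

def alnum_only_tokens(text):
    # Hypothetical stand-in: keep only runs of letters/digits, so '[' and ']' vanish.
    return re.findall(r"[A-Za-z0-9]+", text)

def whitespace_tokens(text):
    # Keep every whitespace-separated symbol, including '[MAX' and ']'.
    return text.split()

# Two different expressions (assumed examples, not taken from the dataset):
a = "[MAX 1 [MIN 2 3 ] ]"   # max(1, min(2, 3)) = 2
b = "[MAX 1 [MIN 2 ] 3 ]"   # max(1, min(2), 3) = 3

# Without brackets the two inputs are indistinguishable.
print(alnum_only_tokens(a) == alnum_only_tokens(b))   # True

# With brackets kept, the structure (and the answer) is recoverable.
print(whitespace_tokens(a) == whitespace_tokens(b))   # False

# The length also differs depending on what is counted as a token.
print(len(alnum_only_tokens(a)), len(whitespace_tokens(a)))   # 5 7
```

The last line is why a length limit is only meaningful if it is checked against the same tokenization the model sees: a sequence that passes the limit under one counting scheme can exceed it under another and then lose its closing brackets to truncation.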

vanzytay commented 3 years ago

Hi,

Yes, we're already aware of this issue and fixed it in our internal version a while back. I have yet to mirror the fix to the external version and will get to it sometime this week or next.

Thanks for the find. :)

Best, Yi

cifkao commented 3 years ago

Hi,

OK, it's great that you know about this, although it would have been helpful to put a warning somewhere (a "known issues" section in the readme?) so that people don't waste time and resources on it.

Also, I'm curious if the results in your paper are affected by this and whether you are planning to update it.

cifkao commented 3 years ago

Oh, I actually see other (closed) issues about this here on GitHub that I overlooked. Perhaps one of them should stay open until this is fixed here? I don't think people really check closed issues.