google-research / long-range-arena

Long Range Arena for Benchmarking Efficient Transformers
Apache License 2.0
710 stars 77 forks

Are you interested in publishing to huggingface/datasets? #24

Open richarddwang opened 3 years ago

richarddwang commented 3 years ago

It's a little hard for PyTorch users to evaluate their models on the benchmark.

Would you be willing to add your datasets to huggingface/datasets? There are detailed steps on how to add a dataset (https://huggingface.co/docs/datasets/add_dataset.html), and it shouldn't be hard since you can refer to the processing scripts of other datasets.

If this benchmark is added to huggingface/datasets, which provides output formats for NumPy/Pandas/PyTorch/TensorFlow/JAX, I believe it will become more accessible and more widely used.

MostafaDehghani commented 3 years ago

Agreed, it's a great idea, but we don't have the bandwidth for this right now. I added it to our list of TODOs, but it's unlikely that we'll get to it any time soon.

alexmathfb commented 3 years ago

@richarddwang I may end up re-writing LRA for PyTorch. In that case, I'd be happy to port the datasets to huggingface.

Q. Which types of test cases do you think would adequately test the code?

For example, I envision a file that loops through the JAX dataloader and the PyTorch dataloader and checks that their outputs are identical.
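A minimal sketch of such a check, assuming each loader is just a Python iterable that yields batches as dicts of array-likes. The names `assert_loaders_match` and `toy_loader` are hypothetical and not part of LRA; the toy generator only stands in for the real JAX and PyTorch pipelines, and the field names `inputs` and `targets` are placeholders.

```python
from itertools import zip_longest

import numpy as np


def assert_loaders_match(reference_loader, candidate_loader, max_batches=None):
    """Check that two iterables of batches yield identical data.

    Each batch is assumed to be a dict mapping field names to array-likes
    (an assumption for this sketch; adapt it to the real batch structure).
    Returns the number of batches that were compared.
    """
    missing = object()  # sentinel marking a loader that ran out early
    compared = 0
    pairs = zip_longest(reference_loader, candidate_loader, fillvalue=missing)
    for index, (ref_batch, cand_batch) in enumerate(pairs):
        if max_batches is not None and index >= max_batches:
            break
        if ref_batch is missing or cand_batch is missing:
            raise AssertionError(f"loaders differ in length at batch {index}")
        if ref_batch.keys() != cand_batch.keys():
            raise AssertionError(f"batch {index}: field names differ")
        for key in ref_batch:
            # Convert both sides to NumPy so the comparison is framework-neutral.
            np.testing.assert_array_equal(
                np.asarray(ref_batch[key]),
                np.asarray(cand_batch[key]),
                err_msg=f"batch {index}, field {key!r}",
            )
        compared += 1
    return compared


def toy_loader(seed, num_batches=3):
    """Hypothetical stand-in for a real dataloader: yields seeded random batches."""
    rng = np.random.default_rng(seed)
    for _ in range(num_batches):
        yield {
            "inputs": rng.integers(0, 256, size=(4, 16)),
            "targets": rng.integers(0, 10, size=(4,)),
        }


if __name__ == "__main__":
    print(assert_loaders_match(toy_loader(0), toy_loader(0)))
```

For this to pass on the real pipelines, shuffling would presumably have to be disabled or seeded identically on both sides, and GPU tensors moved to the CPU before the NumPy conversion.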

vanzytay commented 3 years ago

@alexmathfb this sounds great.

richarddwang commented 3 years ago

@alexmathfb Sorry for the late reply. That would be great!! BTW, I recommend creating an issue or a draft PR on HF/datasets; people there are willing and able to give you precise support for porting the datasets.