google-research / long-range-arena

Long Range Arena for Benchmarking Efficient Transformers
Apache License 2.0

AAN dataset crashing when loading .tsv file #53

Open exnx opened 1 year ago

exnx commented 1 year ago

Did anyone else have issues loading the AAN dataset into memory? In particular, when I load the .tsv file into memory, it crashes :/ I tried several different Google Cloud instances with varying amounts of memory, up to 170G and 24 CPUs, but it still crashed. I feel like I am missing something. Here's the snippet of code that crashes the instance every time.

from datasets import DatasetDict, Value, load_dataset
...

        dataset = load_dataset(
            "csv",
            data_files={
                "train": str(self.data_dir / "new_aan_pairs.train.tsv"),  # 8G file
                "val": str(self.data_dir / "new_aan_pairs.eval.tsv"),
                "test": str(self.data_dir / "new_aan_pairs.test.tsv"),
            },
            delimiter="\t",
            column_names=["label", "input1_id", "input2_id", "text1", "text2"],
            keep_in_memory=True,
        )

jmycsu commented 1 year ago

@exnx Hello! Sorry to bother you. I ran into problems downloading the AAN dataset from the link http://aan.how/download/. Could you please tell me the right way to download the AAN dataset, or share a link to it?

WonderSeven commented 12 months ago

Hi there,

The provided download URL no longer works. Could anyone share the data? Many thanks!

sneerajmohan commented 10 months ago

@WonderSeven Have you figured out a way to download it?

WonderSeven commented 10 months ago

No, I have not been able to find anywhere to download the AAN dataset.