google-research / long-range-arena

Long Range Arena for Benchmarking Efficient Transformers
Apache License 2.0

extremely high accuracy in document retrieval task #18

Closed. mlpen closed this issue 3 years ago.

mlpen commented 3 years ago

Hi,

I am testing the document retrieval task. I found that the zip file (https://storage.googleapis.com/long-range-arena/lra_release.gz) already contains the actual documents rather than just document ids. When I run the task with my own PyTorch implementation of the model, the accuracy is over 70%.

vanzytay commented 3 years ago

Two things to take note of here.

1. Ensure you're not using cross attention between the two documents.
2. Ensure you're tokenizing at the character level, not at the subword or word level.

Thanks
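For illustration, here is a minimal sketch of character-level encoding as in point 2, with each character mapped to its own id and the sequence padded to the 4K length used by the retrieval task. The vocabulary and padding convention are simplified stand-ins, not the exact encoder built by the LRA input pipeline:

```python
def char_encode(text, max_length=4000, pad_id=0):
    """Simplified character-level encoder: one id per character,
    truncated/padded to max_length (not the exact LRA vocabulary)."""
    ids = [min(ord(c), 255) + 1 for c in text[:max_length]]  # 0 reserved for padding
    mask = [1] * len(ids) + [0] * (max_length - len(ids))
    ids = ids + [pad_id] * (max_length - len(ids))
    return ids, mask
```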

mlpen commented 3 years ago

Thanks for replying.

  1. I also use a two-tower style model (see the fuller sketch after this list):

     ```python
     token_out_0 = self.model(input_ids_0, mask_0)
     token_out_1 = self.model(input_ids_1, mask_1)
     seq_scores = self.seq_classifer(token_out_0, token_out_1)
     ```

     Within `self.seq_classifer`, the following is computed:

     ```python
     X_0 = pooling(token_out_0, self.pooling_mode)
     X_1 = pooling(token_out_1, self.pooling_mode)
     seq_scores = self.mlpblock(torch.cat([X_0, X_1, X_0 * X_1, X_0 - X_1], dim=-1))
     ```

  2. I use `input_pipeline.get_matching_datasets` to generate the data, with the tokenizer set to `"char"`:

     ```python
     train_ds, eval_ds, test_ds, encoder = input_pipeline.get_matching_datasets(
         n_devices=1,
         task_name=None,
         data_dir="../../lra_release/lra_release/tsv_data/",
         batch_size=1,
         fixed_vocab=None,
         max_length=4000,
         tokenizer="char",
         vocab_file_path=None)
     ```
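To make the two-tower setup in item 1 concrete, here is a self-contained PyTorch sketch of such a classification head. The class name, hidden sizes, and mean pooling are placeholders rather than the exact code above; the key property is that the two documents are encoded independently, with no cross attention:

```python
import torch
import torch.nn as nn

class DualEncoderClassifier(nn.Module):
    """Two-tower retrieval head: each document is encoded independently,
    pooled, and the pooled vectors are combined before a small MLP."""

    def __init__(self, encoder, dim=256, num_classes=2):
        super().__init__()
        self.encoder = encoder  # any sequence encoder returning (batch, seq, dim)
        self.mlp = nn.Sequential(
            nn.Linear(4 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, num_classes),
        )

    def pool(self, token_out, mask):
        # Mean pooling over non-padded positions.
        mask = mask.unsqueeze(-1).float()
        return (token_out * mask).sum(1) / mask.sum(1).clamp(min=1.0)

    def forward(self, input_ids_0, mask_0, input_ids_1, mask_1):
        x0 = self.pool(self.encoder(input_ids_0, mask_0), mask_0)
        x1 = self.pool(self.encoder(input_ids_1, mask_1), mask_1)
        return self.mlp(torch.cat([x0, x1, x0 * x1, x0 - x1], dim=-1))
```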

adamsolomou commented 3 years ago

@mlpen How many training steps and warmup steps did you use? The config says to use 5K training steps with 8K warmup steps, which seems odd since the warmup is longer than the entire training run.
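For reference, a minimal sketch of a `constant * linear_warmup * rsqrt_decay` schedule of the kind the FLAX examples these configs build on typically use (assuming that schedule; the base learning rate below is illustrative). It shows why 8K warmup with only 5K training steps looks odd: the rate is still ramping up when training stops.

```python
def lr_schedule(step, base_lr=0.05, warmup_steps=8000):
    """Linear warmup followed by reciprocal-sqrt decay; base_lr is
    illustrative rather than the value from the LRA config."""
    linear_warmup = min(1.0, step / warmup_steps)
    rsqrt_decay = 1.0 / (max(step, warmup_steps) ** 0.5)
    return base_lr * linear_warmup * rsqrt_decay

# With 5K training steps but 8K warmup steps, the peak rate is never reached:
print(lr_schedule(5000))  # still in warmup when training stops
print(lr_schedule(8000))  # this would be the peak
```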

vanzytay commented 3 years ago

That's because we used some default FLAX code and only did a cursory sweep of hyperparameters (hyperparameter sweeps were not within the scope of the paper). Some other folks have found that training longer leads to better performance, so I recommend looking at works like https://arxiv.org/abs/2106.01540 and following their setup. Thanks :)