facebookresearch / ELI5

Scripts and links to recreate the ELI5 dataset.

Data generation fixes and information on necessary parameters for training and generation #13

arueckle closed 4 years ago

arueckle commented 4 years ago

Patch description

During data creation and model training (Q-to-A) I ran into several obstacles, which I also described in #10. Most of them have since been fixed; this pull request addresses the rest.

Testing steps

To check the README changes, try running the old commands unchanged. For instance, generation without --max-source-positions 4096 --max-target-positions 4096 falls back to the default limit of max_positions=(1024, 1024) and skips almost all examples (see the log below).

Logs

| WARNING: 9893 samples have invalid sizes and will be skipped, max_positions=(1024, 1024), first few sample ids=[9052, 6593, 4710, 9081, 8042, 5242, 890, 7521, 7079, 3455]
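
To illustrate why raising both limits matters, here is a minimal sketch of the kind of length check that produces a warning like the one above. It is not the toolkit's actual code: the function name split_by_max_positions and the sample lengths are made up for illustration, and the only assumption is that an example is skipped when its source or target length exceeds the corresponding limit.

```python
from typing import List, Sequence, Tuple


def split_by_max_positions(
    sizes: Sequence[Tuple[int, int]],
    max_positions: Tuple[int, int],
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: split sample ids into (kept, skipped).

    `sizes` holds one (source_length, target_length) pair per sample.
    A sample is kept only if both lengths fit within `max_positions`.
    """
    max_src, max_tgt = max_positions
    kept: List[int] = []
    skipped: List[int] = []
    for idx, (src_len, tgt_len) in enumerate(sizes):
        if src_len <= max_src and tgt_len <= max_tgt:
            kept.append(idx)
        else:
            skipped.append(idx)
    return kept, skipped


# Made-up token counts, chosen only to show the effect of the two limits.
sizes = [(900, 200), (3500, 300), (4000, 1100), (500, 50)]

kept_default, skipped_default = split_by_max_positions(sizes, (1024, 1024))
kept_raised, skipped_raised = split_by_max_positions(sizes, (4096, 4096))

print("limit (1024, 1024): kept", kept_default, "skipped", skipped_default)
print("limit (4096, 4096): kept", kept_raised, "skipped", skipped_raised)
```

With the lower limit, every sample whose source or target is longer than 1024 tokens is dropped; with both limits at 4096, all four made-up samples are kept.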

Other information