flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/
Other
13.85k stars, 2.09k forks

[Feature]: add random seed to dataset functions for reproducibility #3474

Closed: MattGPT-ai closed this issue 3 months ago

MattGPT-ai commented 3 months ago

Problem statement

When randomly splitting or downsampling datasets, including when loading Flair's built-in datasets, there is no option to set a random seed for that specific operation. You can set a global random seed before calling the function, but that does not give full control or reproducibility when you want a different, specific seed for each operation. I have had trouble controlling this when instantiating the datasets that ship with Flair.
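Until such an option exists, one possible workaround is to seed the global RNG only for the duration of a single operation and then restore its previous state. The sketch below assumes the dataset helpers draw from Python's built-in `random` module, which is an assumption: they may also use NumPy or PyTorch generators, which this does not cover. `temporary_seed` is a hypothetical helper, not part of Flair.

```python
import random
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_seed(seed: int) -> Iterator[None]:
    # Hypothetical helper: save the global RNG state, reseed it for the
    # duration of the block, then restore the saved state so that code
    # outside the block sees the RNG exactly as it was before.
    # Assumes the wrapped code uses Python's `random` module only.
    state = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(state)


data = list(range(10))

with temporary_seed(13):
    first = random.sample(data, 5)

random.random()  # unrelated code advancing the global RNG in between

with temporary_seed(13):
    second = random.sample(data, 5)

# Same seed inside the block gives the same sample, whatever ran in between.
print(first == second)
```

Each random operation (for example a `downsample` call) can be wrapped in its own block with its own seed, without disturbing the seeding of the rest of the program.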

Solution

Ideally, functions such as randomly_split_into_two_datasets and downsample would accept a random seed. Dataset classes such as flair.datasets.sequence_labeling.CONLL_03 could also accept a seed and use it for any random operations they perform.
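A minimal sketch of what a per-call `seed` parameter could look like, using a private `random.Random(seed)` instance instead of the global RNG. The functions `split_in_two` and `downsample` here are hypothetical stand-ins operating on plain Python sequences. They are not Flair's actual `randomly_split_into_two_datasets` or `downsample` signatures.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_in_two(
    items: Sequence[T], first_len: int, seed: Optional[int] = None
) -> Tuple[List[T], List[T]]:
    # Hypothetical API. A private Random instance: seeding it never touches
    # the global RNG, and other code drawing from the global RNG never
    # changes this split. seed=None seeds from system entropy (not reproducible).
    rng = random.Random(seed)
    indices = list(range(len(items)))
    rng.shuffle(indices)
    first = [items[i] for i in indices[:first_len]]
    second = [items[i] for i in indices[first_len:]]
    return first, second


def downsample(
    items: Sequence[T], fraction: float, seed: Optional[int] = None
) -> List[T]:
    # Hypothetical API. Keep round(fraction * len(items)) items chosen with
    # an independent RNG, preserving their original order.
    rng = random.Random(seed)
    k = round(fraction * len(items))
    kept = sorted(rng.sample(range(len(items)), k))
    return [items[i] for i in kept]


# Usage: each operation gets its own seed, independent of the others.
data = list(range(100))
train, dev = split_in_two(data, 90, seed=1)
small_train = downsample(train, 0.1, seed=2)
```

A dataset class could take a `seed` argument in its constructor and forward it (or values derived from it) to helpers like these, so that loading the same dataset with the same seed always yields the same splits.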

Additional Context

No response