Sampling behavior for Imagenet not reproducible

google-research / meta-dataset

A dataset of datasets for learning to learn from few examples

Apache License 2.0

761 stars, 139 forks

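    import os
    import random

    import numpy as np
    import tensorflow as tf
    # Assumption: `sampling` is the meta_dataset.data.sampling module.
    from meta_dataset.data import sampling


    def set_seed(seed):
        # Seed every random number generator used in the pipeline.
        random.seed(seed)
        os.environ['PYTHONHASHSEED'] = str(seed)
        np.random.seed(seed)
        tf.compat.v1.random.set_random_seed(seed)
        sampling.RNG = np.random.RandomState(seed)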
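One caveat about the snippet above: Python reads `PYTHONHASHSEED` when the interpreter starts, so assigning it through `os.environ` inside a running process changes string hashing only for child processes, not for the current one. The sketch below illustrates this by computing a string hash in child interpreters started with a fixed hash seed. The helper name `hash_in_child` and the synset ID are illustrative and not part of meta-dataset.

```python
import os
import subprocess
import sys


def hash_in_child(text, hash_seed):
    """Return hash(text) as computed by a fresh interpreter started with PYTHONHASHSEED set."""
    # Copy the current environment and override only the hash seed.
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Two separate interpreters started with the same hash seed agree on the hash.
first = hash_in_child("n01440764", 0)
second = hash_in_child("n01440764", 0)
print(first == second)
```

In practice this means the variable would have to be exported in the shell (or set by whatever launches the job) before Python starts, if set iteration order is meant to be stable across runs.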

I have noticed that generating dataset_spec.json is not deterministic in the order of dictionary keys. I did not think much of it at the time, but it may be related. I think the children of a node in the DAG are represented as sets or dicts, and depending on the Python version, they may or may not be iterated in the same order from one execution to the next.
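To see why iteration order could matter even with every seed fixed, here is a minimal sketch that uses only the standard library. It is not meta-dataset code: `pick_children` is a hypothetical helper and the synset IDs are placeholders. The same seed draws the same positions, so a different ordering of the same children yields different picks.

```python
import random


def pick_children(children, seed, k=2):
    """Sample k children with a freshly seeded RNG, using the order they arrive in."""
    rng = random.Random(seed)
    ordered = list(children)  # the order is whatever the container yields
    return rng.sample(ordered, k)


# Hypothetical synset IDs; the same four children in two different orders.
order_a = ["n00000001", "n00000002", "n00000003", "n00000004"]
order_b = list(reversed(order_a))

picks_a = pick_children(order_a, seed=0)
picks_b = pick_children(order_b, seed=0)
print(picks_a)
print(picks_b)

# Sorting first removes the dependence on the incoming order.
print(pick_children(sorted(order_a), seed=0) == pick_children(sorted(order_b), seed=0))
```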

The first step would be to make sure you are using the same dataset_spec.json (if you are working on multiple machines). Then look for iterations over the children of a node, and replace each iteration over an unordered collection with an iteration over a sorted copy (sorting on the synset "n0...." string should be fine). If performance then becomes a concern, I can look into replacing dictionaries with OrderedDict, or sorting only at strategic places.
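The suggestion to sort before iterating can be sketched as follows. This is an illustration under assumptions, not the actual meta-dataset traversal: the graph is a plain dict mapping a node ID to a set of child IDs, and both `leaves_in_sorted_order` and the IDs are hypothetical.

```python
def leaves_in_sorted_order(children_of, root):
    """Depth-first traversal that always visits children in sorted ID order."""
    leaves = []
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # a DAG node can be reachable through several parents
        seen.add(node)
        kids = children_of.get(node, set())
        if not kids:
            leaves.append(node)
        # Push in reverse-sorted order so the smallest ID is popped first.
        stack.extend(sorted(kids, reverse=True))
    return leaves


# Hypothetical graph with children stored as unordered sets.
graph = {
    "n00000001": {"n00000003", "n00000002"},
    "n00000002": {"n00000005", "n00000004"},
}
print(leaves_in_sorted_order(graph, "n00000001"))
```

Because the order comes from `sorted` rather than from the set itself, the resulting leaf list does not depend on hash randomization or on how the sets were built.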

Sampling behavior for Imagenet not reproducible #50