craffel closed this pull request 5 years ago
Code LGTM. Did you test that both build_tfrecords.py and build_label_maps.py now generate the same results when the seed is fixed?
Do you happen to know if Python and numpy guarantee that random sequences stay reproducible when upgrading to new versions?
Looks like https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.random.RandomState.html has explicit guarantees, which seem stronger than calling into `random` directly. I don't feel too strongly about this, but it could be worth switching to that...
Specifically, it's possible that a library we import calls `random` unbeknownst to us, or changes when/whether it calls `random` in a version upgrade. Using an explicit `RandomState` would relieve us of that (tiny) risk of non-determinism when upgrading dependencies.
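To illustrate the point about isolation (a minimal sketch, not code from this PR): an explicitly owned `RandomState` keeps its own stream, so library code that consumes the global `random` or `np.random` generators between our calls cannot perturb our sequence.

```python
import random

import numpy as np

# An explicit RandomState owns its own stream of randomness.
rng = np.random.RandomState(seed=0)
first_draw = rng.randint(0, 100, size=3)

# Simulate a dependency drawing from the *global* generators behind our backs.
random.random()
np.random.rand()

# Re-seeding a fresh RandomState reproduces the exact same sequence,
# regardless of what happened to the global state in between.
rng2 = np.random.RandomState(seed=0)
assert (first_draw == rng2.randint(0, 100, size=3)).all()
```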
> Did you test that both build_tfrecords.py and build_label_maps.py now generate the same results when the seed is fixed?

Yes.
> Do you happen to know if Python and numpy guarantee that random sequences stay reproducible when upgrading to new versions?

Not sure.
> Specifically, it's possible that a library we import calls `random` unbeknownst to us, or changes when/whether it calls `random` in a version upgrade. Using an explicit `RandomState` would relieve us of that (tiny) risk of non-determinism when upgrading dependencies.

Ok, will make this change.
Ok, pushed a commit to switch to `np.random.RandomState`. The semantics in `build_label_maps.py` changed because of switching from `random.sample` to `np.random.choice`, please take a thorough look.
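For context on why the semantics change (an illustrative sketch, not code from this PR): `random.sample` always draws without replacement, while `np.random.choice` defaults to `replace=True`, so `replace=False` is needed to match. Even with the same seed, the two generators produce different selections, since they are separate streams.

```python
import random

import numpy as np

labels = list(range(10))

# random.sample draws without replacement from Python's global generator.
random.seed(0)
py_pick = random.sample(labels, 3)

# np.random.choice needs replace=False to match the without-replacement
# semantics; with a RandomState the stream is also version-stable.
rng = np.random.RandomState(0)
np_pick = rng.choice(labels, size=3, replace=False)

assert len(set(np_pick)) == 3  # no duplicates with replace=False
```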
Resolves #17 and #18. It was impossible to use the existing label maps because the datasets needed to be downloaded, which necessitated creating new label maps. Instead, we should just make the dataset and label map creation processes deterministic by manually setting seeds. This involved changing the README to include label map creation, so I also fixed #18, which specified ImageNet 32x32 creation in the wrong order.
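The deterministic-creation approach described above can be sketched as follows. The function name and parameters here are hypothetical, not the actual `build_label_maps.py` API; the point is only that deriving all sampling from a fixed seed makes regenerated label maps identical across runs and machines.

```python
import numpy as np


def build_label_map(num_classes, labels_per_class, seed):
    """Hypothetical sketch: deterministically pick labeled examples per class."""
    rng = np.random.RandomState(seed)
    # All sampling flows from `rng`, so a fixed seed fully determines the map.
    return {
        c: sorted(rng.choice(1000, size=labels_per_class, replace=False).tolist())
        for c in range(num_classes)
    }


map_a = build_label_map(num_classes=10, labels_per_class=4, seed=42)
map_b = build_label_map(num_classes=10, labels_per_class=4, seed=42)
assert map_a == map_b  # same seed -> identical label map on every run
```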