aukejw opened this issue 8 years ago.
Interesting, though this seems to be a bug in the H5PYDataset rather than in our sampling scheme. Does it only occur when the same index (i.e. the same datapoint) accidentally appears multiple times in the same request (i.e. batch)?
It only seems to occur when there are duplicates in the request.
Maybe we should create an issue in mila/fuel? It seems general enough to me that it doesn't depend on our sampling scheme: any random sampling scheme that allows sampling with replacement will have the same problem. Of course, duplicates are quite unlikely for big datasets, but the smaller the dataset, the bigger the problem gets.
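To put a number on how the problem grows as the dataset shrinks: assuming each request is a batch of `b` indices drawn uniformly with replacement from `n` examples, the chance of at least one duplicate is the birthday-problem probability `1 - prod_{i=0}^{b-1} (1 - i/n)`. A minimal sketch (the function name `duplicate_probability` is hypothetical, not part of Fuel):

```python
def duplicate_probability(num_examples: int, batch_size: int) -> float:
    """Probability that a batch drawn uniformly *with replacement*
    contains at least one repeated index (the birthday problem).

    P(no duplicates) = prod_{i=0}^{b-1} (1 - i / n)
    """
    if batch_size > num_examples:
        # Pigeonhole: more draws than distinct indices guarantees a repeat.
        return 1.0
    p_unique = 1.0
    for i in range(batch_size):
        # The i-th draw must avoid the i indices already drawn.
        p_unique *= 1.0 - i / num_examples
    return 1.0 - p_unique


for n in (100, 1_000, 10_000, 1_000_000):
    p = duplicate_probability(n, 100)
    print(f"n={n:>9,}  batch=100  P(duplicate)={p:.4f}")
```

For a fixed batch size the probability falls roughly like `b**2 / (2 * n)` once it is small, but it is far from negligible for moderately sized datasets: with `n = 1,000` and a batch size of 100 it is already above 0.99.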
Apparently, `h5py` files do not support indexing of the form `indexable[np.array([0, 0, 1, 1]), ...]` because of the duplicates. We'll need to find a workaround, warn the user that `load_in_memory` must be `True`, or both.
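One possible workaround, sketched here under the assumption that the underlying object accepts a sorted, duplicate-free integer array as a fancy index (h5py documents that fancy-index coordinates must be given in increasing order), is to read each distinct index once and then re-expand the result in memory with NumPy, which does allow repeated indices. `index_with_duplicates` is a hypothetical helper, not part of Fuel's `H5PYDataset` API; where exactly it would be hooked in is left open.

```python
import numpy as np


def index_with_duplicates(indexable, request):
    """Fetch rows `request` from `indexable`, even if `request` contains
    repeated or unsorted indices.

    Reads each distinct index once, in increasing order (the form h5py
    fancy indexing accepts), then restores the requested order and
    repetitions on the in-memory NumPy result.
    """
    request = np.asarray(request)
    # `unique` is sorted and duplicate-free; `inverse[k]` is the position
    # of request[k] within `unique`.
    unique, inverse = np.unique(request, return_inverse=True)
    # Single read with strictly increasing indices.
    rows = np.asarray(indexable[unique])
    # NumPy fancy indexing handles the duplicates and the original order.
    return rows[inverse]


# Usage with a NumPy array standing in for an h5py dataset.
features = np.arange(12).reshape(6, 2)
print(index_with_duplicates(features, [0, 0, 1, 1]))
```

The extra cost is one `np.unique` (a sort) per request, and since the file is read only for distinct indices, this also sidesteps the unsorted-request case, not just duplicates.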