a-antoniades / Neuroformer

MIT License
30 stars, 3 forks

Possible error on test set creation #4

Closed: SalvoCalcagno closed this issue 1 month ago

SalvoCalcagno commented 2 months ago

In `neuroformer/datasets.py`, the line `test_intervals = intervals[~chosen_idx]` does not create the complement of the training set.

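Applied to integers, the tilde operator is a bitwise NOT: it inverts the bit representation, so each index `i` becomes `-(i + 1)` (see also https://stackoverflow.com/questions/8305199/the-tilde-operator-in-python). Indexing with these negative values selects intervals counted from the end of the array, a mirror image of the training indices, rather than the intervals that were left out of training.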
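A small reproduction of the difference, using a hypothetical 10-element array in place of the real `intervals` (the data and the chosen indices are made up for illustration):

```python
import numpy as np

# Hypothetical stand-in for `intervals`: 10 items with values 0, 10, ..., 90
intervals = np.arange(10) * 10

# Integer indices of the "training" items
chosen_idx = np.array([0, 1, 2])

# On an integer array, ~ is bitwise NOT, so ~i == -(i + 1)
flipped = ~chosen_idx
print(flipped)             # [-1 -2 -3]

# Negative indices count from the end, so this picks the LAST three items,
# not the seven items that were left out of training
print(intervals[flipped])  # [90 80 70]

# On a boolean mask, ~ is element-wise logical NOT, which gives the complement
mask = np.zeros(len(intervals), dtype=bool)
mask[chosen_idx] = True
print(intervals[~mask])    # [30 40 50 60 70 80 90]
```

With a realistic 80% training split, the mirrored indices mostly land on training intervals, so the "test" set overlaps the training set instead of complementing it.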

Moreover, the `random.choice` call repeats some indices: NumPy's `random.choice` samples with replacement by default (`replace=True`).
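A minimal sketch of the sampling issue, assuming the original code draws indices with NumPy's `choice` and the default `replace=True` (the exact call in `datasets.py` is not shown here). It uses the newer `Generator` API; `n`, `k`, and the seed are illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10            # number of intervals (illustrative)
k = int(n * 0.8)  # size of the training split

# Default replace=True: the same index can be drawn more than once,
# so the training split may contain fewer than k distinct intervals
with_replacement = rng.choice(n, size=k)
print(len(np.unique(with_replacement)), "distinct indices out of", k)

# replace=False draws k distinct indices
without_replacement = rng.choice(n, size=k, replace=False)
print(len(np.unique(without_replacement)), "distinct indices out of", k)
```

Passing `replace=False` fixes the duplication, but the test set still has to be built from a proper complement (a boolean mask or the leftover indices), not from `~` on the integer indices.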

If this is not the intended behavior, I suggest switching to the following:

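```python
import numpy as np

# assumes `intervals` is a NumPy array (boolean-mask indexing needs one)
# create indices
chosen_idx = np.arange(0, len(intervals))
# shuffle the indices in place
np.random.shuffle(chosen_idx)
# keep the first 80% of the shuffled indices for training
chosen_idx = chosen_idx[:int(len(intervals) * 0.8)]
# boolean mask marking the training intervals
chosen_idx_mask = np.zeros(len(intervals), dtype=bool)
chosen_idx_mask[chosen_idx] = True
# select train intervals
train_intervals = intervals[chosen_idx_mask]
# take the rest as test intervals (~ on a boolean mask is logical NOT)
test_intervals = intervals[~chosen_idx_mask]
```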
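An equivalent variant, plus a sanity check that the split really is complementary. This is a sketch, not the repository's code: `split_intervals` is a hypothetical helper, and it swaps the shuffle-plus-mask steps for `Generator.permutation`. Sorting each slice keeps the intervals in their original order, matching what the boolean-mask version returns.

```python
import numpy as np

def split_intervals(intervals, train_frac=0.8, seed=None):
    """Hypothetical helper: split `intervals` into disjoint train/test sets."""
    intervals = np.asarray(intervals)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(intervals))  # each index appears exactly once
    n_train = int(len(intervals) * train_frac)
    # Sort so both splits preserve the original ordering of intervals
    train_idx = np.sort(perm[:n_train])
    test_idx = np.sort(perm[n_train:])
    return intervals[train_idx], intervals[test_idx]

train, test = split_intervals(np.arange(100), seed=42)

# The splits are disjoint and together cover every interval exactly once
print(len(train), len(test))                   # 80 20
print(len(np.intersect1d(train, test)))        # 0
print(np.array_equal(np.sort(np.concatenate([train, test])), np.arange(100)))  # True
```

The overlap check uses values, so it assumes the intervals are unique; with duplicate values, compare index sets instead.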
a-antoniades commented 1 month ago

Thanks for bringing this to my attention; I have pushed a change that resolves this issue.