Describe the bug
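HuggingFaceDataset doesn't shuffle the dataset, either when passing shuffle=True to the constructor or when calling the shuffle() method afterwards. In both cases the shuffled attribute is printed alongside the first ten examples so the order can be compared.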
To Reproduce
Steps to reproduce the behavior:
Run this code:
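# Assumption: HuggingFaceDataset is TextAttack's dataset wrapper.
from textattack.datasets import HuggingFaceDataset

# 1. Baseline: load without shuffling.
normal_ds = HuggingFaceDataset("imdb", split="test")
print('Normal')
for i in range(10):
    print(f'shuffle: {normal_ds.shuffled}, label: {normal_ds[i][1]}, Text: {normal_ds[i][0]}')

# 2. Load with shuffle=True; the first ten examples should differ from the baseline.
shuffle_ds = HuggingFaceDataset("imdb", split="test", shuffle=True)
print('shuffled')
for i in range(10):
    print(f'shuffle: {shuffle_ds.shuffled}, label: {shuffle_ds[i][1]}, Text: {shuffle_ds[i][0]}')

# 3. Shuffle the baseline dataset in place with shuffle().
normal_ds.shuffle()
print('Normal shuffled')
for i in range(10):
    print(f'shuffle: {normal_ds.shuffled}, label: {normal_ds[i][1]}, Text: {normal_ds[i][0]}')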
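For a quicker check than reading the printed rows, the comparison can be reduced to a small helper. This is only a sketch: `first_items` and `order_changed` are hypothetical names, not part of any library, and a plain list of (text, label) pairs stands in for the real dataset so the snippet runs on its own.

```python
import random


def first_items(dataset, n=10):
    """Return the first n examples of any indexable dataset."""
    return [dataset[i] for i in range(n)]


def order_changed(before, after):
    """Return True if the two snapshots differ in content or order."""
    return before != after


# Stand-in for the real dataset: (text, label) pairs sorted by label.
rows = [(f"review {i}", 0 if i < 50 else 1) for i in range(100)]

before = first_items(rows)
random.Random(0).shuffle(rows)  # what a working shuffle is expected to do
after = first_items(rows)

print("order changed:", order_changed(before, after))
```

With the real objects, taking `first_items(normal_ds)` before and after `normal_ds.shuffle()` should give two different snapshots; with the behavior reported here they would presumably come back identical.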