Project-MONAI / MONAI

AI Toolkit for Healthcare Imaging
https://monai.io/
Apache License 2.0

ShuffleBuffer not returning all patches #7986

Open CH4LLENG3R opened 3 months ago

CH4LLENG3R commented 3 months ago

Describe the bug: While following the tutorial https://github.com/Project-MONAI/tutorials/blob/main/modules/2d_slices_from_3d_training.ipynb and adapting parts of it for my project, specifically the step that turns a Dataset of 3D volumes into 2D patches, I ran into an issue with ShuffleBuffer.

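```python
import monai
from monai.data import CacheDataset
from monai.transforms import Compose, SqueezeDimd


def create_dataset_2D_ds(ds, keys: list, trans2d: list) -> monai.data.ShuffleBuffer:
    # As posted: this overwrites the `ds` argument; `data_dicts` and `transforms`
    # are not defined in the snippet.
    ds = CacheDataset(data=data_dicts, transform=transforms)

    # Iterate over each 3D volume one slice at a time along the last axis.
    patch_func = monai.data.PatchIterd(
        keys=keys, patch_size=(None, None, 1), start_pos=(0, 0, 0)
    )

    # Drop the singleton slice dimension, then apply the 2D transforms.
    patch_transform = Compose([SqueezeDimd(keys=keys, dim=-1)] + trans2d)

    patch_ds = monai.data.GridPatchDataset(  # -> GridPatchDataset contains 531 entries
        data=ds, patch_iter=patch_func, transform=patch_transform, with_coordinates=False
    )
    sb = monai.data.ShuffleBuffer(patch_ds, buffer_size=200, seed=42, epochs=1)  # -> ShuffleBuffer returning only 133 of them
    return sb  # the posted snippet returned patch_ds; sb matches the annotation and the report
```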

The problem is described in the code comments above: the GridPatchDataset contains 531 entries, but iterating the ShuffleBuffer through a DataLoader yields only 133.
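The report does not say how the two counts were obtained. One simple way to compare them, sketched here with a hypothetical helper `count_items`, is to exhaust each iterable once and count what it yields. The generator below is only a stand-in for the dataset or DataLoader.

```python
from typing import Iterable


def count_items(stream: Iterable) -> int:
    """Exhaust an iterable once and return how many items it yielded."""
    return sum(1 for _ in stream)


# Stand-in generator; in the report this would be the dataset or the DataLoader.
example = (i for i in range(5))
print(count_items(example))  # 5
```

When counting through a DataLoader with a batch size above 1, each yielded element is a batch, so the batch sizes would need to be summed instead.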

Expected behavior: Being able to iterate through the entire GridPatchDataset with ShuffleBuffer.

DylanHsu commented 2 months ago

I am having the same issue.

DylanHsu commented 2 months ago

@CH4LLENG3R I believe I discovered the source of the problem, or at least a workaround, in my setup. I was passing the ShuffleBuffer to a DataLoader with num_workers=4 set. Based on your ratio of expected vs. resulting images (about 25%), I am guessing you were also using 4 workers, and they are all fighting for shuffled slices (or something). Using num_workers=1 in the downstream DataLoader gives me the correct number of shuffled images.
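The workaround described above might look like the following sketch. It assumes `patch_ds` is the GridPatchDataset from the original report. The helper name `make_shuffled_loader` and the `batch_size` value are illustrative and do not come from the thread. MONAI is imported inside the function only so that defining the helper has no import-time dependency.

```python
def make_shuffled_loader(patch_ds, num_workers: int = 1, batch_size: int = 10):
    """Wrap an iterable patch dataset in a ShuffleBuffer and a DataLoader.

    `make_shuffled_loader` and `batch_size=10` are hypothetical choices for
    this sketch; `num_workers=1` is the setting reported to work in the thread.
    """
    # Imported lazily so that defining this helper does not require MONAI.
    from monai.data import DataLoader, ShuffleBuffer

    # Same ShuffleBuffer arguments as in the original report.
    shuffled = ShuffleBuffer(patch_ds, buffer_size=200, seed=42, epochs=1)

    # A single worker is the configuration reported here to return every item.
    return DataLoader(shuffled, batch_size=batch_size, num_workers=num_workers)
```

Calling `make_shuffled_loader(patch_ds)` and iterating the result would be the place to check whether the expected number of patches comes back.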

CH4LLENG3R commented 2 months ago

Thank you @DylanHsu, your solution works!
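The thread does not explain why a single worker restores the full count. One possible reading, an assumption that is not confirmed here, is that items are split round-robin across workers at two nested levels, so each worker keeps only a fraction of its own share. The toy model below is plain Python with hypothetical helpers `shard` and `collect`; it does not call MONAI and is not a description of its internals.

```python
from typing import Iterable, Iterator, List


def shard(items: Iterable[int], worker_id: int, num_workers: int) -> Iterator[int]:
    """Keep every num_workers-th item, starting at worker_id (round-robin split)."""
    for i, item in enumerate(items):
        if i % num_workers == worker_id:
            yield item


def collect(total: int, num_workers: int, nested: bool) -> List[int]:
    """Gather what all workers return for a stream of `total` items."""
    seen: List[int] = []
    for worker_id in range(num_workers):
        stream = shard(range(total), worker_id, num_workers)
        if nested:
            # Hypothetical second split applied on top of the first one.
            stream = shard(stream, worker_id, num_workers)
        seen.extend(stream)
    return seen


single = collect(total=531, num_workers=4, nested=False)
double = collect(total=531, num_workers=4, nested=True)
print(len(single), len(double))  # prints: 531 133
```

In this toy model, four workers together return every item when the split is applied once and roughly a quarter of them when it is applied twice, which is the same scale of loss as in the report. With one worker, both variants return everything.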