Open kevinbird15 opened 2 years ago
If the other fix is right, this test would pass with test_eq(list(dl1), [torch.arange(i*12, i*12+12)%50])
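The `%50` in that test suggests the current behavior pads the final, non-divisible rank by wrapping indices around to the start of the dataset. A plain-Python illustration of that wrap-around (not fastai's actual code; `wrapped_batch` is a hypothetical helper):

```python
def wrapped_batch(rank, bs=12, n_items=50):
    # Mirrors torch.arange(rank*bs, rank*bs + bs) % n_items without torch:
    # indices that run past the end of the dataset wrap back to index 0.
    return [(rank * bs + j) % n_items for j in range(bs)]

print(wrapped_batch(0))  # first rank: items 0 through 11, no wrapping
print(wrapped_batch(4))  # starts at 48, 49, then wraps to 0 through 9
```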
@kevinbird15 is this fixed yet? I don't think so since the test is the same and not updated... What needs to be fixed here?
I think the question is whether there is anything that should be fixed at all. I am still unsure what the expected behavior is supposed to be, since I am not using DistributedDL myself, so the first thing to answer is what fastai's expected behavior is in the scenario outlined above. I think @marii-moe was of the opinion that the current behavior is expected, so there isn't really a bug. At minimum, the two behaviors would need to be compared to see whether changing this actually improves performance, or whether there are times when one behavior is preferable to the other.
Please confirm you have the latest versions of fastai, fastcore, and nbdev prior to reporting a bug (delete one): YES
Describe the bug I believe DistributedDL is currently not behaving properly when the dataset length is not evenly divisible by the batch size times the number of processes
To Reproduce Steps to reproduce the behavior:
Expected behavior I would expect that if we have a dataloader over items 0-49 with a batch size of 12 and 4 processes, we would distribute one contiguous batch to each rank:

dl0 = items 0-11
dl1 = items 12-23
dl2 = items 24-35
dl3 = items 36-47
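The proposed split above can be sketched in plain Python (an illustrative sketch, not DistributedDL's implementation; `split_for_ranks` is a hypothetical helper): each of the 4 ranks takes one contiguous batch of 12, and the 2 leftover items (48, 49) are dropped, as with drop_last=True.

```python
def split_for_ranks(n_items, bs, world_size, drop_last=True):
    # Contiguous split: rank r gets items [r*bs, (r+1)*bs).
    items = list(range(n_items))
    chunks = [items[r * bs:(r + 1) * bs] for r in range(world_size)]
    if drop_last:
        # Partial batches (here, items 48-49) are discarded.
        chunks = [c for c in chunks if len(c) == bs]
    return chunks

chunks = split_for_ranks(50, 12, 4)
print([(c[0], c[-1]) for c in chunks])  # [(0, 11), (12, 23), (24, 35), (36, 47)]
```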
Current Output
Proposed Output:
Additional context This came up while fixing an issue with Learner.get_preds, where dl.get_idxs was still allowing items from outside the last batch to be included even when drop_last=True, which resulted in an index error. After fixing that, this 20a_distributed test started failing, and after digging in a bit, I believe the new behavior is correct.
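The get_idxs fix described above amounts to truncating the index list to a whole number of batches when drop_last=True. A minimal sketch of that idea (hypothetical helper, not the actual fastai patch):

```python
def trim_idxs(idxs, bs, drop_last=True):
    # With drop_last=True, indices beyond the last full batch of size `bs`
    # should never be surfaced, avoiding out-of-range lookups downstream.
    n = (len(idxs) // bs) * bs if drop_last else len(idxs)
    return idxs[:n]

print(len(trim_idxs(list(range(50)), 12)))  # 48: the trailing 2 items are dropped
```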