dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

DGL DataLoader does not maintain example order with shuffle=False when using multiple workers. #7695

Open mr-mateusz opened 2 months ago

mr-mateusz commented 2 months ago

🐛 Bug

The DGL DataLoader does not maintain the order of examples with shuffle=False when num_workers > 1 and batch_size * num_workers < dataset_size (Example 1 below, where batch_size * num_workers equals the dataset size, happens to come out in order). Elements within a single batch are in order, but the batches themselves are not emitted in the order of the input indices. This behavior is inconsistent with the expected operation of a DataLoader when shuffle=False.
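
The output below is consistent with the seed indices being split into one contiguous chunk per worker and the workers' batches then being interleaved. That is only an inference from the printed node IDs, not a confirmed reading of DGL's internals, but the following minimal sketch reproduces the "4 workers, batch size 512" batch order exactly:

import torch

# Assumed mechanism (inferred from the output, not from DGL source): split the
# seed IDs into one contiguous chunk per worker, then interleave the workers'
# batches round-robin.
nids = torch.arange(1000, 1000 + 4096)        # the test node IDs from the repro
num_workers, batch_size = 4, 512

per_worker = nids.chunk(num_workers)                            # 4 chunks of 1024 IDs
per_worker_batches = [c.split(batch_size) for c in per_worker]  # 2 batches per worker
interleaved = [b for step in zip(*per_worker_batches) for b in step]

for b in interleaved:
    print(b[:5], b[-5:])   # matches the Example 2 output below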

To Reproduce

Code:

import dgl
import torch

num_layers = 4

# Example dataset
random_embeddings = torch.randn(10000, 128)
target_values = torch.rand(10000)

# Indices of test nodes
test_start_index = 1000
test_size = 4096
test_mask = torch.zeros(10000, dtype=torch.bool)
test_mask[test_start_index:test_start_index + test_size] = True

# Create graph
dgl_graph = dgl.knn_graph(random_embeddings, 10, exclude_self=True)

dgl_graph.ndata['features'] = random_embeddings
dgl_graph.ndata['target'] = target_values

# Indices of 'test' elements
_nids = torch.where(test_mask)[0]

print("Number of rows:", dgl_graph.ndata['target'][test_mask].shape)

# Example 1
_sampler = dgl.dataloading.MultiLayerFullNeighborSampler(num_layers)
_loader = dgl.dataloading.DataLoader(dgl_graph, _nids, _sampler, batch_size=1024, shuffle=False, drop_last=False,
                                     num_workers=4)

print('Example 1. 4 workers, batch size 1024. -> 4 * 1024 = 4096 (equal to test_size)')

_targets_iterated = []
for in_nodes, out_nodes, blocks in _loader:
    print(out_nodes[:5], out_nodes[-5:])
    _targets_iterated.append(blocks[-1].dstdata['target'])

_targets_iterated = torch.cat(_targets_iterated)

print(torch.equal(dgl_graph.ndata['target'][test_mask], _targets_iterated))

print('---')

# Example 2

print('Example 2. 4 workers, batch size 512. -> 4 * 512 = 2048 (less than test_size)')

_sampler = dgl.dataloading.MultiLayerFullNeighborSampler(num_layers)
_loader = dgl.dataloading.DataLoader(dgl_graph, _nids, _sampler, batch_size=512, shuffle=False, drop_last=False,
                                     num_workers=4)

_targets_iterated = []
for in_nodes, out_nodes, blocks in _loader:
    print(out_nodes[:5], out_nodes[-5:])
    _targets_iterated.append(blocks[-1].dstdata['target'])

_targets_iterated = torch.cat(_targets_iterated)

print(torch.equal(dgl_graph.ndata['target'][test_mask], _targets_iterated))

print('---')

# Example 3

print('Example 3. 1 worker, batch size 512')

_sampler = dgl.dataloading.MultiLayerFullNeighborSampler(num_layers)
_loader = dgl.dataloading.DataLoader(dgl_graph, _nids, _sampler, batch_size=512, shuffle=False, drop_last=False,
                                     num_workers=1)

_targets_iterated = []
for in_nodes, out_nodes, blocks in _loader:
    print(out_nodes[:5], out_nodes[-5:])
    _targets_iterated.append(blocks[-1].dstdata['target'])

_targets_iterated = torch.cat(_targets_iterated)

print(torch.equal(dgl_graph.ndata['target'][test_mask], _targets_iterated))

# Torch Dataloader

print('---')
print('pytorch')

class MyDataset(torch.utils.data.Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

dataset = MyDataset(torch.where(test_mask)[0])

dataloader = torch.utils.data.DataLoader(dataset, batch_size=512, shuffle=False, num_workers=4)

for batch in dataloader:
    print(batch[:5], batch[-5:])

Output:

Number of rows: torch.Size([4096])
Example 1. 4 workers, batch size 1024. -> 4 * 1024 = 4096 (equal to test_size)
tensor([1000, 1001, 1002, 1003, 1004]) tensor([2019, 2020, 2021, 2022, 2023])
tensor([2024, 2025, 2026, 2027, 2028]) tensor([3043, 3044, 3045, 3046, 3047])
tensor([3048, 3049, 3050, 3051, 3052]) tensor([4067, 4068, 4069, 4070, 4071])
tensor([4072, 4073, 4074, 4075, 4076]) tensor([5091, 5092, 5093, 5094, 5095])
True
---
Example 2. 4 workers, batch size 512. -> 4 * 512 = 2048 (less than test_size)    <= here elements are not in order
tensor([1000, 1001, 1002, 1003, 1004]) tensor([1507, 1508, 1509, 1510, 1511])
tensor([2024, 2025, 2026, 2027, 2028]) tensor([2531, 2532, 2533, 2534, 2535])
tensor([3048, 3049, 3050, 3051, 3052]) tensor([3555, 3556, 3557, 3558, 3559])
tensor([4072, 4073, 4074, 4075, 4076]) tensor([4579, 4580, 4581, 4582, 4583])
tensor([1512, 1513, 1514, 1515, 1516]) tensor([2019, 2020, 2021, 2022, 2023])
tensor([2536, 2537, 2538, 2539, 2540]) tensor([3043, 3044, 3045, 3046, 3047])
tensor([3560, 3561, 3562, 3563, 3564]) tensor([4067, 4068, 4069, 4070, 4071])
tensor([4584, 4585, 4586, 4587, 4588]) tensor([5091, 5092, 5093, 5094, 5095])
False
---
Example 3. 1 worker, batch size 512
tensor([1000, 1001, 1002, 1003, 1004]) tensor([1507, 1508, 1509, 1510, 1511])
tensor([1512, 1513, 1514, 1515, 1516]) tensor([2019, 2020, 2021, 2022, 2023])
tensor([2024, 2025, 2026, 2027, 2028]) tensor([2531, 2532, 2533, 2534, 2535])
tensor([2536, 2537, 2538, 2539, 2540]) tensor([3043, 3044, 3045, 3046, 3047])
tensor([3048, 3049, 3050, 3051, 3052]) tensor([3555, 3556, 3557, 3558, 3559])
tensor([3560, 3561, 3562, 3563, 3564]) tensor([4067, 4068, 4069, 4070, 4071])
tensor([4072, 4073, 4074, 4075, 4076]) tensor([4579, 4580, 4581, 4582, 4583])
tensor([4584, 4585, 4586, 4587, 4588]) tensor([5091, 5092, 5093, 5094, 5095])
True
---
pytorch
tensor([1000, 1001, 1002, 1003, 1004]) tensor([1507, 1508, 1509, 1510, 1511])
tensor([1512, 1513, 1514, 1515, 1516]) tensor([2019, 2020, 2021, 2022, 2023])
tensor([2024, 2025, 2026, 2027, 2028]) tensor([2531, 2532, 2533, 2534, 2535])
tensor([2536, 2537, 2538, 2539, 2540]) tensor([3043, 3044, 3045, 3046, 3047])
tensor([3048, 3049, 3050, 3051, 3052]) tensor([3555, 3556, 3557, 3558, 3559])
tensor([3560, 3561, 3562, 3563, 3564]) tensor([4067, 4068, 4069, 4070, 4071])
tensor([4072, 4073, 4074, 4075, 4076]) tensor([4579, 4580, 4581, 4582, 4583])
tensor([4584, 4585, 4586, 4587, 4588]) tensor([5091, 5092, 5093, 5094, 5095])

Expected behavior

The DataLoader should produce data in the same order as the input indices when shuffle=False, regardless of the number of workers or batch size.

Environment

Additional context
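
A possible workaround, assuming the seed indices are sorted ascending as they are in the repro above, is to collect out_nodes alongside the results and restore the order afterwards (untested sketch reusing _loader, dgl_graph and test_mask from the code above):

import torch

_targets_iterated = []
_out_nodes_iterated = []
for in_nodes, out_nodes, blocks in _loader:
    _out_nodes_iterated.append(out_nodes)
    _targets_iterated.append(blocks[-1].dstdata['target'])

_targets_iterated = torch.cat(_targets_iterated)
_out_nodes_iterated = torch.cat(_out_nodes_iterated)

# Sorting the collected node IDs recovers the original (ascending) seed order.
order = torch.argsort(_out_nodes_iterated)
print(torch.equal(dgl_graph.ndata['target'][test_mask], _targets_iterated[order]))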

frozenbugs commented 2 months ago

Hi @mr-mateusz, can you try GraphBolt (https://docs.dgl.ai/stochastic_training/index.html)? It is our latest SOTA GNN dataloader. The DGL dataloader is unmaintained now and will be deprecated in the future.

If you observe the same issue, please reach out.
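
For reference, a rough GraphBolt counterpart of the repro's loader could look like the sketch below. The names follow the linked stochastic-training docs, but the exact API (e.g. whether ItemSet takes names="seeds" or names="seed_nodes", and the fanout format) has shifted across DGL 2.x releases, so every identifier here should be treated as an assumption rather than tested code:

import dgl.graphbolt as gb

# All names below are assumptions based on the GraphBolt docs, not verified code.
gb_graph = gb.from_dglgraph(dgl_graph)                  # convert the DGLGraph
item_set = gb.ItemSet(_nids, names="seeds")             # "seed_nodes" in older releases
datapipe = gb.ItemSampler(item_set, batch_size=512, shuffle=False)
datapipe = datapipe.sample_neighbor(gb_graph, [-1] * num_layers)  # -1 = all neighbors
dataloader = gb.DataLoader(datapipe, num_workers=4)

for minibatch in dataloader:
    print(minibatch.seeds[:5])                          # check ordering across batches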