dmlc / dgl

Python package built to ease deep learning on graphs, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0
13.53k stars, 3.02k forks

[GraphBolt] Check return values of liburing calls and refactor. #7518

Closed: mfbalin closed this 3 months ago

mfbalin commented 3 months ago

Description

In the old code, we did not check the return values of the liburing calls, including the results of completed reads. If a read is short (N bytes requested but only M < N bytes read), we have to submit another request that continues from where the previous one left off. With these checks in place, we can write a regression test that first measures speed and then, outside the timed region, checks correctness so that the check does not affect the timing.
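A minimal sketch of the short-read handling described above. It uses Python's blocking `os.pread` (Unix only) as a stand-in for liburing, and `read_batch` with its `(offset, nbytes)` request format is hypothetical, not GraphBolt's API. Each pass issues every pending read, checks how many bytes actually came back, and requeues the remainder of any short read.

```python
import os
import tempfile


def read_batch(fd, requests):
    """Read each (offset, nbytes) request in full, resubmitting short reads.

    Hypothetical helper: every pass issues all pending reads and checks each
    result; a request that returns fewer bytes than asked for is requeued for
    the bytes that are still missing.
    """
    buffers = [bytearray(nbytes) for _, nbytes in requests]
    # Pending work items: (request index, bytes already read, bytes still missing).
    pending = [(i, 0, nbytes) for i, (_, nbytes) in enumerate(requests) if nbytes > 0]
    while pending:
        requeued = []
        for i, done, remaining in pending:
            offset = requests[i][0] + done
            chunk = os.pread(fd, remaining, offset)  # raises OSError on failure
            if not chunk:
                # Zero bytes means end of file: this request can never complete.
                raise EOFError(
                    f"request {i}: EOF at offset {offset}, {remaining} bytes missing"
                )
            buffers[i][done:done + len(chunk)] = chunk
            if len(chunk) < remaining:
                # Short read: continue from where it left off on the next pass.
                requeued.append((i, done + len(chunk), remaining - len(chunk)))
        pending = requeued
    return [bytes(buf) for buf in buffers]


if __name__ == "__main__":
    data = bytes(range(256)) * 16  # 4096 bytes of test data
    with tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()
        out = read_batch(f.fileno(), [(0, 8), (4000, 96)])
        assert out == [data[0:8], data[4000:4096]]
```

When every read completes in a single pass, the loop adds only one length comparison per request, so checking the result is cheap compared with the I/O itself.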

Regression test:

import os
import tempfile
from functools import partial
import time

import torch
import dgl.graphbolt as gb

assert_equal = partial(torch.testing.assert_close, rtol=0, atol=0)

def to_on_disk_numpy(test_dir, name, t):
    path = os.path.join(test_dir, name + ".npy")
    gb.numpy_save_aligned(path, t.numpy())
    return path

def test_index_select_throughput_and_iops(shape, dtype, indices, num_threads_list):
    tensor = torch.randint(0, 127, shape, dtype=dtype)

    skip_first = 10  # warm-up batches excluded from the throughput/IOPS averages

    results = []
    IOPSs = []

    with tempfile.TemporaryDirectory() as test_dir:
        path = to_on_disk_numpy(test_dir, "tensor", tensor)

        for num_threads in num_threads_list:
            feature = gb.DiskBasedFeature(path=path, num_threads=num_threads)

            throughput_sum = 0
            iops_sum = 0

            for i, idx in enumerate(indices):
                start = time.perf_counter()
                result = feature.read(idx)
                duration = time.perf_counter() - start
                # Correctness is checked after timing, so it does not affect the measurement.
                assert_equal(result, tensor[idx])

                if i >= skip_first:
                    throughput_sum += result.nbytes / duration
                    iops_sum += idx.numel() / duration

            throughput = throughput_sum / (len(indices) - skip_first)
            iops = iops_sum / (len(indices) - skip_first)
            print(num_threads, int(throughput / (2 ** 20)), "MiB/s", int(iops), "IOPS")
            results.append(throughput)
            IOPSs.append(iops)

    return results, IOPSs

shape = [2500000, 4096]
dtype = torch.int8
batch_size = 100000
indices = [torch.randint(0, shape[0], [batch_size], dtype=torch.int32) for _ in range(25)]
num_threads_list = list(range(1, 9))

throughputs, IOPSs = test_index_select_throughput_and_iops(shape, dtype, indices, num_threads_list)
print([
    (num_threads, int(throughput / (2 ** 20)), int(iops))
    for num_threads, throughput, iops in zip(num_threads_list, throughputs, IOPSs)
])

Benchmark results with a 4 KiB (4096-byte) feature dimension; each line shows the number of threads, the throughput in MiB/s, and the IOPS:

(venv) mfbalin@BALIN-PC:~/dgl-1$ python graphbolt/benchmarks/disk_based_feature.py
1 715 MiB/s 183271 IOPS
2 1142 MiB/s 292576 IOPS
3 1571 MiB/s 402287 IOPS
4 1785 MiB/s 457213 IOPS
5 1873 MiB/s 479614 IOPS
6 2094 MiB/s 536154 IOPS
7 2185 MiB/s 559578 IOPS
8 2228 MiB/s 570410 IOPS
[(1, 715, 183271), (2, 1142, 292576), (3, 1571, 402287), (4, 1785, 457213), (5, 1873, 479614), (6, 2094, 536154), (7, 2185, 559578), (8, 2228, 570410)]
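As a sanity check on the units: each index selects one 4096-byte row (4096 `torch.int8` elements), so per batch the throughput in bytes/s is exactly 4096 times the IOPS, and the MiB/s column should equal IOPS // 256. The snippet below reuses only the numbers reported above to verify this for every row and to compute the 8-thread over 1-thread speedup.

```python
# (num_threads, MiB/s, IOPS), copied from the benchmark output above.
reported = [
    (1, 715, 183271),
    (2, 1142, 292576),
    (3, 1571, 402287),
    (4, 1785, 457213),
    (5, 1873, 479614),
    (6, 2094, 536154),
    (7, 2185, 559578),
    (8, 2228, 570410),
]

ROW_BYTES = 4096  # shape[1] = 4096 int8 elements of 1 byte each

for num_threads, mib_per_s, iops in reported:
    # One 4 KiB row per I/O: bytes/s = IOPS * 4096, and 2**20 / 4096 = 256.
    assert iops * ROW_BYTES // 2**20 == mib_per_s, f"mismatch at {num_threads} threads"

speedup = reported[-1][2] / reported[0][2]
print(f"8 threads vs 1 thread: {speedup:.2f}x IOPS")
```

Every row passes the check, and going from 1 to 8 threads raises IOPS by roughly 3.1x.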

dgl-bot commented 3 months ago

To trigger regression tests:

dgl-bot commented 3 months ago

Commit ID: abba7f637a7291cf4a4d7fee59a34d2535390aec

Build ID: 1

Status: ⚪️ CI test cancelled due to overrun.

dgl-bot commented 3 months ago

Commit ID: bfe1981dcf573d742ff522e6be01823cb5c27abf

Build ID: 2

Status: ✅ CI test succeeded.

dgl-bot commented 3 months ago

Commit ID: 45d35227bc31b747bc44c97aef3bd3f7acf012d6

Build ID: 3

Status: ❌ CI test failed in Stage [CPU Build (Win64)].

dgl-bot commented 3 months ago

Commit ID: 0fb9a5b210d17b52205221a2d69f009d7af9f393

Build ID: 4

Status: ⚪️ CI test cancelled due to overrun.

dgl-bot commented 3 months ago

Commit ID: 82f10136c3416ae480e30aa023e1774dc8e84d4d

Build ID: 5

Status: ⚪️ CI test cancelled due to overrun.

dgl-bot commented 3 months ago

Commit ID: a60d97941164a57397947e96e2f7c5db98651b76

Build ID: 6

Status: ⚪️ CI test cancelled due to overrun.

dgl-bot commented 3 months ago

Commit ID: 2a858db8bf632e3db783e1b4165b44042cda6a8d

Build ID: 7

Status: ❌ CI test failed in Stage [CPU Build].

dgl-bot commented 3 months ago

Commit ID: 01b0b8407925d61c0907c80d006b0ba4df847c3a

Build ID: 8

Status: ❌ CI test failed in Stage [CPU Build (Win64)].

dgl-bot commented 3 months ago

Commit ID: 8d9c8df67c33927c6e20612ac53d3d63a8803b32

Build ID: 9

Status: ❌ CI test failed in Stage [CPU Build (Win64)].

dgl-bot commented 3 months ago

Commit ID: e8ee557c1a05df3e491a78660de7bef4a957139a

Build ID: 10

Status: ✅ CI test succeeded.

dgl-bot commented 3 months ago

Commit ID: ee3656821ac6c4c6edf626cd2d5c57abfc4c0163

Build ID: 11

Status: ✅ CI test succeeded.

dgl-bot commented 3 months ago

Commit ID: ca8da895f52b48eb340b65da409e8b7e150785a5

Build ID: 12

Status: ✅ CI test succeeded.

dgl-bot commented 3 months ago

Commit ID: 358ab1f14258222cbb0841cbd553882157af4cfe

Build ID: 13

Status: ⚪️ CI test cancelled due to overrun.

dgl-bot commented 3 months ago

Commit ID: e7d0b590df13c5a4a26467c925c1dba2d251c46d

Build ID: 14

Status: ⚪️ CI test cancelled due to overrun.

dgl-bot commented 3 months ago

Commit ID: c7f9a06f8b5236e2de71919deb0ad876717cb407

Build ID: 15

Status: ✅ CI test succeeded.

dgl-bot commented 3 months ago

Commit ID: 2b032a9836b0132034a887f9e51a7cbd7159a42b

Build ID: 16

Status: ✅ CI test succeeded.

dgl-bot commented 3 months ago

Commit ID: 527cc2d49bc5abf22c98abc0228434568a643319

Build ID: 17

Status: ✅ CI test succeeded.

dgl-bot commented 3 months ago

Commit ID: f0cf75f69e8d4c244a472a6333dd45b3bf1f8ecd

Build ID: 18

Status: ⚪️ CI test cancelled due to overrun.

dgl-bot commented 3 months ago

Commit ID: a8a1234961de6e0c5b0cfd971909b24bc1143e24

Build ID: 19

Status: ✅ CI test succeeded.

dgl-bot commented 3 months ago

Commit ID: 17606242a9a4e7fb3391c460c31651ddba0d95fd

Build ID: 20

Status: ⚪️ CI test cancelled due to overrun.

dgl-bot commented 3 months ago

Commit ID: 0c29a19f4530578cc30261c8ca0bb3b21ea36905

Build ID: 21

Status: ✅ CI test succeeded.

dgl-bot commented 3 months ago

Commit ID: e9e9dd78718f7418933feadef9059932a28c2287

Build ID: 22

Status: ✅ CI test succeeded.

dgl-bot commented 3 months ago

Commit ID: 4c877c611ece9373d76ac80c51078226d7fbc596

Build ID: 23

Status: ❌ CI test failed in Stage [Torch CPU Unit test].

mfbalin commented 3 months ago

@dgl-bot

dgl-bot commented 3 months ago

Commit ID: 4c877c611ece9373d76ac80c51078226d7fbc596

Build ID: 24

Status: ❌ CI test failed in Stage [CPU Build].

dgl-bot commented 3 months ago

Commit ID: e9e9dd78718f7418933feadef9059932a28c2287

Build ID: 25

Status: ✅ CI test succeeded.

dgl-bot commented 3 months ago

Commit ID: 1affb480c5118bf4190a05a3639122f13000f15a

Build ID: 26

Status: ❌ CI test failed in Stage [Torch CPU Example test].

dgl-bot commented 3 months ago

Commit ID: e4f93ae39e50bccc3add5d5543a125aca7d20417

Build ID: 27

Status: ⚪️ CI test cancelled due to overrun.

dgl-bot commented 3 months ago

Commit ID: c84a7f0a13c51d43f52a01ebf50e7d6b64cb4aec

Build ID: 28

Status: ⚪️ CI test cancelled due to overrun.

dgl-bot commented 3 months ago

Commit ID: fcb09ed36b6800cc994ab6556dde2985463206e1

Build ID: 29

Status: ⚪️ CI test cancelled due to overrun.

dgl-bot commented 3 months ago

Commit ID: 4167fa339f3986b72ed871f033467248b035e09a

Build ID: 30

Status: ⚪️ CI test cancelled due to overrun.

dgl-bot commented 3 months ago

Commit ID: 68b75661a829a875e2da6f93f3d01e6283b055ca

Build ID: 31

Status: ⚪️ CI test cancelled due to overrun.

dgl-bot commented 3 months ago

Commit ID: 042df7debdfbe00d61f838f6b4ab172e551222dc

Build ID: 32

Status: ❌ CI test failed in Stage [CPU Build].

dgl-bot commented 3 months ago

Commit ID: e596c721a01d6dabdcdfffd3f50f4627d6709e78

Build ID: 33

Status: ❌ CI test failed in Stage [CPU Build].

dgl-bot commented 3 months ago

Commit ID: 6c5c03590254f7d676cdedf171fb9761a69835df

Build ID: 34

Status: ✅ CI test succeeded.

dgl-bot commented 3 months ago

Commit ID: ba7c89bda3077c7da1e07767885d4e1f3784317e

Build ID: 35

Status: ✅ CI test succeeded.

dgl-bot commented 3 months ago

Commit ID: 0b2a630cb9914dbe36bba918074b0f68621a50e8

Build ID: 36

Status: ✅ CI test succeeded.

dgl-bot commented 3 months ago

Commit ID: ebf0e2b3215f0babd44c37e09ac619ef7d3dbdac

Build ID: 37

Status: ✅ CI test succeeded.
