## Description

The old code did not check the return values of the read requests. When a read is short (N bytes are requested but only M < N bytes are read), we have to continue from where the read left off and submit another request for the remaining bytes. With that fixed, we can now write a regression test that measures speed first and verifies correctness only after the timing is recorded, so the check does not affect the measurement.
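The actual fix lives in the C++ read path; the snippet below is only a minimal Python sketch of the resubmit-on-short-read pattern using `os.pread`, and the `pread_full` helper name is hypothetical, not part of this PR:

```python
import os


def pread_full(fd, n, offset):
    # Hypothetical illustration: os.pread may return fewer bytes than
    # requested (a short read), so keep submitting reads that continue
    # from where the previous one left off until all n bytes arrive.
    buf = bytearray(n)
    done = 0
    while done < n:
        chunk = os.pread(fd, n - done, offset + done)
        if not chunk:
            raise EOFError("unexpected end of file")
        buf[done : done + len(chunk)] = chunk
        done += len(chunk)
    return bytes(buf)
```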
Regression test:
```python
import os
import tempfile
import time
from functools import partial

import torch

import dgl.graphbolt as gb

# Exact comparison: the bytes read back from disk must match the tensor.
assert_equal = partial(torch.testing.assert_close, rtol=0, atol=0)


def to_on_disk_numpy(test_dir, name, t):
    path = os.path.join(test_dir, name + ".npy")
    gb.numpy_save_aligned(path, t.numpy())
    return path


def test_index_select_throughput_and_iops(shape, dtype, indices, num_threads_list):
    tensor = torch.randint(0, 127, shape, dtype=dtype)
    skip_first = 10  # Treat the first few batches as warm-up.
    results = []
    IOPSs = []
    with tempfile.TemporaryDirectory() as test_dir:
        path = to_on_disk_numpy(test_dir, "tensor", tensor)
        for num_threads in num_threads_list:
            feature = gb.DiskBasedFeature(path=path, num_threads=num_threads)
            throughput_sum = 0
            iops_sum = 0
            for i, idx in enumerate(indices):
                start = time.time()
                result = feature.read(idx)
                duration = time.time() - start
                # Check correctness only after the timing has been recorded
                # so that the check does not affect the measurement.
                assert_equal(result, tensor[idx])
                if i >= skip_first:
                    throughput_sum += result.nbytes / duration
                    iops_sum += idx.numel() / duration
            throughput = throughput_sum / (len(indices) - skip_first)
            iops = iops_sum / (len(indices) - skip_first)
            print(num_threads, int(throughput / (2**20)), "MiB/s", int(iops), "IOPS")
            results.append(throughput)
            IOPSs.append(iops)
    return results, IOPSs


shape = [2500000, 4096]
dtype = torch.int8
batch_size = 100000
indices = [
    torch.randint(0, shape[0], [batch_size], dtype=torch.int32) for _ in range(25)
]
num_threads_list = list(range(1, 9))
throughputs, IOPSs = test_index_select_throughput_and_iops(
    shape, dtype, indices, num_threads_list
)
print(
    list(
        (num_threads, int(throughput / (2**20)), int(iops))
        for num_threads, throughput, iops in zip(num_threads_list, throughputs, IOPSs)
    )
)
```
Benchmark results with 4K byte feature dimension:

## Checklist

Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
- [ ] I've leveraged the tools to beautify the Python and C++ code.
- [ ] The PR is complete and small; read the Google eng practice (CL equals PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code change to be small (examples, tests, and documentation can be exempted).
- [ ] All changes have test coverage.
- [ ] Code is well-documented.
- [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change.
- [ ] Related issue is referred to in this PR.
- [ ] If the PR is for a new model/paper, I've updated the example index here.
`@dgl-bot run [instance-type] [which tests] [compare-with-branch]`; for example: `@dgl-bot run g4dn.4xlarge all dmlc/master` or `@dgl-bot run c5.9xlarge kernel,api dmlc/master`
## Changes