dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

[GraphBolt][io_uring] Refactor and enable tests #7506

Closed mfbalin closed 1 week ago

mfbalin commented 2 weeks ago

Description

  1. GraphBolt could not be built with liburing support because the USE_LIBURING option was not being passed to GraphBolt.
  2. liburing is not used anywhere except GraphBolt, so I moved its configuration steps into the GraphBolt CMake file.
  3. Eliminated an unnecessary copy at the end, along with the unsafe manual allocation via malloc and free. The code now uses buffers from torch instead.
  4. Enabled the tests on systems that support io_uring.
  5. Made the disk read operation async: it now returns a future and takes a num_threads argument that determines the maximum number of threads that will read from disk. There is no async read operation exposed in the Python DiskBasedFeature yet; it will be introduced in follow-up PRs.
  6. Eliminated a few memory leaks by switching to safer, modern variants (no manual free or delete is required now).
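
To illustrate the shape of item 5, here is a minimal Python sketch of a read that returns a future and caps the number of reader threads. It is not the GraphBolt implementation (which is C++ and uses io_uring); `read_rows_async`, `row_ids`, and `row_bytes` are hypothetical names chosen for this example, and plain file reads in a thread pool stand in for the io_uring calls. Workers write into one preallocated buffer, so no final copy is needed.

```python
import os
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor


def read_rows_async(path, row_ids, row_bytes, num_threads=4):
    """Start reading fixed-size rows from `path` and return a Future.

    The Future resolves to a bytearray holding the rows in `row_ids` order.
    Hypothetical helper for illustration only.
    """
    n = len(row_ids)
    out = bytearray(n * row_bytes)  # single preallocated output buffer

    def read_chunk(start, stop):
        # Each worker uses its own file handle and fills a disjoint slice of `out`.
        with open(path, "rb") as f:
            for i in range(start, stop):
                f.seek(row_ids[i] * row_bytes)
                data = f.read(row_bytes)
                out[i * row_bytes : i * row_bytes + len(data)] = data

    def run():
        workers = max(1, min(num_threads, n))  # never more threads than rows
        step = max(1, -(-n // workers))        # ceiling division
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parts = [
                pool.submit(read_chunk, s, min(s + step, n))
                for s in range(0, n, step)
            ]
            for part in parts:
                part.result()  # re-raise any worker error
        return out

    # A one-thread driver lets the caller get a Future back immediately.
    driver = ThreadPoolExecutor(max_workers=1)
    future = driver.submit(run)
    driver.shutdown(wait=False)
    return future


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"".join(bytes([i]) * 4 for i in range(10)))
    fut = read_rows_async(tmp.name, [3, 0, 7], row_bytes=4, num_threads=2)
    print(fut.result())  # blocks until the read has finished
    os.remove(tmp.name)
```

The caller can do other work between the call and `result()`, which is the point of returning a future rather than the data itself.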

dgl-bot commented 2 weeks ago

Commit ID: 67a6dd454acd8a8a9f9765c40be73ac958250b61

Build ID: 1

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented 2 weeks ago

Commit ID: a6a08175d32652663ef7c248f88be3c76c4453ee

Build ID: 2

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented 2 weeks ago

Commit ID: fd4e5f86d90ddb255fda7d15519ca9f063a5847f

Build ID: 3

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

mfbalin commented 2 weeks ago

@Rhett-Ying Did the tests get enabled for our Linux CI or not? I can't parse the output of the CI.

EDIT: I forgot to enable building by default.

dgl-bot commented 2 weeks ago

Commit ID: ad3804ce90f2faea5979583b9aa8d0c23285d311

Build ID: 4

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

mfbalin commented 2 weeks ago

The tests passed on Linux and were skipped on Windows. I don't have a way to test whether they will be skipped on a system that does not support io_uring, though. I am hoping that it works.
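
For reference, one way to approximate "does this system support io_uring" from Python is a kernel-version heuristic, sketched below. This is an assumption-laden stand-in, not the check this PR uses: `parse_kernel_release` and `may_support_io_uring` are hypothetical helpers, and the only fact relied on is that io_uring first appeared in Linux 5.1. A new enough kernel can still have io_uring disabled by sandboxing or system policy, so a real probe would have to try to set up a ring.

```python
import platform
import re


def parse_kernel_release(release):
    """Return (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def may_support_io_uring(system=None, release=None):
    """Heuristic only: True on Linux with kernel >= 5.1, False otherwise."""
    system = platform.system() if system is None else system
    release = platform.release() if release is None else release
    if system != "Linux":
        return False
    try:
        return parse_kernel_release(release) >= (5, 1)
    except ValueError:
        return False
```

With pytest, a helper like this could feed `pytest.mark.skipif(not may_support_io_uring(), reason="io_uring unavailable")`, which would skip on Windows and on old kernels alike.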

dgl-bot commented 1 week ago

Commit ID: f6b5d34d47d5c9d2ca936624e07a7f82b85f9f87

Build ID: 5

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: 91905731824f170811fd17617fa4fdffc2309c94

Build ID: 6

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: e5e1316aa790cf8bd8cdff8316cba9ac409219a4

Build ID: 7

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: 2efbbed5bc1e44b2f6f39a982da1457d674f90df

Build ID: 8

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: 6771e7fc802e75a5a6cd1dabbd39c8cdbd11fd2d

Build ID: 9

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: ef18d3cfa5af3f550b07a83f6f0c315b8e991022

Build ID: 10

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: 41172f2dd5e8d048a5a92eb6b39386d27d03bb08

Build ID: 11

Status: ❌ CI test failed in Stage [CPU Build].

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: a962b5b3bebe462bb572ab3c3869720514677850

Build ID: 12

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: 8060a833e081cae35d8ceefe6ff620c62996741b

Build ID: 13

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: 6ccc993f30210d0aa6dbfbd3e9bcceecba4cc5f6

Build ID: 14

Status: ❌ CI test failed in Stage [CPU Build].

Report path: link

Full logs path: link

dgl-bot commented 1 week ago

Commit ID: d54017b32c797e9848222be237cfee297cad23da

Build ID: 15

Status: ❌ CI test failed in Stage [Torch GPU Example test].

Report path: link

Full logs path: link

mfbalin commented 1 week ago

@Rhett-Ying

=================================== FAILURES ===================================
__________________________________ test_sign ___________________________________

    def test_sign():
        script = os.path.join(EXAMPLE_ROOT, "sign.py")
        out = subprocess.run(["python", str(script)], capture_output=True)
        assert (
            out.returncode == 0
        ), f"stdout: {out.stdout.decode('utf-8')}\nstderr: {out.stderr.decode('utf-8')}"
        stdout = out.stdout.decode("utf-8")
>       assert float(stdout[-5:]) > 0.7
E       AssertionError: assert 0.689 > 0.7
E        +  where 0.689 = float('.689\n')

tests/examples/test_sparse_examples.py:101: AssertionError
- generated xml file: /home/ubuntu/jenkins/workspace/dgl_PR-7506/pytest_backend.xml -
============================ slowest 100 durations =============================
25.62s call     tests/examples/test_sparse_examples.py::test_twirls
17.99s call     tests/examples/test_sparse_examples.py::test_hgnn
17.00s call     tests/examples/test_sparse_examples.py::test_hypergraphatt
13.43s call     tests/examples/test_sampling_examples.py::test_node_classification
13.05s call     tests/examples/test_sparse_examples.py::test_gcnii
8.61s call     tests/examples/test_sampling_examples.py::test_link_prediction
7.34s call     tests/examples/test_sparse_examples.py::test_gat
5.64s call     tests/examples/test_sparse_examples.py::test_gcn
5.42s call     tests/examples/test_sparse_examples.py::test_appnp
5.02s call     tests/examples/test_sparse_examples.py::test_c_and_s
4.79s call     tests/examples/test_sparse_examples.py::test_sign
4.69s call     tests/examples/test_sparse_examples.py::test_sgc

(24 durations < 0.005s hidden.  Use -vv to show these durations.)
=========================== short test summary info ============================
FAILED tests/examples/test_sparse_examples.py::test_sign - AssertionError: assert 0.689 > 0.7
 +  where 0.689 = float('.689\n')
=================== 1 failed, 11 passed in 128.68s (0:02:08) ===================
FAIL: sparse examples on gpu

mfbalin commented 1 week ago

@dgl-bot

Rhett-Ying commented 1 week ago

This issue happens rarely. Just re-run it.
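
For anyone reading the log above: the failing assertion parses the last five characters of the script's stdout as the accuracy, so the run scored 0.689 against a 0.7 threshold, a near miss consistent with run-to-run variation. A small sketch of that parsing follows; the sample output line `"Test acc: 0.689\n"` is an assumption (the actual text printed by sign.py is not shown in the log), and `last_float` is a hypothetical, more tolerant alternative rather than anything in the test suite.

```python
import re


def tail_accuracy(stdout):
    """Mirror the test's parsing: read the last five characters as a float."""
    return float(stdout[-5:])


def last_float(stdout):
    """Hypothetical alternative: take the last decimal number in the output."""
    matches = re.findall(r"\d*\.\d+", stdout)
    if not matches:
        raise ValueError("no decimal number found in output")
    return float(matches[-1])


sample = "Test acc: 0.689\n"   # assumed output format
acc = tail_accuracy(sample)    # parses '.689\n', as in the traceback
passed = acc > 0.7
```

The fixed-width slice only works when the accuracy is the very last token and has exactly three decimals, whereas the regex version does not depend on the width.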

dgl-bot commented 1 week ago

Commit ID: d54017b32c797e9848222be237cfee297cad23da

Build ID: 16

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link