Closed mfbalin closed 1 week ago
To trigger regression tests:
@dgl-bot run [instance-type] [which tests] [compare-with-branch]
For example: @dgl-bot run g4dn.4xlarge all dmlc/master
or @dgl-bot run c5.9xlarge kernel,api dmlc/master
@frozenbugs let's land this PR and the follow-up PR before we release DGL 2.3. That way, we can say the new release has this major feature.
@dgl-bot
@Rhett-Ying some test always ends up failing. The CI is too unstable.
@dgl-bot
@Rhett-Ying CI failure output:
tests/distributed/test_mp_dataloader.py::test_edge_dataloader_homograph[False-self-False-0] Fatal Python error: Segmentation fault
Thread 0x00007fb4b8fdf700 (most recent call first):
File "/opt/conda/envs/pytorch-ci/lib/python3.10/threading.py", line 324 in wait
File "/opt/conda/envs/pytorch-ci/lib/python3.10/threading.py", line 607 in wait
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/tqdm/_monitor.py", line 60 in run
File "/opt/conda/envs/pytorch-ci/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/opt/conda/envs/pytorch-ci/lib/python3.10/threading.py", line 973 in _bootstrap
Current thread 0x00007fb4e5ce6800 (most recent call first):
File "/opt/conda/envs/pytorch-ci/lib/python3.10/codecs.py", line 322 in decode
File "/home/ubuntu/jenkins/workspace/dgl_PR-7470@2/python/dgl/partition.py", line 271 in get_peak_mem
File "/home/ubuntu/jenkins/workspace/dgl_PR-7470@2/python/dgl/distributed/partition.py", line 921 in partition_graph
File "/home/ubuntu/jenkins/workspace/dgl_PR-7470@2/tests/distributed/test_mp_dataloader.py", line 770 in check_dataloader
File "/home/ubuntu/jenkins/workspace/dgl_PR-7470@2/tests/distributed/test_mp_dataloader.py", line 914 in test_edge_dataloader_homograph
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/python.py", line 162 in pytest_pyfunc_call
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/python.py", line 1627 in runtest
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/runner.py", line 173 in pytest_runtest_call
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/runner.py", line 241 in <lambda>
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/runner.py", line 341 in from_call
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/runner.py", line 240 in call_and_report
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/runner.py", line 135 in runtestprotocol
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/runner.py", line 116 in pytest_runtest_protocol
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/main.py", line 364 in pytest_runtestloop
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/main.py", line 339 in _main
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/main.py", line 285 in wrap_session
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/main.py", line 332 in pytest_cmdline_main
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/config/__init__.py", line 178 in main
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/_pytest/config/__init__.py", line 206 in console_main
File "/opt/conda/envs/pytorch-ci/lib/python3.10/site-packages/pytest/__main__.py", line 7 in <module>
File "/opt/conda/envs/pytorch-ci/lib/python3.10/runpy.py", line 86 in _run_code
File "/opt/conda/envs/pytorch-ci/lib/python3.10/runpy.py", line 196 in _run_module_as_main
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, dgl._ffi._cy3.core, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, psutil._psutil_linux, psutil._psutil_posix, pyarrow.lib, pyarrow._parquet, pyarrow._fs, pyarrow._azurefs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, 
pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, scipy.io.matlab._mio_utils, scipy.io.matlab._streams, scipy.io.matlab._mio5_utils, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._cdflib, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation, yaml._yaml (total: 115)
tests/scripts/task_distributed_test.sh: line 37: 638 Segmentation fault (core dumped) python3 -m pytest -v --capture=tee-sys --junitxml=pytest_distributed.xml --durations=100 tests/distributed/*.py
FAIL: distributed
@mfbalin please skip it. It's a known issue.
I cannot skip it. I don't have permission to merge into master unless the CI passes.
Another test failure:
+ bash tests/scripts/task_example_test.sh cpu
============================= test session starts ==============================
platform linux -- Python 3.10.14, pytest-8.2.0, pluggy-1.5.0 -- /opt/conda/envs/pytorch-ci/bin/python3
cachedir: .pytest_cache
rootdir: /home/ubuntu/jenkins/workspace/dgl_PR-7470
configfile: pyproject.toml
collecting ... collected 12 items
tests/examples/test_sampling_examples.py::test_node_classification PASSED [ 8%]
tests/examples/test_sampling_examples.py::test_link_prediction PASSED [ 16%]
tests/examples/test_sparse_examples.py::test_gcn PASSED [ 25%]
tests/examples/test_sparse_examples.py::test_gcnii PASSED [ 33%]
tests/examples/test_sparse_examples.py::test_appnp PASSED [ 41%]
tests/examples/test_sparse_examples.py::test_c_and_s PASSED [ 50%]
tests/examples/test_sparse_examples.py::test_gat PASSED [ 58%]
tests/examples/test_sparse_examples.py::test_hgnn PASSED [ 66%]
tests/examples/test_sparse_examples.py::test_hypergraphatt PASSED [ 75%]
tests/examples/test_sparse_examples.py::test_sgc PASSED [ 83%]
tests/examples/test_sparse_examples.py::test_sign FAILED [ 91%]
tests/examples/test_sparse_examples.py::test_twirls PASSED [100%]
=================================== FAILURES ===================================
__________________________________ test_sign ___________________________________
    def test_sign():
        script = os.path.join(EXAMPLE_ROOT, "sign.py")
        out = subprocess.run(["python", str(script)], capture_output=True)
        assert (
            out.returncode == 0
        ), f"stdout: {out.stdout.decode('utf-8')}\nstderr: {out.stderr.decode('utf-8')}"
        stdout = out.stdout.decode("utf-8")
>       assert float(stdout[-5:]) > 0.7
E       AssertionError: assert 0.697 > 0.7
E        +  where 0.697 = float('.697\n')
tests/examples/test_sparse_examples.py:101: AssertionError
- generated xml file: /home/ubuntu/jenkins/workspace/dgl_PR-7470/pytest_backend.xml -
============================ slowest 100 durations =============================
34.74s call tests/examples/test_sparse_examples.py::test_twirls
21.53s call tests/examples/test_sparse_examples.py::test_gcnii
9.20s call tests/examples/test_sparse_examples.py::test_hypergraphatt
8.06s call tests/examples/test_sampling_examples.py::test_node_classification
7.80s call tests/examples/test_sampling_examples.py::test_link_prediction
6.16s call tests/examples/test_sparse_examples.py::test_hgnn
3.27s call tests/examples/test_sparse_examples.py::test_appnp
3.22s call tests/examples/test_sparse_examples.py::test_gat
3.18s call tests/examples/test_sparse_examples.py::test_gcn
2.82s call tests/examples/test_sparse_examples.py::test_sign
2.61s call tests/examples/test_sparse_examples.py::test_sgc
2.55s call tests/examples/test_sparse_examples.py::test_c_and_s
(24 durations < 0.005s hidden. Use -vv to show these durations.)
=========================== short test summary info ============================
FAILED tests/examples/test_sparse_examples.py::test_sign - AssertionError: assert 0.697 > 0.7
+ where 0.697 = float('.697\n')
=================== 1 failed, 11 passed in 105.21s (0:01:45) ===================
FAIL: sparse examples on cpu
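For anyone reading the log above, here is a minimal sketch of how the `test_sign` check arrives at 0.697. The final line of the example's stdout is not shown in the CI output, so the string below is a hypothetical stand-in. Only the last five characters, `'.697\n'`, come from the log.

```python
# Hypothetical last line of sign.py's stdout; only the trailing '.697\n'
# is confirmed by the CI log, the rest is an assumption for illustration.
stdout = "Test accuracy: 0.697\n"

# The test slices the last five characters, newline included, which drops
# the leading "0" of "0.697".
tail = stdout[-5:]

# float() accepts a missing leading zero and ignores surrounding whitespace.
acc = float(tail)

# The assertion in the test requires the accuracy to exceed 0.7.
passed = acc > 0.7

print(repr(tail), acc, passed)
```

So the run missed the threshold by 0.003, which looks more like run-to-run variation in the example's accuracy than a change in behavior, though the log alone does not confirm that.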
I've just made a change to the CI and rebased this PR. Let's see if it works well now. If not, I will merge it as long as the other CI tests pass.
The CI already passed: https://github.com/dmlc/dgl/pull/7470#issuecomment-2190496941
The only change I made is to improve a comment in the code. I would appreciate it if you merged it now.
Description
A follow-up PR will add the dataloader logic so that the cache can be used for faster GPU sampling.