dmlc / dgl

Python package built to ease deep learning on graphs, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

[CI] Spurious test failures #7481

Open mfbalin opened 5 months ago

mfbalin commented 5 months ago

🔨Work Item

IMPORTANT:

Project tracker: https://github.com/orgs/dmlc/projects/2

Description

Example failures: https://github.com/dmlc/dgl/pull/7470#issuecomment-2190675466, https://github.com/dmlc/dgl/pull/7470#issuecomment-2190727402, https://github.com/dmlc/dgl/pull/7475#issuecomment-2192441404, https://github.com/dmlc/dgl/pull/7485#issuecomment-2192788339

If some of the tests are known to fail randomly, they should be disabled until the issues are resolved.
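One way to disable a known-flaky test without deleting it is a pytest skip marker that records the reason. This is only a minimal sketch: the test names and the reason string are hypothetical and are not taken from the DGL test suite.

```python
import pytest

# Hypothetical reason string; pointing at the tracking issue makes the
# skip easy to find and revert once the root cause is fixed.
KNOWN_FLAKY_REASON = "Fails spuriously in CI; see dmlc/dgl#7481"


@pytest.mark.skip(reason=KNOWN_FLAKY_REASON)
def test_known_flaky_case():
    # Hypothetical test body; pytest reports this test as SKIPPED
    # and never executes it while the marker is in place.
    assert False, "not executed while skipped"


def test_stable_case():
    # Unmarked tests in the same file still run as usual.
    assert 1 + 1 == 2
```

Running pytest with `-rs` lists the skipped tests together with their reasons, which helps keep temporarily disabled tests visible.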

mfbalin commented 5 months ago

I am assigning you to this issue @Rhett-Ying, feel free to unassign or assign someone else.

Rhett-Ying commented 5 months ago

All distributed tests are temporarily skipped in https://github.com/dmlc/dgl/pull/7490

mfbalin commented 5 months ago

The tests don't seem to fail anymore. Does this mean the problems are solved, or are some tests skipped?

mfbalin commented 4 months ago

The io_uring tests seem to fail spuriously: https://github.com/dmlc/dgl/pull/7537#issuecomment-2238237746, https://github.com/dmlc/dgl/pull/7540#issuecomment-2237539648

Hopefully #7542 solved this issue.

If this keeps happening, I might temporarily disable this test while I figure out the cause.

tests/python/pytorch/geometry/test_geometry.py::test_edge_coarsening[False-False-g4-idtype1] PASSED [  5%]
tests/python/pytorch/geometry/test_geometry.py::test_edge_coarsening[False-False-g5-idtype0] PASSED [  5%]
tests/python/pytorch/geometry/test_geometry.py::test_edge_coarsening[False-False-g5-idtype1] PASSED [  5%]
tests/python/pytorch/geometry/test_geometry.py::test_edge_coarsening[False-False-g6-idtype0] PASSED [  5%]
tests/python/pytorch/geometry/test_geometry.py::test_edge_coarsening[False-False-g6-idtype1] PASSED [  5%]
tests/python/pytorch/graphbolt/impl/test_basic_feature_store.py::test_basic_feature_store_homo PASSED [  5%]
tests/python/pytorch/graphbolt/impl/test_basic_feature_store.py::test_basic_feature_store_hetero PASSED [  5%]
tests/python/pytorch/graphbolt/impl/test_basic_feature_store.py::test_basic_feature_store_errors PASSED [  5%]
tests/python/pytorch/graphbolt/impl/test_disk_based_feature_store.py::test_disk_based_feature Sending interrupt signal to process
Terminated
script returned exit code 143
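
A note on reading the log above: by shell convention, an exit status above 128 means the process was killed by signal number (status - 128). So 143 corresponds to signal 15 (SIGTERM), which matches the "Sending interrupt signal to process" line and suggests the runner killed the job rather than a test assertion failing. A small sketch of that arithmetic (the helper name is hypothetical):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status.

    Statuses above 128 conventionally mean "killed by signal (code - 128)".
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


print(describe_exit_code(143))  # prints "terminated by SIGTERM"
```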

mfbalin commented 4 months ago

https://github.com/dmlc/dgl/pull/7546#issuecomment-2241200927

mfbalin commented 4 months ago

The io_uring and CPUCachedFeature based bugs have been resolved.