Open mfbalin opened 5 months ago
I am assigning you to this issue @Rhett-Ying, feel free to unassign or assign someone else.
All distributed tests are temporarily skipped in https://github.com/dmlc/dgl/pull/7490
The tests don't seem to fail anymore. Does this mean the problems are solved, or are some tests skipped?
The io_uring tests seem to fail spuriously: https://github.com/dmlc/dgl/pull/7537#issuecomment-2238237746 https://github.com/dmlc/dgl/pull/7540#issuecomment-2237539648
Hopefully #7542 solved this issue.
If this keeps happening, I might temporarily disable this test to figure out what the issue is.
tests/python/pytorch/geometry/test_geometry.py::test_edge_coarsening[False-False-g4-idtype1] PASSED [ 5%]
tests/python/pytorch/geometry/test_geometry.py::test_edge_coarsening[False-False-g5-idtype0] PASSED [ 5%]
tests/python/pytorch/geometry/test_geometry.py::test_edge_coarsening[False-False-g5-idtype1] PASSED [ 5%]
tests/python/pytorch/geometry/test_geometry.py::test_edge_coarsening[False-False-g6-idtype0] PASSED [ 5%]
tests/python/pytorch/geometry/test_geometry.py::test_edge_coarsening[False-False-g6-idtype1] PASSED [ 5%]
tests/python/pytorch/graphbolt/impl/test_basic_feature_store.py::test_basic_feature_store_homo PASSED [ 5%]
tests/python/pytorch/graphbolt/impl/test_basic_feature_store.py::test_basic_feature_store_hetero PASSED [ 5%]
tests/python/pytorch/graphbolt/impl/test_basic_feature_store.py::test_basic_feature_store_errors PASSED [ 5%]
tests/python/pytorch/graphbolt/impl/test_disk_based_feature_store.py::test_disk_based_feature Sending interrupt signal to process
Terminated
script returned exit code 143
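For readers of the log above: exit code 143 is what a POSIX shell conventionally reports when a process is killed by SIGTERM (128 plus the signal number, 15), which is consistent with the CI interrupting a hung test rather than with an assertion failure. A minimal sketch of that arithmetic follows; the helper name `shell_exit_code` is made up for illustration and is not part of DGL or the CI scripts.

```python
import signal


def shell_exit_code(sig):
    """Exit status a POSIX shell conventionally reports for a process killed by `sig`."""
    # Convention: 128 + signal number.
    return 128 + int(sig)


# SIGTERM is signal 15, so a terminated job is reported as 143.
print(shell_exit_code(signal.SIGTERM))
```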
The io_uring-related and CPUCachedFeature-related bugs have been resolved.
🔨Work Item
IMPORTANT:
Project tracker: https://github.com/orgs/dmlc/projects/2
Description
Example failures: https://github.com/dmlc/dgl/pull/7470#issuecomment-2190675466, https://github.com/dmlc/dgl/pull/7470#issuecomment-2190727402, https://github.com/dmlc/dgl/pull/7475#issuecomment-2192441404, https://github.com/dmlc/dgl/pull/7485#issuecomment-2192788339
If some of the tests are known to fail randomly, they should be disabled until the issues are resolved.
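One way to disable a known-flaky test while keeping it visible in reports is a skip marker with an explicit reason. The sketch below uses the standard library's `unittest.skip`, which pytest also honors. The class name, test names, and reason string are hypothetical placeholders, not the actual DGL tests.

```python
import unittest

# Hypothetical reason string; in practice it would link to the tracking issue.
KNOWN_FLAKY_REASON = "Fails spuriously in CI; disabled until the root cause is fixed."


class DiskBasedFeatureChecks(unittest.TestCase):  # hypothetical test class
    @unittest.skip(KNOWN_FLAKY_REASON)
    def test_flaky_read(self):
        # Never executed while the skip marker is in place.
        self.fail("should not run while skipped")

    def test_stable(self):
        self.assertEqual(1 + 1, 2)


def run_suite():
    """Run the example tests and return the collected result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DiskBasedFeatureChecks)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_suite()
    # Each skipped entry is a (test, reason) pair, so the reason stays in the report.
    for test, reason in outcome.skipped:
        print(f"SKIPPED {test.id()}: {reason}")
```

Removing the decorator re-enables the test once the underlying issue is resolved, and the reason string keeps a record of why it was disabled.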