no need to use SANDCASTLE=1 in custom easyblock for PyTorch for versions >= 2.3.0, it actually causes more problems than it solves

akesandgren commented 6 months ago

(created using eb --new-pr)

akesandgren commented 6 months ago

Test report by @akesandgren

Overview of tested easyconfigs (in order)

FAIL (build issue) PyTorch-2.3.0-foss-2023b.eb (partial log available at https://gist.github.com/akesandgren/a929f262c386a2dfd9fa329d78dcfc8d)

Build succeeded for 0 out of 1 (1 easyconfigs in total)
b-cn1603.hpc2n.umu.se - Linux Ubuntu 22.04, x86_64, AMD EPYC 7313 16-Core Processor, 1 x NVIDIA NVIDIA A100 80GB PCIe, 550.78, Python 3.10.12
See https://gist.github.com/akesandgren/af2d9642341387205d3fb8a2183f6671 for a full test report.

Flamefire commented 6 months ago

I just checked one such instance of a failure caused by that:

==========================short test summary info ==================
FAILED [0.0003s] export/test_lift_unlift.py::TestLift::test_duplicate_constant_access - OSError: /caffe2/test/cpp/jit:test_custom_class_registrations: cannot open shared object file: No such file or directory
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!
========================= 1 failed, 2 rerun in 0.07s ===================

That is caused by https://github.com/pytorch/pytorch/blob/6c503f1dbbf9ef1bf99f19f0048c287f419df600/test/export/test_lift_unlift.py#L151-L154

However, that env var also causes a large number of tests to be skipped. I created an upstream PR to allow us to skip those tests without setting the env var in the future.

For now I think it is better to grep the code for places where IS_SANDCASTLE influences what a test does, rather than just skipping it, and replace it with False in those places. Those are much rarer than the skips (on 2.3.0: 113 hits for IS_SANDCASTLE, 20 for if .*IS_SANDCASTLE, 50 for (if|or|and) .*IS_SANDCASTLE)
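A rough way to reproduce counts like these is sketched below. It is only an illustration: the function name `count_hits` and the choice to scan `*.py` files and count matching lines (similar to `grep -c`) are assumptions, and the exact grep invocation behind the numbers above is not stated, so results may differ.

```python
import re
from pathlib import Path

# Patterns taken from the comment above; the dictionary keys are just labels.
PATTERNS = {
    "any": re.compile(r"IS_SANDCASTLE"),
    "if": re.compile(r"if .*IS_SANDCASTLE"),
    "if/or/and": re.compile(r"(if|or|and) .*IS_SANDCASTLE"),
}


def count_hits(root):
    """Count lines matching each pattern in all *.py files below root."""
    counts = {name: 0 for name in PATTERNS}
    for path in Path(root).rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            # Unreadable files are skipped rather than aborting the scan.
            continue
        for line in text.splitlines():
            for name, pattern in PATTERNS.items():
                if pattern.search(line):
                    counts[name] += 1
    return counts
```

Calling `count_hits("test")` from the root of an unpacked PyTorch source tree (the path is an assumption) would return a dictionary with one count per pattern. The lines matched by the narrower patterns are the candidates to inspect by hand.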

easybuilders / easybuild-easyblocks

no need to use SANDCASTLE=1 in custom easyblock for PyTorch for versions >= 2.3.0, it actually causes more problems than it solves #3330