Still failing with MKL 2024.0.0 on linux:
The following tests FAILED:
23 - LAPACK-xlintstd_dtest_in (Failed)
44 - LAPACK-xlintstc_ctest_in (Failed)
65 - LAPACK-xlintstz_ztest_in (Failed)
and windows:
The following tests FAILED:
22 - LAPACK-xlintstd_dtest_in (Failed)
57 - LAPACK-xlintstz_ztest_in (Failed)
(osx got dropped, see #62)
@h-vetinari, I cloned blas-feedstock repository and then ran build-locally.py with option 2 (Linux and MKL). I got 100% LAPACK tests passing. The tests used MKL 2024.0 (by default). Not sure how you got these errors above? What are the specific steps to reproduce these errors?
Hey @milubin, thanks a lot for taking a look! The update to 3.10.1 hasn't been merged to the blas-feedstock yet, exactly because of the breaking changes. Could you check out https://github.com/conda-forge/blas-feedstock/pull/96 and try again please?
@h-vetinari, I looked up this pull request, and it seems to point to the bump branch in your own repo:
git clone --recursive https://github.com/h-vetinari/blas-feedstock/ -b bump
With this repo, I see 2 tests failing on Linux:
The following tests FAILED:
23 - LAPACK-xlintstd_dtest_in (Failed)
65 - LAPACK-xlintstz_ztest_in (Failed)
Is that the correct repo? I don't see 44 failing.
Yes, the pull request originates from a branch in my repo. The failures that I posted are from the CI for that PR, I'm glad you could already reproduce some failures!
Given that all failures appear in xlintst*, I'm guessing that solving the ones you could reproduce already might also make progress on the one you didn't reproduce (maybe it's not fully deterministic...)
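The observation that every failure falls under xlintst* can be checked mechanically against a CTest summary. The following is only a minimal sketch: the helper name `parse_failed` and the regular expression are assumptions made for illustration, and the sample text is the Linux failure list posted earlier in this thread.

```python
import re
from fnmatch import fnmatch

# Matches CTest summary lines such as "23 - LAPACK-xlintstd_dtest_in (Failed)".
FAILED_LINE = re.compile(r"^\s*(\d+)\s+-\s+(\S+)\s+\(Failed\)")


def parse_failed(summary: str) -> dict[int, str]:
    """Map test number to test name for every '(Failed)' line (hypothetical helper)."""
    failed = {}
    for line in summary.splitlines():
        match = FAILED_LINE.match(line)
        if match:
            failed[int(match.group(1))] = match.group(2)
    return failed


# Failure list copied from the Linux CI output quoted in this thread.
summary = """The following tests FAILED:
23 - LAPACK-xlintstd_dtest_in (Failed)
44 - LAPACK-xlintstc_ctest_in (Failed)
65 - LAPACK-xlintstz_ztest_in (Failed)
"""

failed = parse_failed(summary)
# True only if every failing test belongs to the xlintst* family.
all_xlintst = all(fnmatch(name, "LAPACK-xlintst*") for name in failed.values())
print(failed, all_xlintst)
```

Keying the result by test number keeps the output comparable with the CI logs, while the names are what the pattern check actually uses.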
@h-vetinari, I filed an internal ticket. The main distribution uses LAPACK 3.9.0 -- is that the latest version that passes? We would like to understand if oneMKL 2024.1 (which will be publicly available soon) works with LAPACK 3.10.1. Have you tried LAPACK 3.11.0 or 3.12.0?
Thank you very much!
Indeed 3.9.0 is the latest version that passes (across all BLAS/LAPACK implementations). We didn't try 3.10.0 as your colleagues here said MKL would skip it and go straight for compatibility with 3.10.1. Since then we have been blocked from upgrading further by the test failures that this issue is about.
I can build LAPACK 3.11.0 so that the tests in the blas meta-package can be run if that helps.
@h-vetinari, if you build with 3.11.0, that would help. The same goes for oneMKL 2024.1 once it becomes available, so we can compare notes.
Alright, there's now a PR you can use to test against LAPACK 3.11. It currently has a lot of failures against MKL 2024.0; if you replace the three occurrences of "2024.0" in meta.yaml with "2024.1", you can run it against your newer builds too[^1]
[^1]: provided you modify the channel_sources to point to wherever those artefacts live, if they haven't been published to conda-forge yet
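The version substitution in meta.yaml amounts to a plain string replacement. A minimal sketch follows, assuming the version appears literally as "2024.0"; the helper name `bump_mkl_version` and the sample recipe text are hypothetical stand-ins, not the contents of the actual meta.yaml.

```python
def bump_mkl_version(recipe_text: str, old: str = "2024.0", new: str = "2024.1") -> tuple[str, int]:
    """Replace every occurrence of `old` with `new`; return the new text and the count."""
    count = recipe_text.count(old)
    return recipe_text.replace(old, new), count


# Hypothetical stand-in for the recipe; the real meta.yaml differs.
sample = """\
{% set mkl_version = "2024.0" %}
requirements:
  host:
    - mkl 2024.0.*
    - mkl-devel 2024.0.*
"""

updated, replaced = bump_mkl_version(sample)
print(replaced)
print(updated)
```

Returning the count makes it easy to confirm that exactly the expected number of occurrences was touched before rebuilding.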
@h-vetinari, I tested against 3.11, and all tests failed except 86-101. I only have a "typical" engineering build of oneMKL 2024.1 for use on the local system. The change in meta.yaml assumes that 2024.1 is available as a conda-forge package. I guess it might be possible to hack your distribution to point to another oneMKL repository on my local system (in this case, 2024.1) -- I just have not learned how to do this yet.
@milubin, I was certain that I had commented already on your question, but it seems it didn't get transmitted successfully - sorry. In any case, now that #65 is merged, it should be easier to test the two LAPACK upgrade PRs against 2024.1
@h-vetinari, 3.10.1 and 3.11 both fail with MKL 2024.1 on the same tests, 23 and 65 (as before). I updated the internal ticket. I'll keep you posted.
Addition: 2024.0 failed a lot with 3.11 (86 failures out of 100), while 2024.1 failed only with 23 and 65 (with 3.11). Can you confirm that?
I retested with MKL 2024.1, and still get the following:
The following tests FAILED:
23 - LAPACK-xlintstd_dtest_in (Failed)
44 - LAPACK-xlintstc_ctest_in (Failed)
65 - LAPACK-xlintstz_ztest_in (Failed)
on linux and
The following tests FAILED:
22 - LAPACK-xlintstd_dtest_in (Failed)
57 - LAPACK-xlintstz_ztest_in (Failed)
on windows.
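Because the test numbers differ between platforms (23/44/65 on linux versus 22/57 on windows), the two lists above are easier to compare by test name. A minimal sketch using only the names reported in this thread; the variable names are illustrative.

```python
# Failing test names as reported above for MKL 2024.1.
linux = {
    "LAPACK-xlintstd_dtest_in",
    "LAPACK-xlintstc_ctest_in",
    "LAPACK-xlintstz_ztest_in",
}
windows = {
    "LAPACK-xlintstd_dtest_in",
    "LAPACK-xlintstz_ztest_in",
}

# Tests failing on both platforms, and those failing on linux only.
common = sorted(linux & windows)
linux_only = sorted(linux - windows)

print("both:", common)
print("linux only:", linux_only)
```

With these inputs, the double-precision real and complex tests (dtest, ztest) fail on both platforms, while the single-precision complex test (ctest) fails on linux only.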
> Addition: 2024.0 failed a lot with 3.11 (86 failures out of 100), while 2024.1 failed only with 23 and 65 (with 3.11). Can you confirm that?
Sorry I missed responding to this. From what I can tell (not looking at individual test failures being subsumed in the ~100 meta-tests), the overall picture stayed the same between 2024.0 and 2024.1
@h-vetinari, the fix should be in MKL 2024.2, which was just released. However, I don't see 2024.2 available yet in conda-forge. I will ask internally.
Happy to report that this is indeed fixed by 2024.2! Thank you!
While trying to update to LAPACK 3.10.1 & MKL 2023.1, we encounter errors in the LAPACK test suite. The ones on linux are a regression that seems to affect all our current builds (investigating; help appreciated), while on osx & windows, 1-2 tests fail:
OSX:
Windows: