For packages using scipy, add the following:

```
skip: True # [python_impl=="pypy" and ppc64le]
```

and do a rerender.
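For anyone unsure where that goes: a minimal sketch, assuming a standard feedstock layout (the exact `meta.yaml` contents will differ per package):

```bash
# Sketch: put the selector under the build section of recipe/meta.yaml, e.g.
#
#   build:
#     number: 0
#     skip: True  # [python_impl=="pypy" and ppc64le]
#
# then rerender locally with conda-smithy, or comment
# "@conda-forge-admin, please rerender" on the PR to let the bot do it.
conda smithy rerender
```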
Hey @conda-forge/pypy3-6 @conda-forge/pypy3-5 @conda-forge/scipy.
Since the pypy+ppc64le build was skipped due to the mentioned segfault, we're wondering how this should be handled in libraries downstream of scipy (e.g. conda-forge/osqp-feedstock#22, conda-forge/ecos-feedstock#9, + about a zillion more ;-)). Quoting myself from here at @isuruf's request:
> Is it in the interest of the migration to skip all downstream dependencies of scipy on pypy+ppc64le? Wouldn't this then need another migration to catch up? I'm happy to either skip the build or wait a bit to see if the scipy build can be fixed.
cc @conda-forge/help-pypy
Also @conda-forge/bot
We still do not have a good solution for recipes that cannot be built and block a full subtree of the migration.
I am a bit lost in the PRs and CI runs. Where is a record of the scipy segfault? I have access to a gcc buildfarm ppc64le machine, maybe I can try to reproduce it. Do we know if this is using any BLAS acceleration and what version of gcc is in play?
Log is at https://travis-ci.com/github/conda-forge/scipy-feedstock/jobs/306968829. This is using the reference lapack implementation (pure Fortran), with gcc 7.3.0.
Wait, this is gcc 8.2.0.
On second thought, that might be the problem. Retrying with `-fno-optimize-sibling-calls`.
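If anyone wants to try the same experiment locally, one way to do it (a sketch, not the exact feedstock change) is to append the flag to the compiler flags before rebuilding scipy from source:

```bash
# Sketch: the conda-forge compiler activation scripts already set CFLAGS/CXXFLAGS,
# so just extend them with the flag being tested and rebuild scipy.
export CFLAGS="${CFLAGS} -fno-optimize-sibling-calls"
export CXXFLAGS="${CXXFLAGS} -fno-optimize-sibling-calls"
pip install . --no-build-isolation -v   # from a scipy source checkout
```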
Looking at the segfault, it is in `_pocketfft`. In compilation there is a warning, `note: the ABI of passing aggregates with 16-byte alignment has changed in GCC 5`, for the function `void pocketfft::detail::PM(T&, T&, T, T) [with T = pocketfft::detail::cmplx<long double>]`. Maybe there is a stack corruption somewhere that PyPy exposes?
Looking up that error, one of the top search results is in the discussion on numpy, https://github.com/numpy/numpy/issues/10281. I don't think PyPy knows anything about `long double`, much less on ppc64le, but maybe there is a problem with `_pocketfft` somewhere making assumptions about the size of `long double`?
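If it helps, a quick way to compare what `long double` looks like under each interpreter on that machine (a sketch using NumPy's introspection; run it under both CPython and PyPy):

```bash
# Print the size and parameters of long double as numpy sees it; if PyPy and
# CPython disagree here on ppc64le, that would point at the assumption above.
python -c "import numpy as np; print(np.dtype(np.longdouble).itemsize, np.finfo(np.longdouble))"
```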
@peterbell10 do you know the answer to the above pocketfft question?
pocketfft shouldn't make any assumptions. The SciPy CI on its own covers 8-, 12- and 16-byte `long double` variants without issue. That includes ppcle with CPython.
@isuruf could you add `-v` to the scipy test run so we can see exactly which test is causing the segfault? On x86 in NumPy we get a nice C stack trace when there is a segfault; is there something like that on ppc64le? Maybe a `-g` compiler flag?
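Something along these lines should both show the test names and give a native backtrace at the crash (a sketch; the exact invocation in the feedstock's test section may differ):

```bash
# Run the scipy test suite verbosely so the crashing test shows up in the log.
python -m pytest -v --pyargs scipy

# For a C-level backtrace on ppc64le, run the interpreter under gdb
# (works for CPython and PyPy alike) and print the stack after the crash.
gdb --args python -m pytest -v --pyargs scipy.fft
# (gdb) run
# (gdb) bt
```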
> That includes ppcle with CPython.

Hmm, maybe we should re-enable the PyPy CI run(s) on scipy/scipy, although they were annoying, both because PyPy is slower and because they were flaky.
Here's another run that shows which tests crashed. https://travis-ci.com/github/conda-forge/scipy-feedstock/jobs/302140872#L13359-L13418
Interesting: the crashes seem to mostly be in `fft/_pocketfft/tests/test_real_transforms.py`, although the error reporting terminated due to too many errors.
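That narrows it down enough to iterate on just that subset, e.g. (a sketch; the module path is taken from the file named in the log and may need adjusting):

```bash
# Re-run only the suspect tests instead of the whole suite.
python -m pytest -v --pyargs scipy.fft._pocketfft.tests.test_real_transforms
```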
I can reproduce a segfault, maybe the same one. It seems to be an actual PyPy bug, since running with `--jit off` seems to pass the tests. It might be some interaction between pybind11's generated code and the JIT.
@mattip, is there a way to set `--jit off` using an environment variable so that I can run pytest and see if that helps?
It seems not. You could alias `python` to `pypy --jit off` if you really have a lot of time; it is very slow that way. We found a failure in PyPy and I am testing the fix now.
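One way to do that without touching the test invocation itself is a small wrapper on `PATH` (a sketch; adjust the interpreter name to whatever the PyPy executable is called in the environment):

```bash
# Sketch: shadow `python` with a wrapper that disables the JIT, then run
# pytest as usual; remove the wrapper directory from PATH afterwards.
mkdir -p ~/nojit-bin
cat > ~/nojit-bin/python << 'EOF'
#!/bin/bash
exec pypy3 --jit off "$@"
EOF
chmod +x ~/nojit-bin/python
export PATH="$HOME/nojit-bin:$PATH"
python -m pytest -v --pyargs scipy.fft
```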
Still segfaulting after a fix in PyPy. It is strange that the `_pocketfft` tests are crashing here when the same tests pass in numpy.
I can confirm that turning off the JIT and running the pocketfft test suite fixes the segfault with the conda package. I'll run the full test suite on an OSU OSL machine and report back.
> It is strange that the `_pocketfft` tests are crashing here when the same tests pass in numpy.
It's not the same code: the NumPy version is C, the SciPy version is more recent C++.
PyPy found a problem; the fix is specific to JITted code on ppc. It will be part of tomorrow's nightly and also part of 7.3.1.
@mattip How far out is 7.3.1?
@conda-forge/help-pypy @conda-forge/bot I'm guessing it makes sense to hold off on the migration of scipy-dependent packages then? Or do you want to skip ppc to get pypy (packages) to users faster on other platforms?
Hopefully within two weeks.
Then we wait, I think.
Or we can port the fix to 7.3.0 if anyone is interested.
Thanks @mattip. I couldn't run the full test suite on Travis due to time limits (40 minutes for CPython, >50 minutes for PyPy). I did run the full test suite locally on a power8 machine, and all tests passed with reference lapack. (OpenBLAS is another story.)
Not sure if this is a packaging issue or an upstream issue
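For anyone who wants to reproduce that full local run, it can be driven from the installed package with scipy's standard test entry point (a sketch; this takes noticeably longer under PyPy):

```bash
# Run the full (not just the default "fast") scipy test suite against the
# installed package; verbose=2 prints individual test names.
python -c "import scipy; scipy.test('full', verbose=2)"
```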