cython / cython

The most widely used Python to C compiler
https://cython.org
Apache License 2.0

[BUG] 3.0.3 hangs in tests on i686 #5768

Open hroncok opened 1 year ago

hroncok commented 1 year ago

Describe the bug

When trying to upgrade the Fedora package from 3.0.2 to 3.0.3, I see a consistent hang in the tests on i686 (which is our only 32-bit architecture).

I plan to bisect the problem later this week, but I decided to open this issue first in case somebody else figures it out first.

Code to reproduce the behaviour:

I've attached a complete build log. I haven't yet tried to reproduce this outside of an RPM environment.

Expected behaviour

The tests should pass in a reasonable time. A complete build of 3.0.3 on x86_64 or of 3.0.2 on i686 takes 1.5 hours on our builders. A build of 3.0.3 on i686 hangs for days.

OS

Fedora Linux 40 i686

Python version

3.12.0

Cython version

3.0.3

Additional context

No response

da-woods commented 1 year ago
Doctest: gil_in_var_initialization_tests.test_method_with_error_return ... Fatal Python error: Segmentation fault
Thread 0xf7e2f700 (most recent call first):
  File "<doctest gil_in_var_initialization_tests.test_method_with_error_return[0]>", line 1 in <module>
  File "/usr/lib/python3.12/doctest.py", line 1357 in __run
  File "/usr/lib/python3.12/doctest.py", line 1504 in run
  File "/usr/lib/python3.12/doctest.py", line 2222 in runTest
  File "/usr/lib/python3.12/unittest/case.py", line 589 in _callTestMethod
  File "/usr/lib/python3.12/unittest/case.py", line 634 in run
  File "/usr/lib/python3.12/unittest/case.py", line 690 in __call__
  File "/usr/lib/python3.12/unittest/suite.py", line 122 in run
  File "/builddir/build/BUILD/cython-3.0.3/runtests.py", line 1562 in run_test
  File "/builddir/build/BUILD/cython-3.0.3/runtests.py", line 1568 in run_forked_test
  File "/builddir/build/BUILD/cython-3.0.3/runtests.py", line 1563 in run_doctests
  File "/builddir/build/BUILD/cython-3.0.3/runtests.py", line 1549 in run_tests
  File "/builddir/build/BUILD/cython-3.0.3/runtests.py", line 1531 in run
  File "/usr/lib/python3.12/unittest/case.py", line 690 in __call__
  File "/usr/lib/python3.12/unittest/suite.py", line 122 in run
  File "/usr/lib/python3.12/unittest/suite.py", line 84 in __call__
  File "/usr/lib/python3.12/unittest/suite.py", line 122 in run
  File "/usr/lib/python3.12/unittest/suite.py", line 84 in __call__
  File "/usr/lib/python3.12/unittest/suite.py", line 122 in run
  File "/usr/lib/python3.12/unittest/suite.py", line 84 in __call__
  File "/usr/lib/python3.12/unittest/runner.py", line 240 in run
  File "/builddir/build/BUILD/cython-3.0.3/runtests.py", line 2952 in runtests
  File "/builddir/build/BUILD/cython-3.0.3/runtests.py", line 2674 in runtests_callback
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 125 in worker
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108 in run
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314 in _bootstrap
  File "/usr/lib/python3.12/multiprocessing/popen_fork.py", line 71 in _launch
  File "/usr/lib/python3.12/multiprocessing/popen_fork.py", line 19 in __init__
  File "/usr/lib/python3.12/multiprocessing/context.py", line 282 in _Popen
  File "/usr/lib/python3.12/multiprocessing/process.py", line 121 in start
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 329 in _repopulate_pool_static
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 306 in _repopulate_pool
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 215 in __init__
  File "/usr/lib/python3.12/multiprocessing/context.py", line 119 in Pool
  File "/builddir/build/BUILD/cython-3.0.3/runtests.py", line 2515 in main
  File "/builddir/build/BUILD/cython-3.0.3/runtests.py", line 3004 in <module>
Extension modules: cython.cimports.libc.math, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, refnanny, gil_in_var_initialization_tests (total: 16)

That's the relevant part of the build log. (The tests use multiprocessing.Pool and that doesn't provide a way of telling when one of the workers has crashed so it appears to hang. There's no point in leaving it going...)
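As a rough illustration of how a crashing worker can be made visible instead of looking like a hang, the sketch below runs a snippet in a separate child interpreter and inspects its exit status. This is not how runtests.py works; `run_isolated` and `CRASHING_SNIPPET` are hypothetical names, and the "test" is simulated by a child that sends itself SIGSEGV. It relies on POSIX behaviour, where `subprocess` reports death by signal as a negative return code.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run a Python snippet in a child interpreter and describe how it ended."""
    try:
        proc = subprocess.run([sys.executable, "-c", code], timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return "timed out"
    if proc.returncode < 0:
        # On POSIX a negative return code means the child was killed by a signal.
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exited with status %d" % proc.returncode


# Hypothetical stand-in for a test that segfaults: the child sends itself SIGSEGV.
CRASHING_SNIPPET = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"

if __name__ == "__main__":
    print(run_isolated(CRASHING_SNIPPET))
    print(run_isolated("pass"))
```

The timeout guards against a genuine hang, while the return-code check separates a crash from an ordinary test failure.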

da-woods commented 1 year ago

I'm trying to work out how I actually get a 32 bit build of Python to test.

I don't think there's any need for you to bisect it. It's a new test in this release, so I know where it was introduced. The question is whether it's a bug in Cython, a bug in the test, or a bug in something else. I suspect it's most likely a bug in the test.

scoder commented 1 year ago

I was just about to release 3.0.4. I'll wait to see if this turns out to be something to fix in that release.

You could try a 32-bit Python Docker image to test it locally.
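Whichever environment is used, it is worth confirming that the interpreter really is a 32-bit build before running the tests. A minimal check, using only the standard library (`pointer_bits` is a hypothetical helper name):

```python
import struct
import sys


def pointer_bits():
    """Return the pointer width of the running interpreter in bits."""
    # "P" is the struct format code for a native pointer (void *).
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("pointer width:", pointer_bits())
    # On a 32-bit build, sys.maxsize is 2**31 - 1.
    print("sys.maxsize fits in 32 bits:", sys.maxsize <= 2**31 - 1)
```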

da-woods commented 1 year ago

I managed to test this (with 32-bit Opensuse in virtualbox, just because Opensuse is what I use most of the time so I know how to install stuff).

I can't reproduce the issue. I tried with Python 3.11.5 and Python 3.12.0. Current master and 3.0.3 exactly. It runs fine. I'm using gcc 13.2.1.

I've only tried running the specific test via python3 runtests.py -vv gil_in_var. It's possible it crashes in the complete test suite but I don't have time to check that out this evening.

For the moment the easiest thing to do in Fedora is just to exclude the specific test. You can do that by adding the name to tests/bugs.txt.

hroncok commented 1 year ago

Running just that one test makes it crash:

+ /usr/bin/python3 runtests.py -vv gil_in_var
Python 3.12.0 (main, Oct  5 2023, 00:00:00) [GCC 13.2.1 20230918 (Red Hat 13.2.1-3)]
Running tests against Cython 3.0.3
Using Cython language level 2.
Backends: c,cpp
runTest (__main__.CythonRunTestCase.runTest)
[-1] compiling (cpp/cy2) and running gil_in_var_initialization_tests ... 
#### 2023-10-17 09:25:02.817490
#### 2023-10-17 09:25:12.822023
test_method_with_error_return (gil_in_var_initialization_tests)
Doctest: gil_in_var_initialization_tests.test_method_with_error_return ... Fatal Python error: Segmentation fault
Thread 0xf68aab40 (most recent call first):
  File "/builddir/build/BUILD/cython-3.0.3/runtests.py", line 2623 in time_stamper
  File "/usr/lib/python3.12/threading.py", line 989 in run
  File "/usr/lib/python3.12/threading.py", line 1052 in _bootstrap_inner
  File "/usr/lib/python3.12/threading.py", line 1009 in _bootstrap
Thread 0xf7f66700 (most recent call first):
  File "<doctest gil_in_var_initialization_tests.test_method_with_error_return[0]>", line 1 in <module>
  File "/usr/lib/python3.12/doctest.py", line 1357 in __run
  File "/usr/lib/python3.12/doctest.py", line 1504 in run
  File "/usr/lib/python3.12/doctest.py", line 2222 in runTest
  File "/usr/lib/python3.12/unittest/case.py", line 589 in _callTestMethod
  File "/usr/lib/python3.12/unittest/case.py", line 634 in run
  File "/usr/lib/python3.12/unittest/case.py", line 690 in __call__
  File "/usr/lib/python3.12/unittest/suite.py", line 122 in run
  File "/builddir/build/BUILD/cython-3.0.3/runtests.py", line 1562 in run_test
  File "/builddir/build/BUILD/cython-3.0.3/runtests.py", line 1568 in run_forked_test
  File "/builddir/build/BUILD/cython-3.0.3/runtests.py", line 1563 in run_doctests
  File "/builddir/build/BUILD/cython-3.0.3/runtests.py", line 1549 in run_tests
  File "/builddir/build/BUILD/cython-3.0.3/runtests.py", line 1531 in run
  File "/usr/lib/python3.12/unittest/case.py", line 690 in __call__
  File "/usr/lib/python3.12/unittest/suite.py", line 122 in run
  File "/usr/lib/python3.12/unittest/suite.py", line 84 in __call__
  File "/usr/lib/python3.12/unittest/suite.py", line 122 in run
  File "/usr/lib/python3.12/unittest/suite.py", line 84 in __call__
  File "/usr/lib/python3.12/unittest/suite.py", line 122 in run
  File "/usr/lib/python3.12/unittest/suite.py", line 84 in __call__
  File "/usr/lib/python3.12/unittest/runner.py", line 240 in run
  File "/builddir/build/BUILD/cython-3.0.3/runtests.py", line 2952 in runtests
  File "/builddir/build/BUILD/cython-3.0.3/runtests.py", line 2550 in main
  File "/builddir/build/BUILD/cython-3.0.3/runtests.py", line 3004 in <module>
Extension modules: cython.cimports.libc.math, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, gil_in_var_initialization_tests, refnanny (total: 16)
/var/tmp/rpm-tmp.E9IWo1: line 47:  4153 Segmentation fault      (core dumped) /usr/bin/python3 runtests.py -vv gil_in_var

Full log: build.log.txt

If you are confident this is a problem in the test itself, I'll exclude it. Perhaps this is specific to our build flags?

da-woods commented 1 year ago

My current view is that it's more likely an issue with either the test or the compilation environment than a bug in Cython, so it probably shouldn't hold anything up. But I'm definitely not certain of that.

I could try having a look on "Fedora Linux 40 i686" (I suspect it'd be very easy to tell what's wrong in a C++ debugger) but I can't find any evidence of any recent 32 bit version of Fedora to download.

hroncok commented 1 year ago

Fedora does not build i686 kernels anymore. We build for i686 only for the infamous "multilib" case. To get it, you can use e.g. mock (omit the initial steps, from podman ... through su - mockbuilder, if you are already on a Fedora system, e.g. in VirtualBox).

$ podman run --rm --privileged -ti fedora:rawhide bash
[root@201a48537c14 /]# dnf install -y mock
...
[root@201a48537c14 /]# useradd mockbuilder
[root@201a48537c14 /]# usermod -a -G mock mockbuilder
[root@201a48537c14 /]# su - mockbuilder

[mockbuilder@201a48537c14 ~]$ mock -r fedora-rawhide-i386 --no-bootstrap-image --no-bootstrap-chroot --init
...
[mockbuilder@201a48537c14 ~]$ mock -r fedora-rawhide-i386 --no-bootstrap-image --no-bootstrap-chroot --install git-core python3-devel python3-setuptools gcc-c++ gdb
...
[mockbuilder@201a48537c14 ~]$ mock -r fedora-rawhide-i386 --no-bootstrap-image --no-bootstrap-chroot --shell --enable-network --unpriv
...
<mock-chroot> sh-5.2$ cd
<mock-chroot> sh-5.2$ git clone https://github.com/cython/cython.git
<mock-chroot> sh-5.2$ cd cython/
<mock-chroot> sh-5.2$ rpm --eval '%set_build_flags'

  CFLAGS="${CFLAGS:--O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Werror=implicit-function-declaration -Werror=implicit-int -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m32 -march=i686 -mtune=generic -msse2 -mfpmath=sse -mstackrealign -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection }" ; export CFLAGS ; 
  CXXFLAGS="${CXXFLAGS:--O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m32 -march=i686 -mtune=generic -msse2 -mfpmath=sse -mstackrealign -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection }" ; export CXXFLAGS ; 
  FFLAGS="${FFLAGS:--O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m32 -march=i686 -mtune=generic -msse2 -mfpmath=sse -mstackrealign -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib/gfortran/modules }" ; export FFLAGS ; 
  FCFLAGS="${FCFLAGS:--O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m32 -march=i686 -mtune=generic -msse2 -mfpmath=sse -mstackrealign -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib/gfortran/modules }" ; export FCFLAGS ; 
  VALAFLAGS="${VALAFLAGS:--g}" ; export VALAFLAGS ; 
  RUSTFLAGS="${RUSTFLAGS:--Copt-level=3 -Cdebuginfo=2 -Ccodegen-units=1 -Cstrip=none -Clink-arg=-Wl,-z,relro -Clink-arg=-Wl,-z,now --cap-lints=warn}" ; export RUSTFLAGS ; 
  LDFLAGS="${LDFLAGS:--Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1  }" ; export LDFLAGS ; 
  LT_SYS_LIBRARY_PATH="${LT_SYS_LIBRARY_PATH:-/usr/lib:}" ; export LT_SYS_LIBRARY_PATH ; 
  CC="${CC:-gcc}" ; export CC ; 
  CXX="${CXX:-g++}" ; export CXX
<mock-chroot> sh-5.2$ eval $(rpm --eval '%set_build_flags')
<mock-chroot> sh-5.2$ python3 setup.py build
...
building 'Cython.Compiler.Parsing' extension
gcc -fno-strict-overflow -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -fcf-protection -fexceptions -fcf-protection -fexceptions -fcf-protection -fexceptions -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Werror=implicit-function-declaration -Werror=implicit-int -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m32 -march=i686 -mtune=generic -msse2 -mfpmath=sse -mstackrealign -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fPIC -I/usr/include/python3.12 -c /builddir/cython/Cython/Compiler/Parsing.c -o build/temp.linux-i686-cpython-312/builddir/cython/Cython/Compiler/Parsing.o
...

Notes:

da-woods commented 1 year ago

So far:

da-woods commented 1 year ago

The offending flags that break it are -O2 and -msse2 together; without them it runs fine, and with either one on its own it also runs fine.

I'm pretty convinced there isn't a blocking Cython bug here. It's possible there's a subtle issue with the test but I'm not sure that I'm going to get to the bottom of it right now. But I'll leave this open in case anyone else can.

(The other thing to add: those two flags are fine on my 64-bit Linux. They also seem fine on a 32-bit Opensuse virtualbox.)
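A search like the one described above, where only a combination of flags triggers the failure, can be sketched as trying subsets of the candidate flags from smallest to largest. This is only an illustration, not the procedure actually used in this thread: `minimal_failing_sets` and `simulated_failure` are hypothetical names, and the failure predicate is simulated rather than rebuilding and running the test.

```python
from itertools import combinations


def minimal_failing_sets(flags, fails):
    """Return the smallest flag combinations for which `fails` is true."""
    for size in range(len(flags) + 1):
        hits = [combo for combo in combinations(flags, size) if fails(combo)]
        if hits:
            return hits
    return []


def simulated_failure(combo):
    # Assumption: mirrors the behaviour reported above, where only the
    # pair -O2 plus -msse2 triggers the crash. A real check would rebuild
    # the test module with these flags and run it.
    return "-O2" in combo and "-msse2" in combo


if __name__ == "__main__":
    candidates = ["-O2", "-msse2", "-mfpmath=sse", "-fstack-protector-strong"]
    print(minimal_failing_sets(candidates, simulated_failure))
```

The exhaustive subset search grows exponentially with the number of flags, so for a flag list as long as Fedora's it would make sense to narrow the candidates by bisection first.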

scoder commented 1 year ago

Thanks for investigating this. I'll downgrade the priority to "any time in 3.0.x" and will see if I can just disable the test for 3.0.4, so that it doesn't crash any more.