cschwan / sage-on-gentoo

(Unofficial) Gentoo Overlay for Sage- and Sage-related ebuilds

sage/matrix/matrix_integer_dense.pyx doctest sometimes breaks with time out #707

Open kiwifb opened 2 years ago

kiwifb commented 2 years ago
sage -t --long --random-seed=4867623489143374956615441254140194808 /usr/lib/python3.10/site-packages/sage/matrix/matrix_integer_dense.pyx  # Timed out (and interrupt failed)

It doesn't always fail, but it is related to using openblas with threads. Switching openblas to use openmp makes the issue go away. It is unclear whether switching to another blas also fixes it; that still needs to be tested.
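Since the failure is intermittent, one way to get a rough failure rate is to repeat the same command several times under a wall-clock limit. The following is only a minimal sketch: `count_timeouts` is a hypothetical helper, not part of Sage, and the limit is applied by Python's `subprocess` module rather than by the doctest framework's own timeout.

```python
import subprocess
import sys


def count_timeouts(cmd, runs=5, timeout=600.0):
    """Run `cmd` `runs` times; return how many runs exceeded `timeout` seconds.

    Hypothetical helper for illustration. subprocess.run kills the child
    process when the timeout expires and then raises TimeoutExpired.
    """
    timed_out = 0
    for _ in range(runs):
        try:
            subprocess.run(
                cmd,
                timeout=timeout,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            timed_out += 1
    return timed_out


if __name__ == "__main__":
    # Harmless stand-in command; replace it with the sage -t invocation
    # quoted above to exercise the real doctest.
    quick = [sys.executable, "-c", "pass"]
    print(count_timeouts(quick, runs=3, timeout=30.0))
```

To use it on the real case, pass the `sage -t --long --random-seed=...` command line from this report as a list of arguments and pick a timeout comfortably above a normal run.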

strogdon commented 2 years ago

A data point: I do see the failure on s-o-g, but so far not on vanilla. Vanilla here uses system openblas [ pthread, -openmp ] and system singular. The s-o-g failure:

sage: a = matrix(ZZ,2,[1,-7,3,5]) ## line 5597 ##
sage: a._change_ring(RDF) ## line 5598 ##
[ 1.0 -7.0]
[ 3.0  5.0]
sage: sig_on_count() # check sig_on/off pairings (virtual doctest) ## line 5601 ##
0
sage: A = matrix(ZZ, 3, 3, [-8, 2, 0, 0, 1, -1, 2, 1, -95]) ## line 5621 ##
sage: As = singular(A); As ## line 5622 ##

strogdon commented 2 years ago

Another data point - s-o-g does not have openblas as a NEEDED lib.

On vanilla
$ objdump -p src/sage/matrix/matrix_integer_dense.cpython-310-x86_64-linux-gnu.so | grep NEEDED
  NEEDED               libiml.so.0
  NEEDED               libgmp.so.10
  NEEDED               libopenblas.so.0
  NEEDED               libpari-gmp-tls.so.7
  NEEDED               libflint.so.16
  NEEDED               libm.so.6
  NEEDED               libc.so.6

versus on Gentoo

$ objdump -p  /usr/lib/python3.10/site-packages/sage/matrix/matrix_integer_dense.cpython-310-x86_64-linux-gnu.so | grep NEEDED
  NEEDED               libiml.so.0
  NEEDED               libpari-gmp-tls.so.7
  NEEDED               libflint.so.16
  NEEDED               libgmp.so.10
  NEEDED               libm.so.6
  NEEDED               libc.so.6
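
The two listings can also be compared mechanically. This is a small sketch that parses the `objdump -p ... | grep NEEDED` output quoted above; `needed_libs` is a hypothetical helper name, and the two strings are copied from the listings in this comment.

```python
def needed_libs(objdump_output):
    """Return the set of library names on NEEDED lines of `objdump -p` output."""
    libs = set()
    for line in objdump_output.splitlines():
        fields = line.split()
        # A dynamic-section entry looks like: "NEEDED   libfoo.so.N"
        if len(fields) == 2 and fields[0] == "NEEDED":
            libs.add(fields[1])
    return libs


# Output copied from the vanilla build quoted above.
VANILLA = """
  NEEDED               libiml.so.0
  NEEDED               libgmp.so.10
  NEEDED               libopenblas.so.0
  NEEDED               libpari-gmp-tls.so.7
  NEEDED               libflint.so.16
  NEEDED               libm.so.6
  NEEDED               libc.so.6
"""

# Output copied from the Gentoo build quoted above.
GENTOO = """
  NEEDED               libiml.so.0
  NEEDED               libpari-gmp-tls.so.7
  NEEDED               libflint.so.16
  NEEDED               libgmp.so.10
  NEEDED               libm.so.6
  NEEDED               libc.so.6
"""

print(sorted(needed_libs(VANILLA) - needed_libs(GENTOO)))  # only on vanilla
print(sorted(needed_libs(GENTOO) - needed_libs(VANILLA)))  # only on Gentoo
```

Note that NEEDED only lists direct dependencies, so a library pulled in indirectly (for example through another shared object) will not show up here.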

strogdon commented 2 years ago

The NEEDED libs may not be an issue. On my gentoo-prefix I don't see a doctest failure.

kiwifb commented 2 years ago

It shouldn't be an issue. blas is not used directly; it should be pulled in by iml.

strogdon commented 2 years ago

I'm able to get the time out (/storage/strogdon/gentoo-rap/usr/lib64/libopenblas.so.0(blas_thread_shutdown_+0xbf)[0x7ffb0a90889f]) on gentoo-prefix when doctesting the folder

sage -tp 9 --long ~/usr/lib/python3.10/site-packages/sage/matrix/

I have not been able to get vanilla to fail when doctesting the above folder.

strogdon commented 2 years ago

From src/bin/sage-env there is

# Multithreading in OpenBLAS does not seem to play well with Sage's attempts to
# spawn new processes, see #26118. Apparently, OpenBLAS sets the thread
# affinity and, e.g., parallel doctest jobs, remain on the same core.
# Disabling that thread-affinity with OPENBLAS_MAIN_FREE=1 leads to hangs in
# some computations.
# So we disable OpenBLAS' threading completely; we might loose some performance
# here but strangely the opposite seems to be the case. Note that callers such
# as LinBox use a single-threaded OpenBLAS anyway.
export OPENBLAS_NUM_THREADS=1

Does this mean that OPENBLAS_NUM_THREADS=1 during doctests? In any event I get non-failing results with

OPENBLAS_NUM_THREADS=1 sage -t --long /usr/lib/python3.10/site-packages/sage/matrix/matrix_integer_dense.pyx

I'm not sure what s-o-g does relative to OPENBLAS_NUM_THREADS.
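
For scripted runs, the same override can be applied per process instead of being exported globally. This is only a sketch under that assumption: `single_threaded_env` and `run_single_threaded` are hypothetical helper names, and the only OpenBLAS-specific part is the `OPENBLAS_NUM_THREADS` variable already shown above.

```python
import os
import subprocess
import sys


def single_threaded_env(base=None):
    """Return a copy of the environment with OpenBLAS limited to one thread."""
    env = dict(os.environ if base is None else base)
    env["OPENBLAS_NUM_THREADS"] = "1"
    return env


def run_single_threaded(cmd):
    """Run `cmd` with OPENBLAS_NUM_THREADS=1 set only for the child process."""
    return subprocess.run(
        cmd, env=single_threaded_env(), capture_output=True, text=True
    )


if __name__ == "__main__":
    # Stand-in child process that just echoes the variable back;
    # replace it with the sage -t command line to run the doctest.
    probe = [
        sys.executable,
        "-c",
        "import os; print(os.environ['OPENBLAS_NUM_THREADS'])",
    ]
    print(run_single_threaded(probe).stdout.strip())
```

The parent environment is left untouched, so threaded and single-threaded runs can be compared from the same shell session.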

kiwifb commented 2 years ago

I do nothing about it. If we were to add something, it may have to live in sage-runtest. But yes, it means the whole of vanilla sage basically runs without threads unless something overrides it. It is a bit misguided to only consider linbox: scipy uses lapack for some things, and so does iml, which is where the issue comes from.

kiwifb commented 2 years ago

Setting OPENBLAS_NUM_THREADS definitely has an impact here. I will think about what to do about it.