cschwan / sage-on-gentoo

(Unofficial) Gentoo Overlay for Sage- and Sage-related ebuilds

Random failures in Prefix doctesting sage/structure/parent.pyx (8.8.rc0) #541

Closed strogdon closed 5 years ago

strogdon commented 5 years ago

The failure is

sage -t --long usr/lib64/python2.7/site-packages/sage/structure/parent.pyx
**********************************************************************
File "usr/lib64/python2.7/site-packages/sage/structure/parent.pyx", line 1734, in sage.structure.parent.Parent.hom.register_embedding
Failed example:
    K.coerce_embedding()(a)
Exception raised:
    Traceback (most recent call last):
      File "/storage/strogdon/gentoo-rap/usr/lib64/python2.7/site-packages/sage/doctest/forker.py", line 681, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/storage/strogdon/gentoo-rap/usr/lib64/python2.7/site-packages/sage/doctest/forker.py", line 1105, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.structure.parent.Parent.hom.register_embedding[23]>", line 1, in <module>
        K.coerce_embedding()(a)
      File "sage/structure/parent.pyx", line 1786, in sage.structure.parent.Parent.coerce_embedding (/storage/strogdon/gentoo-rap/var/tmp/portage/sci-mathematics/sage-9999/work/sage-9999/src-python2_7/build/cythonized/sage/structure/parent.c:14718)
        return copy(self._embedding) # It might be overkill to make a copy here
      File "/storage/strogdon/gentoo-rap/usr/lib64/python2.7/copy.py", line 80, in copy
        return copier(x)
      File "sage/categories/map.pyx", line 178, in sage.categories.map.Map.__copy__ (/storage/strogdon/gentoo-rap/var/tmp/portage/sci-mathematics/sage-9999/work/sage-9999/src-python2_7/build/cythonized/sage/categories/map.c:4010)
        out._parent = self.parent() # self._parent might be None
      File "sage/categories/map.pyx", line 231, in sage.categories.map.Map.parent (/storage/strogdon/gentoo-rap/var/tmp/portage/sci-mathematics/sage-9999/work/sage-9999/src-python2_7/build/cythonized/sage/categories/map.c:4224)
        return homset.Hom(D, C, self._category_for)
      File "/storage/strogdon/gentoo-rap/usr/lib64/python2.7/site-packages/sage/categories/homset.py", line 422, in Hom
        H = X._Hom_(Y, category)
...
    RuntimeError: maximum recursion depth exceeded while calling a Python object
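For readers unfamiliar with this error: Python raises it when the depth of nested calls exceeds the interpreter's recursion limit. The sketch below is only a generic illustration of that mechanism and is not a diagnosis of this failure. The helper names `nested_calls` and `survives` are hypothetical. It shows the same call chain passing or failing depending on the limit in effect.

```python
import sys

def nested_calls(depth):
    """Recurse `depth` levels deep and return the number of levels reached."""
    if depth == 0:
        return 0
    return 1 + nested_calls(depth - 1)

def survives(depth, limit):
    """Return True if `depth` nested calls fit under the recursion limit `limit`."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        nested_calls(depth)
        return True
    except RuntimeError:
        # On Python 3 this is a RecursionError, which subclasses RuntimeError;
        # on Python 2.7 (as in the traceback above) it is a plain RuntimeError.
        return False
    finally:
        # Always restore the limit so the experiment has no side effects.
        sys.setrecursionlimit(old_limit)
```

A chain of `Hom` and `parent` calls that is only slightly deeper in one run than in another could therefore cross the limit in one run and not the other, which would be consistent with a failure that looks random.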

These do not trigger the failure

sage -t --long ~/usr/lib64/python2.7/site-packages/sage/structure/parent.pyx

or

sage -t --long ~/usr/lib64/python2.7/site-packages/sage/structure/

But this will trigger the failure

sage -t --long ~/usr/lib64/python2.7/site-packages/sage/structure/ ~/usr/lib64/python2.7/site-packages/sage/interfaces/

The failure was discovered when running

sage -tp 12 --long --all

kiwifb commented 5 years ago

That's a curious one. Just on prefix?

strogdon commented 5 years ago

So far, only on prefix where I'm using gcc9. But no problem when doctesting 8.8.beta6. I missed beta7.

strogdon commented 5 years ago

There was a patchbot that had the failure https://patchbot.sagemath.org/log/27900/Darwin/Darwin%20Kernel%20Version%2018.5.0:%20Mon%20Mar%2011%2020:40:32%20PDT%202019;%20root:xnu-4903.251.3~3/RELEASE_X86_64/x86_64/18.5.0/steenrod/2019-06-01%2003:21:06

strogdon commented 5 years ago

From this https://trac.sagemath.org/ticket/27900 ticket

kiwifb commented 5 years ago

Interesting. The failure is on OS X, most likely with clang. I am not convinced it is due to that ticket, or possibly that ticket alone. But this is hard.

timokau commented 5 years ago

I just got the same error with the nix package when updating from rc0 to rc1 (linux, gcc). I can reproduce it reliably when testing the whole structure subdir, but not when only testing the parent.pyx file. Haven't seen it with rc0, but that's probably random.

kiwifb commented 5 years ago

I cannot see it at all. I feel like there must be another element to this that is not obvious. What version of ipykernel are you both using?

strogdon commented 5 years ago

[U] dev-python/ipykernel
     Available versions:  4.6.1-r2 (~)4.8.2 (~)5.1.0 **9999[1] {test PYTHON_TARGETS="python2_7 python3_5 python3_6 python3_7"}
     Installed versions:  4.8.2(01:55:47 AM 05/18/2019)(-test PYTHON_TARGETS="python2_7 python3_6 -python3_5")
     Homepage:            https://github.com/ipython/ipykernel
     Description:         IPython Kernel for Jupyter

[1] "science" /var/lib/layman/science

I'm thinking gcc? Maybe something needs to be rebuilt that hasn't been.

kiwifb commented 5 years ago

Difficult to say. I am now on gcc-9.1.0 too, but I haven't done a massive system rebuild. I am wondering about Timo.

timokau commented 5 years ago

Wow, I wasn't even aware that there is gcc 9 already. Nix is still working on switching to gcc 8 by default.

I'm using gcc 7.4.0 and ipykernel 4.10.0. A change of a package always automatically triggers a rebuild of all reverse-dependencies in nix, so that can't be the cause in my case.

I have built 8.8.rc0 (no error) and 8.8.rc1 (reliably fails) with the exact same dependencies, so there is no obvious cause there.

kiwifb commented 5 years ago

How many parallel threads, Timo? Steve does 12, I usually do 8. I am wondering if there is some threshold in the number of parallel threads.

strogdon commented 5 years ago

ls ~/.sage/cache/
_storage_strogdon_gentoo-rap_usr_lib64_python2.7_site-packages-lazy_import_cache.pickle

Removing this file

rm ~/.sage/cache/*

fixes things here. But as soon as the file appears again I get the failure. Running any doctest seems to create the file.
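A narrower alternative to `rm ~/.sage/cache/*` is to delete only the lazy-import cache pickle. The sketch below assumes the cache file name always ends in `lazy_import_cache.pickle`, which is inferred from the single file listed above and not confirmed for other installs. The function name `clear_lazy_import_cache` is hypothetical.

```python
import glob
import os

def clear_lazy_import_cache(cache_dir):
    """Delete lazy-import cache pickles in `cache_dir`; return the removed paths."""
    # Assumption: the file name ends in "lazy_import_cache.pickle",
    # as in the one example shown in this thread.
    pattern = os.path.join(cache_dir, "*lazy_import_cache.pickle")
    removed = []
    for path in sorted(glob.glob(pattern)):
        os.remove(path)
        removed.append(path)
    return removed

# Example call (not executed here):
# clear_lazy_import_cache(os.path.expanduser("~/.sage/cache"))
```

Since the file is recreated as soon as sage runs, this would have to be called before each doctest run to have any effect.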

kiwifb commented 5 years ago

Now that's an interesting clue. @timokau do you have something similar? And if you do what is the name of the file?

strogdon commented 5 years ago

Actually, just running sage creates the file.

timokau commented 5 years ago

That file (or the whole ~/.sage directory for that matter) didn't make a difference for me. Maybe worth noting that my sage tests always run in a build sandbox with a tmpdir home, so they always start from an empty ~/.sage.

I usually use 4 threads. I just tried it, and the failure doesn't occur with 1 thread. It does with 2 though.

timokau commented 5 years ago

Interestingly enough, when running the tests outside the sandbox I have the opposite experience from Steve. Failure does not occur with a pre-existing ~/.sage, but as soon as I remove it it occurs.

timokau commented 5 years ago

I can't narrow it down to any particular file in ~/.sage though. Probably the creation of those files slows down the tests by just the right amount to trigger some race condition.

strogdon commented 5 years ago

I'm wondering if the order in which parent.pyx appears in doctesting affects whether it fails. I removed the cache file and doctested 8.8.rc1. Of course, as soon as doctesting started the cache file was created, but parent.pyx did not fail. However, after doctesting sage, running

sage -t --long ~/usr/lib64/python2.7/site-packages/sage/structure/ ~/usr/lib64/python2.7/site-packages/sage/interfaces/

resulted in parent.pyx failing. parent.pyx did not fail when I switched the order in which the above sub-directories were doctested.

timokau commented 5 years ago

For me the failure occurs whether or not I test interfaces first (starting with an empty ~/.sage).

timokau commented 5 years ago

The failure doesn't manifest itself anymore with rc2, although that is probably random.

strogdon commented 5 years ago

No parent.pyx failure here either when doctesting sage with rc2, but I can still force the failure by

sage -t --long ~/usr/lib64/python2.7/site-packages/sage/structure/ ~/usr/lib64/python2.7/site-packages/sage/interfaces/

timokau commented 5 years ago

Not the case for me, unfortunately. If there was a completely reliable way to reproduce we could at least check if the issue is present upstream.
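Short of a fully reliable reproducer, one option is to repeat the same doctest command several times and record how often it fails. The sketch below assumes that `sage -t` exits with a non-zero status when a doctest fails. The function name `failure_rate` is hypothetical, and `subprocess.run` requires Python 3.5 or later, so this would run outside the Python 2.7 environment being tested.

```python
import subprocess

def failure_rate(command, runs):
    """Run `command` `runs` times; return the fraction of non-zero exits."""
    failures = 0
    for _ in range(runs):
        # Discard output; only the exit status is used here.
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures / runs

# Example call (not executed here), with the two directories from this thread
# passed as already-expanded paths:
# failure_rate(["sage", "-t", "--long", structure_dir, interfaces_dir], 10)
```

Comparing the rate for different directory orders or thread counts would turn "probably random" into a number, although a low rate still needs many runs to tell apart from zero.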

timokau commented 5 years ago

Cause has apparently been identified: https://trac.sagemath.org/ticket/28036

strogdon commented 5 years ago

The commit https://git.sagemath.org/sage.git/commit?id=4d1cf508f9fc19f73e2ec3c82258400009e27dcf for ticket https://trac.sagemath.org/ticket/28036, which is now in rc3, seems to fix things for me here.

timokau commented 5 years ago

I think this can be closed then.

strogdon commented 5 years ago

Should have been: now in rc3