brandon-rhodes / assay

Attempt to write a Python testing framework I can actually stand

Sporadic pickling errors #11

Open · bnavigator opened this issue 4 years ago

bnavigator commented 4 years ago

I get sporadic pickling errors in the openSUSE package build for skyfield. Is there any way to stabilize this?

[   36s] + /usr/bin/python3 -m assay --batch skyfield.tests
[   42s] ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................Traceback (most recent call last):
[   42s]   File "/usr/lib64/python3.8/runpy.py", line 194, in _run_module_as_main
[   42s]     return _run_code(code, main_globals, None,
[   42s]   File "/usr/lib64/python3.8/runpy.py", line 87, in _run_code
[   42s]     exec(code, run_globals)
[   42s]   File "/home/abuild/rpmbuild/BUILD/assay-23c18c2457c035996057144e1fe74cd6e19b44eb/assay/__main__.py", line 2, in <module>
[   42s]     main()
[   42s]   File "/home/abuild/rpmbuild/BUILD/assay-23c18c2457c035996057144e1fe74cd6e19b44eb/assay/command.py", line 27, in main
[   42s]     monitor.main_loop(args.name, args.batch or not isatty)
[   42s]   File "/home/abuild/rpmbuild/BUILD/assay-23c18c2457c035996057144e1fe74cd6e19b44eb/assay/monitor.py", line 68, in main_loop
[   42s]     runner.send(source)
[   42s]   File "/home/abuild/rpmbuild/BUILD/assay-23c18c2457c035996057144e1fe74cd6e19b44eb/assay/monitor.py", line 144, in runner_coroutine
[   42s]     give_work_to(worker)
[   42s]   File "/home/abuild/rpmbuild/BUILD/assay-23c18c2457c035996057144e1fe74cd6e19b44eb/assay/monitor.py", line 129, in give_work_to
[   42s]     paths = [path for name, path in worker.call(list_module_paths)]
[   42s]   File "/home/abuild/rpmbuild/BUILD/assay-23c18c2457c035996057144e1fe74cd6e19b44eb/assay/worker.py", line 77, in call
[   42s]     return pickle.load(self.from_worker)
[   42s] _pickle.UnpicklingError: pickle data was truncated
[   42s] error: Bad exit status from /var/tmp/rpm-tmp.uZ8Wea (%check)
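
For what it's worth, "pickle data was truncated" is the error that pickle.load() typically raises when the byte stream it is reading ends partway through a pickled message, for example because the process writing to the other end of the pipe exited before finishing its write. Here is a minimal, POSIX-only sketch that provokes the same error outside of assay (the payload size and exit path are arbitrary, and this is not assay's code):

import os
import pickle

read_fd, write_fd = os.pipe()
pid = os.fork()

if pid == 0:
    # Child: write only half of a pickled payload, then exit abruptly,
    # standing in for a worker that crashes or is killed mid-answer.
    os.close(read_fd)
    payload = pickle.dumps(list(range(1000)))
    with os.fdopen(write_fd, 'wb') as to_parent:
        to_parent.write(payload[:len(payload) // 2])
        to_parent.flush()
    os._exit(1)

# Parent: close its copy of the write end, reap the child, then try to
# unpickle the incomplete answer left in the pipe.
os.close(write_fd)
os.waitpid(pid, 0)
with os.fdopen(read_fd, 'rb') as from_child:
    try:
        pickle.load(from_child)
    except (pickle.UnpicklingError, EOFError) as e:
        print('reproduced:', type(e).__name__, e)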

Build history for aarch64

"failed" is above error, "unchanged" and "succeeded" is passing the unit tests. Happens on other architectures too.

brandon-rhodes commented 4 years ago

I have never seen that exception before, either on my own machines or in CI. I clicked through to the build history, but the only failures I see are caused by unresolvable dependencies. Could you be more specific about where you are seeing this? Thanks!

bnavigator commented 4 years ago

This is what the linked job history looks like for me:

(Screenshot of the build history: Screenshot_20200816_214518)

bnavigator commented 4 years ago

It doesn't look as bad on x86_64, but it happens there too.

I have also encountered these sporadic errors locally; a rerun usually worked, so I suspect some kind of race condition or asynchronous-call error.

brandon-rhodes commented 4 years ago

Yes, that's how the dashboard looks for me as well. But when I click into the first few builds marked "failed", I see no failure logs or exceptions. Am I missing something?

bnavigator commented 4 years ago

Although the link goes to the failing source revision, the build results are from the most recent revision. Only the logs for the last (in this case successful) build are retained.

Ignore the unresolvables. Those are outdated distributions still configured for the devel project.

brandon-rhodes commented 4 years ago

> Although the link goes to the failing source revision, the build results are from the most recent revision. Only the logs for the last (in this case successful) build are retained.

Oh. Well, that would explain why it isn't there. I thought I was supposed to click on the link and go read the full logs.

My guess is that one of the child worker processes dies. I'll try creating a test for that, and add a pretty error message that marks the test involved as a failure and reports that the child died without explanation. I wonder if your CI builds run under a resource constraint that occasionally prevents a child process from, say, memory-mapping one more copy of the main ephemeris.
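
(Roughly the kind of guard I have in mind, as a sketch rather than assay's actual code: wrap the pickle.load() call that the traceback shows in worker.py, and assume the parent also keeps a handle on the child process, here called worker_process, with an exitcode attribute like multiprocessing.Process provides.)

import pickle

def read_answer(from_worker, worker_process):
    # Read one pickled reply from the worker's pipe, translating a
    # truncated stream into a message that names the real problem:
    # the child died before it finished writing its answer.
    try:
        return pickle.load(from_worker)
    except (pickle.UnpicklingError, EOFError) as error:
        exitcode = getattr(worker_process, 'exitcode', None)
        raise RuntimeError(
            'assay worker died before sending a complete answer'
            ' (exit code {!r}); it may have crashed or been killed'
            ' by a resource limit'.format(exitcode)
        ) from error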

> Ignore the unresolvables. Those are outdated distributions still configured for the devel project.

Good, I was indeed ignoring them! Thanks for clarifying.

bnavigator commented 4 years ago

Here is a full log of a recent failure:

_log.txt