Open jnpkrn opened 7 years ago
Build image provisioning date and time
Thu Feb 5 15:09:33 UTC 2015
Operating System Details
Distributor ID: Ubuntu
Description: Ubuntu 12.04.5 LTS
Release: 12.04
Codename: precise
Linux Version
3.13.0-29-generic
Cookbooks Version
a68419e https://github.com/travis-ci/travis-cookbooks/tree/a68419e
Verified this is indeed intermittent: the above link now points to a restarted run, which passed (note that the offset of the respective line is +3; unfortunately I didn't grab these 3 extraneous lines while it was possible, which might have shed more light on this, supposing they were related error messages).
One of the possibilities that are hard to rule out is that parallel matrix builds (e.g., multiple compilers) share the same /dev/shm path (containers set up like that?) and it doesn't play very well in some rare circumstances as similar pseudorandom paths are being accessed...
Very odd. I'm not going to worry about it short-term, though it would be useful to know how the test systems are set up. Can we reproduce it with clang ourselves?
No cycles to spend on trying to reproduce that, though we are now aware of this tendency in Travis CI, so we'll have at least some clues when/if this recurs.
Some archeology:
- Workaround for Travis issue with POSIX semaphores
- currently, there is something similar as the "workaround" for the Python environment: https://github.com/travis-ci/travis-cookbooks/blob/64ff883360f3d265b87c072a07f78e9ef0a874fb/cookbooks/travis_python/recipes/devshm.rb#L24 (refers to the cookbook used in the affected run)
- ...which dates back to this commit: https://github.com/travis-ci/travis-cookbooks/commit/06be5a5139ae9f39c7e5831b6bad9a38d8bd5844#diff-c06d560f7d314b365d34ead2be8824daR23 which may correlate, in timing, with this magic issue 155 at hand
One more relevant hit: http://lists.corosync.org/pipermail/discuss/2013-May/002573.html
One quick thing to check is the location of your shared memory
I use travis ci for libqb and travis uses ubuntu vm's and I
know I had to do a workaround for the shared memory location
being moved from /dev/shm to /run/shm.
See: https://github.com/asalkeld/libqb/blob/master/.travis.yml
I'd suggest have a look at the output of:
mount | grep shm
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
df -h | grep shm
tmpfs 3.9G 2.9M 3.9G 1% /dev/shm
and see if you need to run that workaround. (libqb tries /dev/shm
first).
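The check suggested in the quoted mail can also be scripted. The following is only a minimal sketch: the helper name `shm_report` is hypothetical, and the two candidate paths are simply the ones named above (/dev/shm and /run/shm). It reports the size and free space of whichever of them exists.

```python
import os


def shm_report(paths=("/dev/shm", "/run/shm")):
    """Return {path: (total_bytes, free_bytes)} for each candidate that exists.

    The default paths are the two locations mentioned in the thread;
    nothing here is specific to libqb.
    """
    report = {}
    for path in paths:
        if not os.path.isdir(path):
            continue  # location not present on this system
        st = os.statvfs(path)
        total = st.f_frsize * st.f_blocks
        free = st.f_frsize * st.f_bavail
        report[path] = (total, free)
    return report


if __name__ == "__main__":
    for path, (total, free) in shm_report().items():
        link = " (symlink)" if os.path.islink(path) else ""
        print(f"{path}{link}: {total / 2**20:.0f} MiB total, "
              f"{free / 2**20:.0f} MiB free")
```

A path flagged as a symlink would hint that the shared memory location was moved, as described in the mail.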
Regarding the relevance to Python implied by the cookbook references above, http://stackoverflow.com/a/30175343 seems to suggest it was to solve some kind of issue with the multiprocessing module in Python's standard library.
(see also #238)
Diagnostic enhancement from #238 shed some more light here:
../../tests/check_ipc.c:1506:F:ipc_max_dgram_size:test_ipc_max_dgram_size:0: Assertion 'init==try' failed: init==0x50e00, try==0x67f00, i=28, errno=90
where errno of 90 means EMSGSIZE (Message too long).
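For reference, a numeric errno can be decoded with Python's standard library. This is just a convenience sketch (the helper name `describe_errno` is hypothetical); the number-to-name mapping is platform dependent, and the value 90 for EMSGSIZE is the Linux one on common architectures such as x86.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 90 is the value from the assertion message; its meaning depends on the
    # platform this runs on, so no particular output is claimed here.
    print(describe_errno(90))
```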
One of the possibilities is that some assumption that has held so far (per the previous successful test runs) is actually unreliable in practice, and some factors of the Travis environment just make it easier to expose.
Another hit:
init==0x50e00, try==0x67f00, i=40, errno=90
From the diagnostics added so far, it seems that /dev/shm, mounted as tmpfs, is quite small, just 64 MB, which could be a culprit.
... PR #242 might help regarding this hypothesis.
Just got a report of an occurrence of this issue on virtualized s390x:
ipc_max_dgram_size:test_ipc_max_dgram_size:0: Assertion 'init==try' failed: init==331264, try==331776
A mere 495M was allocated to /dev/shm.
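Since the Travis messages print the values in hex and the s390x one in decimal, it may help to normalize them before comparing. The sketch below (the helper `parse_sizes` and its regular expression are hypothetical, written only for the message format quoted in this thread) extracts `init` and `try` and their difference.

```python
import re

# Matches the "init==..., try==..." part of the assertion messages quoted
# above, accepting either hex (0x...) or decimal values.
PAIR = re.compile(r"init==(0x[0-9a-fA-F]+|\d+), try==(0x[0-9a-fA-F]+|\d+)")


def parse_sizes(message):
    """Return (init, try, try - init) as integers."""
    m = PAIR.search(message)
    if m is None:
        raise ValueError("no init/try pair found")
    init, attempt = (int(value, 0) for value in m.groups())
    return init, attempt, attempt - init


if __name__ == "__main__":
    travis = "init==0x50e00, try==0x67f00, i=28, errno=90"
    s390x = "Assertion 'init==try' failed: init==331264, try==331776"
    print(parse_sizes(travis))  # (331264, 425728, 94464)
    print(parse_sizes(s390x))   # (331264, 331776, 512)
```

Normalized this way, `init` is the same number (331264, i.e. 0x50e00) in the Travis and the s390x reports; only `try` differs.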
It's testing socket buffers rather than SHM arenas so it might be a ulimit issue. Odd that it failed there though because that's comparing the reported maximum with the actual allocated!
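To illustrate the "reported versus actual" distinction on a given machine, here is a rough sketch that is not libqb's test and makes no claim to mirror it. Assuming a Unix datagram socket pair, it reads the reported SO_SNDBUF and then bisects for the largest datagram that can actually be sent. The function name `probe_max_dgram` and the 4 MiB upper bound are arbitrary choices for the example.

```python
import errno
import socket


def probe_max_dgram(limit=1 << 22):
    """Return (reported SO_SNDBUF, largest datagram size that was sent)."""
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        tx.setblocking(False)
        rx.setblocking(False)
        reported = tx.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)

        def fits(size):
            try:
                tx.send(b"\0" * size)
            except OSError as exc:
                # Assumption: an oversized datagram is rejected with EMSGSIZE
                # (or ENOBUFS on some systems); anything else is re-raised.
                if exc.errno in (errno.EMSGSIZE, errno.ENOBUFS):
                    return False
                raise
            rx.recv(size)  # drain so the next attempt sees an empty queue
            return True

        if fits(limit):
            return reported, limit  # bound too low to find the real maximum
        if not fits(1):
            raise RuntimeError("even a 1-byte datagram was rejected")
        lo, hi = 1, limit  # invariant: lo fits, hi does not
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if fits(mid):
                lo = mid
            else:
                hi = mid
        return reported, lo
    finally:
        tx.close()
        rx.close()


if __name__ == "__main__":
    reported, actual = probe_max_dgram()
    print(f"SO_SNDBUF reports {reported}, largest datagram sent: {actual}")
```

The two numbers need not match, since the kernel may account for per-message overhead; how they relate is system dependent.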
https://travis-ci.org/ClusterLabs/libqb/jobs/178242766#L2722
...triggered intermittently only with clang (3.4), upon unrelated change.