gearman / gearmand

http://gearman.org/

FAIL t/hostile_gearmand (exit status: 134) if CXXFLAGS contains '-Wp,-D_GLIBCXX_ASSERTIONS' #272

Closed: cheese closed this issue 4 years ago

cheese commented 4 years ago

Source code is checked out at a40856d.

make test is run in the %check section during rpmbuild. t/hostile_gearmand fails, and make test then hangs at the end. The test-suite.log can be read with docker exec XXX cat /home/build/rpmbuild/BUILD/gearmand-1.1.19/test-suite.log, replacing XXX with the ID of the container that docker build is running in.

A Dockerfile that reproduces the issue:

FROM fedora:31

MAINTAINER Robin Lee <cheeselee@fedoraproject.org>

# Install packages
RUN dnf makecache
RUN dnf update -y
RUN dnf install -y 'dnf-command(builddep)' rpm-build git
RUN dnf install -y libtool autoconf automake
RUN dnf install -y make gettext-devel
RUN dnf install -y python3-sphinx

RUN useradd build
RUN su - build sh -c 'curl -O https://cheeselee.fedorapeople.org/gearmand-1.1.19-1.fc33.src.rpm'

# install builddeps
RUN dnf builddep -y /home/build/gearmand-1.1.19-1.fc33.src.rpm

RUN su - build sh -c 'rpmbuild --rebuild gearmand-1.1.19-1.fc33.src.rpm'

test-suite.log:

=======================================
   gearmand 1.1.19: ./test-suite.log
=======================================

# TOTAL: 36
# PASS:  29
# SKIP:  4
# XFAIL: 2
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

XFAIL: bin/gearman
==================

/home/build/rpmbuild/BUILD/gearmand-1.1.19/bin/.libs/lt-gearman Error in usage(No Functions were provided).

Client mode: /home/build/rpmbuild/BUILD/gearmand-1.1.19/bin/.libs/lt-gearman [options] [<data>]
Worker mode: /home/build/rpmbuild/BUILD/gearmand-1.1.19/bin/.libs/lt-gearman -w [options] [<command> [<args> ...]]

Common options to both client and worker modes.
        -f <function> - Function name to use for jobs (can give many)
        -h <host>     - Job server host
        -H            - Print this help menu
        -v            - Print diagnostic information to stdout(false)
        -p <port>     - Job server port
        -t <timeout>  - Timeout in milliseconds
        -i <pidfile>  - Create a pidfile for the process
        -S            - Enable SSL connections

Client options:
        -b            - Run jobs in the background(false)
        -I            - Run jobs as high priority
        -L            - Run jobs as low priority
        -n            - Run one job per line(false)
        -N            - Same as -n, but strip off the newline(false)
        -P            - Prefix all output lines with functions names
        -s            - Send job without reading from standard input
        -u <unique>   - Unique key to use for job

Worker options:
        -c <count>    - Number of jobs for worker to run before exiting
        -n            - Send data packet for each line(false)
        -N            - Same as -n, but strip off the newline(false)
        -w            - Run in worker mode(false)
XFAIL bin/gearman (exit status: 1)

XFAIL: bin/gearadmin
====================

No option execution operation given.

Options:
  --help                         Options related to the program.
  -h [ --host ] arg (=localhost) Connect to the host
  -p [ --port ] arg (=4730)      Port number or service to use for connection
  --server-version               Fetch the version number for the server.
  --server-verbose               Fetch the verbose setting for the server.
  --create-function arg          Create the function from the server.
  --cancel-job arg               Remove a given job from the server's queue
  --drop-function arg            Drop the function from the server.
  --show-unique-jobs             Show unique jobs on server.
  --show-jobs                    Show all jobs on the server.
  --getpid                       Get Process ID for the server.
  --status                       Status for the server.
  --priority-status              Queued jobs status by priority.
  --workers                      Workers for the server.
  -S [ --ssl ]                   Enable SSL connections.

XFAIL bin/gearadmin (exit status: 1)

SKIP: t/skip
============

SKIP t/skip (exit status: 77)

FAIL: t/hostile_gearmand
========================

FAIL t/hostile_gearmand (exit status: 134)

SKIP: t/drizzle
===============

SKIP t/drizzle (exit status: 77)

SKIP: t/postgres
================

SKIP t/postgres (exit status: 77)

SKIP: t/mysql
=============

SKIP t/mysql (exit status: 77)
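The exit status 134 reported for t/hostile_gearmand follows the shell convention of reporting a process killed by signal N as 128 + N: 134 - 128 = 6, which is SIGABRT on Linux, the signal raised by abort() (for example from a failed assertion). A small C++ sketch of that decoding; the helper names are hypothetical, and a status above 128 can also be an ordinary exit code, so this is only a heuristic:

```cpp
#include <cassert>
#include <csignal>

// Hypothetical helper: maps a shell-style exit status to the signal that
// killed the process, or 0 if the status does not look like "128 + signal".
int signal_from_exit_status(int exit_status) {
  return exit_status > 128 ? exit_status - 128 : 0;
}

// True if the status corresponds to SIGABRT (134 on Linux, where SIGABRT is 6).
bool is_abort_status(int exit_status) {
  return signal_from_exit_status(exit_status) == SIGABRT;
}
```

By the same rule, the SKIP results (exit status 77) and the XFAIL results (exit status 1) are plain exit codes, not signals.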
p-alik commented 4 years ago

No issue on Ubuntu 18.04. My attempt to set up a Vagrant box based on fedora/31-cloud-base didn't succeed:

[vagrant@localhost gearmand]$ dnf -v makecache
Loaded plugins: builddep, changelog, config-manager, copr, debug, debuginfo-install, download, generate_completion_cache, needs-restarting, playground, repoclosure, repodiff, repograph, repomanage, reposync
DNF version: 4.2.9
cachedir: /var/tmp/dnf-vagrant-qcipcw88
Unknown configuration value: failovermethod=priority in /etc/yum.repos.d/fedora-updates-modular.repo; Configuration: OptionBinding with id "failovermethod" does not exist
Unknown configuration value: failovermethod=priority in /etc/yum.repos.d/fedora-updates-modular.repo; Configuration: OptionBinding with id "failovermethod" does not exist
Unknown configuration value: failovermethod=priority in /etc/yum.repos.d/fedora-updates-modular.repo; Configuration: OptionBinding with id "failovermethod" does not exist
Making cache files for all metadata files.
fedora-modular: has expired and will be refreshed.
updates-modular: has expired and will be refreshed.
updates: has expired and will be refreshed.
fedora: has expired and will be refreshed.
Fedora Modular 31 - x86_64                167 kB/s |  23 kB     00:00
reviving: 'fedora-modular' can be revived - metalink checksums match.
fedora-modular: using metadata from Wed Oct 23 22:53:13 2019.
Fedora Modular 31 - x86_64 - Updates       42 kB/s |  20 kB     00:00
reviving: 'updates-modular' can be revived - metalink checksums match.
updates-modular: using metadata from Mon Feb 10 02:00:45 2020.
Fedora 31 - x86_64 - Updates               32 kB/s |  18 kB     00:00
reviving: 'updates' can be revived - metalink checksums match.
updates: using metadata from Fri Feb 14 01:09:42 2020.
Fedora 31 - x86_64                         31 kB/s |  23 kB     00:00
reviving: 'fedora' can be revived - metalink checksums match.
Killed

Any idea how we could set up a virtual environment to reproduce the issue?

cheese commented 4 years ago

There is also no issue on Fedora if the build and tests are run directly.

Could you try skipping makecache and continuing with the next commands?

p-alik commented 4 years ago

Unfortunately all attempts to execute dnf were killed.

cheese commented 4 years ago

Too little memory?

esabol commented 4 years ago

The 1.1.19 release tarball is bogus anyway. I'd recommend waiting until that is straightened out.

EDIT: Oh, I see you checked out the code at a40856d instead of using the release tarball. Cool. Carry on... :)

esabol commented 4 years ago

I built your Dockerfile, and I think it's still using the release tarball; I can tell from how it compiles the code. I was able to confirm that one of the tests aborted and dropped core. This was presumably t/hostile_gearmand, which FAILed for me as well. "make test" then hangs, presumably because it doesn't know how to deal with a test aborting without reporting success or failure.

The following Dockerfile passes all tests:

FROM fedora:31

MAINTAINER gearmand

# Install packages
RUN dnf makecache
RUN dnf update -y
RUN dnf install -y 'dnf-command(builddep)' rpm-build git
RUN dnf install -y libtool autoconf automake
RUN dnf install -y make gettext-devel
RUN dnf install -y python3-sphinx

RUN useradd build
RUN su - build sh -c 'curl -O https://cheeselee.fedorapeople.org/gearmand-1.1.19-1.fc33.src.rpm'

# install builddeps
RUN dnf builddep -y /home/build/gearmand-1.1.19-1.fc33.src.rpm

ARG GEARMAN_REPO=https://github.com/gearman/gearmand

RUN cd /tmp && git clone --depth 1 --branch master ${GEARMAN_REPO}.git
WORKDIR /tmp/gearmand
RUN ./bootstrap.sh -a
RUN ./configure --enable-ssl 2>&1 | tee ./configure.log
RUN make 2>&1 | tee ./build.log
RUN make test 2>&1 | tee ./test.log
cheese commented 4 years ago

I further figured out that hostile_gearmand crashes if CXXFLAGS contains '-Wp,-D_GLIBCXX_ASSERTIONS'. The following Dockerfile reproduces the issue; it simply adds ENV CXXFLAGS='-Wp,-D_GLIBCXX_ASSERTIONS' before configuring.

FROM fedora:31

MAINTAINER gearmand

ARG version=31
ARG GEARMAN_REPO=https://github.com/gearman/gearmand

LABEL version="${version}" description="Gearman SSL Job Server Image"

# Configure environment
ENV HOME=/root

# Install packages
RUN dnf makecache
RUN dnf update -y
RUN dnf install -y mock
RUN useradd build && usermod -a -G mock build
RUN dnf install -y 'dnf-command(builddep)' rpm-build git
RUN dnf builddep -y gearmand

# Retrieve the source code and bootstrap
RUN cd /tmp && git clone --depth 1 --branch master ${GEARMAN_REPO}.git
RUN dnf install -y libtool autoconf automake
RUN dnf install -y make gettext-devel
RUN dnf install -y python3-sphinx
WORKDIR /tmp/gearmand

RUN ./bootstrap.sh -a

ENV CXXFLAGS='-Wp,-D_GLIBCXX_ASSERTIONS'
RUN ./configure --enable-ssl 2>&1 | tee ./configure.log
RUN make 2>&1 | tee ./build.log
RUN make test 2>&1 | tee ./test.log
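The later comments suggest the patch concerns the &vec[0] pattern. With _GLIBCXX_ASSERTIONS defined, libstdc++ bounds-checks std::vector::operator[], so evaluating &payload[0] on an empty vector (already undefined behaviour) becomes a failed assertion that calls abort(). Fedora's default RPM build flags include -Wp,-D_GLIBCXX_ASSERTIONS, which would explain why the failure appears under rpmbuild but not in a plain build. A C++ sketch of the guarded form, with hypothetical helper names; it is not the actual patch:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if this translation unit was compiled with _GLIBCXX_ASSERTIONS
// defined (for example via -Wp,-D_GLIBCXX_ASSERTIONS in CXXFLAGS).
bool glibcxx_assertions_enabled() {
#ifdef _GLIBCXX_ASSERTIONS
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: pointer to the first byte of a buffer, or NULL if
// the buffer is empty. It never evaluates payload[0] on an empty vector.
const char* first_byte_or_null(const std::vector<char>& payload) {
  return payload.empty() ? NULL : &payload[0];
}

// The risky form, shown only for comparison; on an empty vector it is
// undefined behaviour and aborts when _GLIBCXX_ASSERTIONS is defined:
//   const char* p = &payload[0];
```

Because first_byte_or_null checks for emptiness before indexing, it behaves the same with or without -D_GLIBCXX_ASSERTIONS.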
esabol commented 4 years ago

Interesting....

Can you run t/hostile_gearmand under gdb and get a backtrace? And maybe valgrind?

Anyone have a link to the documentation on this compilation flag? I can’t find anything useful on it with Google.

Should we be compiling with this flag in Travis CI?

esabol commented 4 years ago

I just found this: https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_macros.html but it’s still pretty sparse. It might be worth trying with _GLIBCXX_DEBUG_PEDANTIC?

cheese commented 4 years ago

I just figured out a patch for this issue.

Beyond that, the gearmand code seems to use the &vec[0] pattern a lot. Should those uses just be replaced with vec.data()?

esabol commented 4 years ago

> After that, gearmand code seems using &vec[0] pattern much. Should they be just replaced with vec.data()?

Other than the two cases in your PR?

cheese commented 4 years ago

I did not dig deeply. I suspect that payload.size() ? &payload[0] : NULL could simply be replaced with payload.data(), but I noticed the code base never calls vec.data().
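The two forms are interchangeable only for non-empty vectors. For a non-empty std::vector, payload.data() returns the same pointer as &payload[0]; for an empty one, data() is still safe to call, but its result is unspecified (it may or may not be null) and must not be dereferenced, whereas the ternary yields NULL explicitly. A C++11 sketch comparing them, with hypothetical function names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pre-C++11 idiom used in gearmand: explicit NULL for an empty payload.
const char* ptr_legacy(const std::vector<char>& payload) {
  return payload.size() ? &payload[0] : NULL;
}

// C++11 alternative: always safe to call, but for an empty vector the
// returned pointer is unspecified (possibly null) and must not be read.
const char* ptr_cxx11(const std::vector<char>& payload) {
  return payload.data();
}
```

Swapping in data() is therefore a drop-in change only at call sites that also pass the size and do not test the pointer for NULL to detect an empty payload.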

p-alik commented 4 years ago

> Should we be compiling with this flag in Travis CI?

Sounds good to me.

esabol commented 4 years ago

>> Should we be compiling with this flag in Travis CI?

> Sounds good to me.

If you merge PR #265, I will submit a PR to add it.

Do you think we should add it to all gcc builds or just some subset of the gcc builds?

esabol commented 4 years ago

> I suspect that payload.size() ? &payload[0] : NULL may be just replaced with payload.data() but I found the code base never calls vec.data().

Oh, I get what you're asking now. vec.data() is new in C++11. The bulk of the gearmand code was written before that was adopted. When did g++ add support for it? We're still supporting gcc/g++ 4... If g++ 4 supports vec.data(), then using vec.data() should be fine. Otherwise, it's best to go with the more widely supported vec.size() ? &vec[0] : NULL.

cheese commented 4 years ago

>> I suspect that payload.size() ? &payload[0] : NULL may be just replaced with payload.data() but I found the code base never calls vec.data().

> Oh, I get what you're asking now. vec.data() is new to C++11. The bulk of the gearmand code was written before that was adopted. When did g++ add support for it? We're still supporting gcc/g++ 4... If g++ 4 supports vec.data(), then using vec.data() should be fine. Otherwise, it's best to go with the more widely supported vec.size() ? &vec[0] : NULL.

GCC 4.x versions differ from one another. As I recall, GCC 4.8.1 is the first release that is C++11 feature-complete. So I think C++11 features should be avoided if GCC versions older than 4.8.1 are still supported, even though vec.data() may have been implemented in some earlier version.

p-alik commented 4 years ago

> If you merge PR #265, I will submit a PR to add it.

Done.

> Do you think we should add it to all gcc builds or just some subset of the gcc builds?

Since such a change could have an unforeseeable impact on the CI build process, we should start with the smallest possible subset and gain some experience first. Other thoughts?

SpamapS commented 4 years ago

RHEL8 / CentOS 8 are out. This means RHEL6 is now the old stable, not the stable release of RHEL. So I'd like for us to stop supporting GCC < 4.8.1.

esabol commented 4 years ago

> So I'd like for us to stop supporting < 4.8.1

The oldest version of gcc/g++ we have access to in Travis CI is 4.8.5, so I concur.

cheese commented 4 years ago

1.1.19 already fails to build on EL6 with GCC 4.4, since it uses nullptr.
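If the project settles on GCC 4.8.1 as the minimum, the policy could be enforced at compile time instead of surfacing as an error on nullptr or another C++11 feature. A minimal sketch using GCC's predefined version macros; the GEARMAND_GCC_VERSION macro and the gcc_version_number helper are hypothetical, not part of gearmand. Clang also defines __GNUC__ (typically reporting 4.2.1), so it is excluded from the check:

```cpp
#include <cassert>

// Hypothetical build-time guard: refuse to compile with GCC older than 4.8.1.
#if defined(__GNUC__) && !defined(__clang__)
#  define GEARMAND_GCC_VERSION \
     (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#  if GEARMAND_GCC_VERSION < 40801
#    error "GCC 4.8.1 or newer is required (first C++11 feature-complete release)"
#  endif
#endif

// Encodes a GCC version the same way as the macro above, e.g. 4.8.1 -> 40801.
inline int gcc_version_number(int major, int minor, int patch) {
  return major * 10000 + minor * 100 + patch;
}
```

With this guard, the GCC 4.8.5 available in Travis CI would compile normally, while the GCC 4.4 shipped with EL6 would stop at the #error with a clear message.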