delphix / linux-pkg

Framework to build custom packages for the Delphix Appliance
Apache License 2.0

Error: failed command 'bin/omnibus build td-agent3' #20

Open prakashsurya opened 5 years ago

prakashsurya commented 5 years ago

We hit a failure in the nightly build here.

The error message shows this:

07:35:25 Progress: |    [NetFetcher: ncurses] I | 2019-02-27T15:35:12+00:00 | Retrying failed download due to Net::OpenTimeout (3 retries left)...
07:35:25 
07:36:22 Progress: |    [NetFetcher: ncurses] I | 2019-02-27T15:36:14+00:00 | Retrying failed download due to Net::OpenTimeout (2 retries left)...
07:36:22 
07:37:20 Progress: |    [NetFetcher: ncurses] I | 2019-02-27T15:37:14+00:00 | Retrying failed download due to Net::OpenTimeout (1 retries left)...
07:37:20 
07:38:17 Progress: |    [NetFetcher: ncurses] E | 2019-02-27T15:38:14+00:00 | Download failed - Net::OpenTimeout!
07:38:17 #<Thread:0x00005630f0757a70@/home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:57 run> terminated with exception (report_on_exception is true):
07:38:17 /usr/lib/ruby/2.5.0/net/http.rb:937:in `initialize': execution expired (Net::OpenTimeout)
07:38:17    from /usr/lib/ruby/2.5.0/net/http.rb:937:in `open'
07:38:17    from /usr/lib/ruby/2.5.0/net/http.rb:937:in `block in connect'
07:38:17    from /usr/lib/ruby/2.5.0/timeout.rb:103:in `timeout'
07:38:17    from /usr/lib/ruby/2.5.0/net/http.rb:935:in `connect'
07:38:17    from /usr/lib/ruby/2.5.0/net/http.rb:920:in `do_start'
07:38:17    from /usr/lib/ruby/2.5.0/net/http.rb:909:in `start'
07:38:17    from /usr/lib/ruby/2.5.0/open-uri.rb:337:in `open_http'
07:38:17    from /usr/lib/ruby/2.5.0/open-uri.rb:755:in `buffer_open'
07:38:17    from /usr/lib/ruby/2.5.0/open-uri.rb:226:in `block in open_loop'
07:38:17    from /usr/lib/ruby/2.5.0/open-uri.rb:224:in `catch'
07:38:17    from /usr/lib/ruby/2.5.0/open-uri.rb:224:in `open_loop'
07:38:17    from /usr/lib/ruby/2.5.0/open-uri.rb:165:in `open_uri'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/core_extensions/open_uri.rb:51:in `open_uri'
07:38:17    from /usr/lib/ruby/2.5.0/open-uri.rb:735:in `open'
07:38:17    from /usr/lib/ruby/2.5.0/open-uri.rb:35:in `open'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/download_helpers.rb:80:in `download_file!'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/fetchers/net_fetcher.rb:175:in `download'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/fetchers/net_fetcher.rb:86:in `fetch'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/software.rb:888:in `fetch'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/project.rb:1066:in `block (3 levels) in download'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:64:in `block (4 levels) in initialize'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:62:in `loop'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:62:in `block (3 levels) in initialize'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:61:in `catch'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:61:in `block (2 levels) in initialize'
07:38:17 /usr/lib/ruby/2.5.0/net/http.rb:937:in `initialize': execution expired (Net::OpenTimeout)
07:38:17    from /usr/lib/ruby/2.5.0/net/http.rb:937:in `open'
07:38:17    from /usr/lib/ruby/2.5.0/net/http.rb:937:in `block in connect'
07:38:17    from /usr/lib/ruby/2.5.0/timeout.rb:103:in `timeout'
07:38:17    from /usr/lib/ruby/2.5.0/net/http.rb:935:in `connect'
07:38:17    from /usr/lib/ruby/2.5.0/net/http.rb:920:in `do_start'
07:38:17    from /usr/lib/ruby/2.5.0/net/http.rb:909:in `start'
07:38:17    from /usr/lib/ruby/2.5.0/open-uri.rb:337:in `open_http'
07:38:17    from /usr/lib/ruby/2.5.0/open-uri.rb:755:in `buffer_open'
07:38:17    from /usr/lib/ruby/2.5.0/open-uri.rb:226:in `block in open_loop'
07:38:17    from /usr/lib/ruby/2.5.0/open-uri.rb:224:in `catch'
07:38:17    from /usr/lib/ruby/2.5.0/open-uri.rb:224:in `open_loop'
07:38:17    from /usr/lib/ruby/2.5.0/open-uri.rb:165:in `open_uri'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/core_extensions/open_uri.rb:51:in `open_uri'
07:38:17    from /usr/lib/ruby/2.5.0/open-uri.rb:735:in `open'
07:38:17    from /usr/lib/ruby/2.5.0/open-uri.rb:35:in `open'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/download_helpers.rb:80:in `download_file!'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/fetchers/net_fetcher.rb:175:in `download'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/fetchers/net_fetcher.rb:86:in `fetch'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/software.rb:888:in `fetch'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/project.rb:1066:in `block (3 levels) in download'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:64:in `block (4 levels) in initialize'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:62:in `loop'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:62:in `block (3 levels) in initialize'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:61:in `catch'
07:38:17    from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:61:in `block (2 levels) in initialize'
07:38:17 Error: failed command 'bin/omnibus build td-agent3'
07:38:17 Error: failed command './buildpkg.sh td-agent'
07:38:17 Error: failed command './buildall.sh'

jgallag88 commented 5 years ago

The build of this package seems to be downloading artifacts from a number of different sites:

$ cat consoleText | grep 'Downloading from'
   [NetFetcher: jemalloc] I | 2019-02-27T15:29:33+00:00 | Downloading from `https://github.com/jemalloc/jemalloc/releases/download/4.5.0/jemalloc-4.5.0.tar.bz2'
       [NetFetcher: zlib] I | 2019-02-27T15:29:33+00:00 | Downloading from `https://zlib.net/fossils/zlib-1.2.11.tar.gz'
    [NetFetcher: cacerts] I | 2019-02-27T15:29:33+00:00 | Downloading from `https://curl.haxx.se/ca/cacert-2018-12-05.pem'
     [NetFetcher: xproto] I | 2019-02-27T15:29:33+00:00 | Downloading from `https://www.x.org/releases/individual/proto/xproto-7.0.25.tar.gz'
[NetFetcher: util-macros] I | 2019-02-27T15:29:33+00:00 | Downloading from `https://www.x.org/releases/individual/util/util-macros-1.18.0.tar.gz'
[NetFetcher: pkg-config-lite] I | 2019-02-27T15:29:34+00:00 | Downloading from `http://downloads.sourceforge.net/project/pkgconfiglite/0.28-1/pkg-config-lite-0.28-1.tar.gz'
 [NetFetcher: makedepend] I | 2019-02-27T15:29:34+00:00 | Downloading from `https://www.x.org/releases/individual/util/makedepend-1.0.5.tar.gz'
    [NetFetcher: openssl] I | 2019-02-27T15:29:34+00:00 | Downloading from `https://www.openssl.org/source/openssl-1.0.2q.tar.gz'
    [NetFetcher: ncurses] I | 2019-02-27T15:29:34+00:00 | Downloading from `https://ftp.gnu.org/gnu/ncurses/ncurses-5.9.tar.gz'
    [NetFetcher: libedit] I | 2019-02-27T15:29:34+00:00 | Downloading from `http://www.thrysoee.dk/editline/libedit-20120601-3.0.tar.gz'
    [NetFetcher: libtool] I | 2019-02-27T15:29:35+00:00 | Downloading from `https://ftp.gnu.org/gnu/libtool/libtool-2.4.tar.gz'
     [NetFetcher: libffi] I | 2019-02-27T15:29:37+00:00 | Downloading from `ftp://sourceware.org/pub/libffi/libffi-3.2.1.tar.gz'
    [NetFetcher: libyaml] I | 2019-02-27T15:29:38+00:00 | Downloading from `http://pyyaml.org/download/libyaml/yaml-0.1.7.tar.gz'
   [NetFetcher: libiconv] I | 2019-02-27T15:29:38+00:00 | Downloading from `https://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.15.tar.gz'
       [NetFetcher: ruby] I | 2019-02-27T15:29:38+00:00 | Downloading from `https://cache.ruby-lang.org/pub/ruby/2.4/ruby-2.4.5.tar.gz'
    [NetFetcher: liblzma] I | 2019-02-27T15:29:39+00:00 | Downloading from `http://tukaani.org/xz/xz-5.2.3.tar.gz'
    [NetFetcher: libxml2] I | 2019-02-27T15:29:39+00:00 | Downloading from `ftp://xmlsoft.org/libxml2/libxml2-2.9.8.tar.gz'
    [NetFetcher: libxslt] I | 2019-02-27T15:29:41+00:00 | Downloading from `ftp://xmlsoft.org/libxml2/libxslt-1.1.30.tar.gz'
   [NetFetcher: rubygems] I | 2019-02-27T15:29:43+00:00 | Downloading from `http://production.cf.rubygems.org/rubygems/rubygems-2.6.14.tgz'
 [NetFetcher: postgresql] I | 2019-02-27T15:29:43+00:00 | Downloading from `https://ftp.postgresql.org/pub/source/v9.6.9/postgresql-9.6.9.tar.bz2'
$ cat consoleText | grep 'Fetching from'
==========[GitFetcher: config_guess] I | 2019-02-27T15:29:33+00:00 | Fetching from `https://github.com/chef/config-mirror.git'
    [GitFetcher: fluentd] I | 2019-02-27T15:29:45+00:00 | Fetching from `https://github.com/fluent/fluentd.git'
 [GitFetcher: splunk-hec] I | 2019-02-27T15:29:45+00:00 | Fetching from `https://github.com/delphix/fluent-plugin-splunk-hec.git'

We will need a way to mirror these if we want repeatable builds.
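As a starting point for working out what a mirror would have to cover, the fetch lines in the console log can be grouped by upstream host. The Ruby sketch below is a hypothetical helper, not part of linux-pkg or Omnibus; it assumes the log format shown above, i.e. "[NetFetcher: name] ... Downloading from `url'" and the equivalent "[GitFetcher: name] ... Fetching from `url'" lines.

```ruby
require "uri"

# Matches Omnibus fetcher log lines (format assumed from the log above), e.g.
#   [NetFetcher: zlib] I | ... | Downloading from `https://zlib.net/...'
#   [GitFetcher: fluentd] I | ... | Fetching from `https://github.com/...'
FETCH_LINE = /\[(?:Net|Git)Fetcher: (?<name>[^\]]+)\].*(?:Downloading|Fetching) from `(?<url>[^']+)'/

# Returns a hash mapping each upstream host to the software names fetched
# from it, in the order they appear in the log.
def hosts_by_software(log_text)
  result = Hash.new { |hash, host| hash[host] = [] }
  log_text.each_line do |line|
    match = FETCH_LINE.match(line)
    next unless match
    host = URI.parse(match[:url]).host
    result[host] << match[:name]
  end
  result
end

# Only runs when invoked directly with a log file path as the first argument.
if __FILE__ == $PROGRAM_NAME && ARGV.any?
  hosts_by_software(File.read(ARGV.first)).sort.each do |host, names|
    puts "#{host}: #{names.uniq.join(', ')}"
  end
end
```

Run against the console text (for example, ruby fetch_hosts.rb consoleText, where the script name is hypothetical), it prints one line per host, which makes it easy to see that hosts such as ftp.gnu.org and www.x.org each serve several dependencies.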

prakashsurya commented 5 years ago

Looking at the build script here, it seems these downloads are intentional; e.g.:

    # Ensure all required gems are installed
    logmust bundle install --binstubs
    # Download dependent gems using downloader
    logmust bin/gem_downloader core_gems.rb
    logmust bin/gem_downloader delphix_plugin_gems.rb

There are lots of download statements in these two files: core_gems.rb and delphix_plugin_gems.rb.

prakashsurya commented 5 years ago

The way we rebuild all packages from source, coupled with the fact that building each package can be inherently unreliable (e.g. due to dependencies like this), is concerning. It puts us back in a situation where a build failure in any of the projects included in the "linux-pkg" framework can break the build of our appliance.

This was the reason we opted to consume packages in the new appliance-build system, so that any one project wouldn't cause problems for the appliance build as a whole, but I think we've regressed on this goal due to the linux-pkg build architecture.

pzakha commented 5 years ago

> We will need a way to mirror these if we want repeatable builds

@jgallag88 I had raised this issue with @prashks during the original review, and he pointed out that every package is versioned, so we should still get repeatable builds. That doesn't mean the builds are reliable, though.

> This was the reason we opted to consume packages in the new appliance-build system, so that any one project wouldn't cause problems for the appliance build as a whole, but I think we've regressed on this goal due to the linux-pkg build architecture.

@prakashsurya Since we are now moving towards a larger number of packages, there is a trade-off to be made. The idea behind linux-pkg was to build packages that the team rarely changes, so that a linux-pkg failure affects only a small part of the team. We are seeing a lot of changes to the packages managed by linux-pkg right now, but I predict that this will drop off considerably over time. Right now a failure of the linux-pkg build doesn't really impact the rest of appliance-build (you can still test changes to app-gate, zfs, and masking). That said, I do see two issues with the way things currently are:

  1. We are making a lot of changes to delphix-platform right now, and a breakage of linux-pkg affects everyone working on that package.
  2. We are currently very vulnerable to changes in the kernel version, which can break the whole product (very bad).

I have some ideas on how to reduce the impact of the first issue. The second issue is not really related to linux-pkg or how we build our packages; it should be fixed by the package mirror, although I have some ideas on how we could fix it even before we have the mirror.

prashks commented 5 years ago

The downloads done here are necessary to build this package, and they use the most popular source, the rubygems.org site. It also looks like this network connection issue was likely on our side (according to https://www.isitdownrightnow.com/rubygems.org.html, the site's last outage was more than a week ago). So we'll also need checks on our own infrastructure for reliability.
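One possible form for such a check is a preflight that tries to open a TCP connection to each upstream host before the build starts, so that a network problem like the Net::OpenTimeout above is reported up front instead of partway through a long build. This is a minimal sketch using Ruby's standard Socket.tcp with connect_timeout; the method names are hypothetical, and it only verifies that a connection can be opened, not that the download itself will succeed.

```ruby
require "socket"
require "uri"

# Hypothetical preflight helper: returns true if a TCP connection to the
# URL's host and port opens within `timeout` seconds, and false if it is
# refused, times out, or the host name cannot be resolved.
def reachable?(url, timeout: 10)
  uri = URI.parse(url)
  # URI fills in the default port for http (80), https (443) and ftp (21).
  # With a block, Socket.tcp returns the block's value and closes the socket.
  Socket.tcp(uri.host, uri.port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Returns the subset of `urls` whose hosts could not be reached.
def unreachable(urls, timeout: 10)
  urls.reject { |url| reachable?(url, timeout: timeout) }
end
```

A build script could call unreachable(urls, timeout: 5) on the URLs from the software definitions and print the failures before deciding whether to proceed.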

In any case, I agree that it's prudent to have a local mirror for the dependencies here. I'll touch base with DevOps on that, and eventually point the build of this package at such a mirror.

pzakha commented 5 years ago

Looking at the output John pasted (https://github.com/delphix/linux-pkg/issues/20#issuecomment-467976389), it seems to download from a bunch of different sites, so setting up a mirror for this might prove problematic.

prashks commented 5 years ago

OK, I see your point, @pzakha.
So, speaking for this particular package, we don't need it to be built every time; the only times I can think of for now are:

prakashsurya commented 5 years ago

@prashks IIRC, that's true for all packages in this framework. Unfortunately, I don't think that'll work, due to how the framework of this linux-pkg repository is structured and how it interacts with appliance-build.

pzakha commented 5 years ago

> So, thinking out loud, how about we either make the framework do that or provide a flag/tunable that lets each package choose to skip being built?

It's definitely doable, but it would introduce extra complexity into the build, and extra potential issues. For instance, we can modify the framework so that it pushes the artifacts for each package to its own directory and have linux-pkg fetch the latest artifacts for a given package from S3 if it fails to build that package.

So it's definitely possible, even with the current framework, but we would need to evaluate the pros and the cons of that approach.
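The fallback described above could look roughly like the following Ruby sketch: try to build the package, and if the build raises, reuse the most recently published artifacts instead. build and fetch_latest are hypothetical callables standing in for the real build step and for a download from S3 or another artifact store; BuildResult and build_or_fallback are likewise illustrative names, not existing linux-pkg code.

```ruby
# Hypothetical record of where a package's artifacts came from.
BuildResult = Struct.new(:package, :source, :artifacts)

# Tries to build `package` from source. If the build raises (e.g. a
# Net::OpenTimeout while fetching sources), logs a warning and falls back to
# the latest published artifacts. Re-raises the original error when there is
# nothing to fall back to.
def build_or_fallback(package, build:, fetch_latest:, log: $stderr)
  BuildResult.new(package, :built, build.call(package))
rescue StandardError => e
  log.puts "#{package}: build failed (#{e.class}: #{e.message}); " \
           "using latest published artifacts"
  artifacts = fetch_latest.call(package)
  raise e if artifacts.nil? || artifacts.empty?
  BuildResult.new(package, :fallback, artifacts)
end
```

Passing the build and fetch steps in as callables keeps the fallback policy separate from where the artifacts are stored, and recording the source in the result makes it visible when a build shipped older artifacts, which is one of the trade-offs to weigh.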

prakashsurya commented 5 years ago

> For instance, we can modify the framework so that it pushes the artifacts for each package to its own directory and have linux-pkg fetch the latest artifacts for a given package from S3 if it fails to build that package.

This sounds awfully similar to how we do it for non-linux-pkg packages.

prashks commented 5 years ago

> For instance, we can modify the framework so that it pushes the artifacts for each package to its own directory and have linux-pkg fetch the latest artifacts for a given package from S3 if it fails to build that package.

Yeah, it's not a bad idea for the framework to have a fallback for artifacts. Instead of pushing artifacts to a package's own directory, how about using our internal artifactory.delphix.com, where other packages already live?

> So it's definitely possible, even with the current framework, but we would need to evaluate the pros and the cons of that approach.

Agreed, we definitely need to evaluate all the pros and cons. Thanks.