Builds often fail. Succeed on Rebuild. No other changes.

robwilkerson commented 8 years ago

Description of your issue:

Recently I've been getting a lot of build failures. A few of them are certainly legitimate, but often simply hitting the Rebuild button makes the build succeed. Sometimes I have to rebuild twice, but the point is that no changes have been made to the app, the build, or the tests.

One example is https://app.shippable.com/runs/574254b3d388860c00d68673. In this case I had to rebuild twice, but again, no changes were made that should have affected the build status. This leads me to think the problem might lie in the environment that gets spun up.

Could this be due to some subtle issue in my shippable.yml file? Is there anything I can do to improve the stability and reliability of the spin-up process? When a build reports a failure, I drop everything to fix it, but the number of false alarms is becoming frustrating. I suspect the problem is on my end, but I have no idea how to improve the situation.

build_environment: Ubuntu 14.04
language: node_js
node_js:
  - "5.9.0"

services:
  - mysql

env:
  - NODE_ENV=test SLACK_HOOK_URL='<MYSLACKURL>' SLACK_CHANNEL='#shippableci'

branches:
  except:
    - gh-pages

build:
    ci:
    # Put config templates where they belong
    - cp ansible/roles/database/templates/knexplus.js.j2 www/config/knexplus.js
    - cp ansible/roles/database/templates/knexfile.js.j2 www/config/knexfile.js
    - cp ansible/roles/codebase/templates/vitalsource.js.j2 www/config/vitalsource.js

    # Provide values for the shippable environment
    - sed -i -e "s/{{ mysql_host }}/127.0.0.1/" www/config/knexplus.js
    - sed -i -e "s/{{ mysql_port }}/3306/" www/config/knexplus.js
    - sed -i -e "s/{{ mysql_root_password }}//" www/config/knexplus.js
    - sed -i -e "s/{{ mysql_db }}/mandrel/" www/config/knexplus.js

    - sed -i -e "s/{{ mysql_host }}/127.0.0.1/" www/config/knexfile.js
    - sed -i -e "s/{{ mysql_port }}/3306/" www/config/knexfile.js
    - sed -i -e "s/{{ mysql_user }}/mandrel/" www/config/knexfile.js
    - sed -i -e "s/{{ mysql_password }}//" www/config/knexfile.js
    - sed -i -e "s/{{ mysql_db }}/mandrel/" www/config/knexfile.js

    - sed -i -e "s/{{ vitalsource_api_key }}/DUMMYVALUE/" www/config/vitalsource.js

    # Install the app/test dependencies
    - npm install -g knex eslint
    - (cd www && npm install)

    # Prepare the test database
    - mysql -e 'CREATE DATABASE mandrel_test CHARACTER SET=utf8 COLLATE=utf8_unicode_ci;'
    - mysql -e "GRANT SELECT,INSERT,UPDATE,DELETE ON mandrel_test.* TO mandrel@localhost IDENTIFIED BY ''; FLUSH PRIVILEGES;"
    - (cd www && cat ./config/knexplus.js)
    - (cd www && NODE_ENV=test knex --knexfile ./config/knexplus.js migrate:latest)
    - (cd www && NODE_ENV=test knex --knexfile ./config/knexfile.js seed:run)

    # Run the tests
    - mkdir -p shippable/testresults
    - (cd www && ./node_modules/.bin/mocha --harmony_destructuring --reporter xunit > ../shippable/testresults/mocha.xml)

    # Visualize code coverage
    - mkdir -p shippable/codecoverage
    - (cd www && npm run cicoverage)

    on_success:
      - (cd www && node node_modules/slack-shippable/index.js -s)

    on_failure:
      - (cd www && node node_modules/slack-shippable/index.js)

notifications:
  email:
    recipients:
        - me@mine.com
    on_success: change
    on_failure: always
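The `ci` section above repeats one `sed -i` call per placeholder. As a sketch only, assuming a POSIX shell with GNU `sed` (as on the Ubuntu 14.04 image) and the `{{ name }}` placeholder style of these templates, those calls could be folded into a small helper. `render_template` is a hypothetical name, not something Shippable or Ansible provides. It does not change what the build does; it only makes a missing or mistyped substitution easier to spot.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original config): replace each
# "{{ key }}" placeholder in FILE with its value, in place.
# Usage: render_template FILE key=value [key=value ...]
# Caveat: values containing "|", "&" or "\" would need escaping first.
render_template() {
  local file=$1
  local pair key value
  shift
  for pair in "$@"; do
    key=${pair%%=*}   # text before the first "="
    value=${pair#*=}  # text after the first "=" (may be empty)
    # "|" as the delimiter so values containing "/" do not end the expression
    sed -i -e "s|{{ ${key} }}|${value}|g" "$file"
  done
}

# The knexfile.js steps from the ci section would become:
#   cp ansible/roles/database/templates/knexfile.js.j2 www/config/knexfile.js
#   render_template www/config/knexfile.js mysql_host=127.0.0.1 \
#     mysql_port=3306 mysql_user=mandrel mysql_password= mysql_db=mandrel
```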

a-murphy commented 8 years ago

Well, all of your recent failures were the first build to run on a particular build node, but that is quite likely just a coincidence (and was not the case on Friday). It does mean, however, that the failures cannot be caused by anything left running from a previous build. And your shippable.yml doesn't start anything outside the container that could carry over to the next build.

It looks like you are getting two errors: one while setting up your database, which looks similar to this one (MySQL error 1452, "Cannot add or update a child row: a foreign key constraint fails"): http://stackoverflow.com/questions/16594672/1452-cannot-add-or-update-a-child-row-a-foreign-key-constraint-fails, and a timeout in one of your tests. Hopefully the Stack Overflow link will help with the database error. As for the timeout, does that test typically take most of the time allowed? Or is it possible that it's trying to contact something that is still starting up when the test starts?
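One way to rule out the second possibility is to make the build wait until MySQL actually answers before creating the database and running migrations. The sketch assumes that service start-up is the problem, which the thread does not confirm. `wait_for`, the attempt count, and the delay are hypothetical choices; `mysqladmin ping` is a standard MySQL client command that exits with status 0 once the server is reachable. The helper would need to live in a script that the `ci` section calls, so that the function and its caller run in the same shell.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS command [args...]
wait_for() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0                 # command succeeded: the service answered
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"           # no pointless sleep after the last attempt
    fi
    i=$((i + 1))
  done
  echo "wait_for: '$*' still failing after $attempts attempts" >&2
  return 1
}

# Possible step before "CREATE DATABASE", assuming the MySQL service may
# still be starting (30 attempts, 1 second apart, chosen arbitrarily):
#   wait_for 30 1 mysqladmin ping --silent
```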

robwilkerson commented 8 years ago

Would you mind taking a look at a build that ran last night? This is one of those where I don't see any reason for the failure, but once I hit rebuild, everything was fine. When I look at the output of the failed build, I don't see anything that indicates a problem. Is it there and I'm not recognizing it for what it is? It'd be great if I'm just not reading the output correctly.

Thanks for your time.

a-murphy commented 8 years ago

The logs show only a test failure; I couldn't find anything in them explaining why it failed. It would probably be worth checking anything asynchronous in that test (or in the test setup) to see whether two operations could be completing in an unexpected order.
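A detail in the `ci` section may explain why the logs say so little. The mocha step redirects its standard output, which with `--reporter xunit` is the test report, into `shippable/testresults/mocha.xml`, so per-test failure messages end up in that file rather than in the console log. The following is a hedged sketch, assuming the report marks failed tests with `<failure>` elements (as mocha's xunit reporter does) and a `grep` that supports `-A`. `run_with_report` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: run a test command whose stdout is an xunit report,
# save the report to a file, and on failure also copy the report's
# <failure> entries to the console log.
# Usage: run_with_report REPORT_FILE command [args...]
run_with_report() {
  local report=$1 status
  shift
  "$@" > "$report"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "Tests failed (exit $status); <failure> entries in $report:" >&2
    grep -A 3 '<failure' "$report" >&2 || echo "(no <failure> entries found)" >&2
  fi
  return "$status"
}

# Possible replacement for the mocha step, assuming the helper is loaded
# from a script that the ci section runs:
#   (cd www && run_with_report ../shippable/testresults/mocha.xml \
#      ./node_modules/.bin/mocha --harmony_destructuring --reporter xunit)
```

The exit status of the test command is passed through unchanged, so the build still fails exactly when mocha fails.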

robwilkerson commented 8 years ago

Hmmm. Okay. The tests run fine locally and on my dev server, so I just can't figure out what it is about the CI environment that's so dramatically different. Thanks for having a look.
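To check whether the failure can happen outside CI at all, one option is to run the suite many times in a row and count failures. This is a minimal sketch, assuming a POSIX shell; `count_failures` and the run count are hypothetical. A clean local run does not prove the tests are race-free, since CI machines may be slower or more heavily loaded, but a single local failure gives something concrete to debug.

```shell
#!/bin/sh
# Hypothetical helper: run a command RUNS times and report how many failed,
# to check whether an intermittent failure shows up locally at all.
# Usage: count_failures RUNS command [args...]
count_failures() {
  local runs=$1 i=1 failures=0
  shift
  while [ "$i" -le "$runs" ]; do
    # send the command's own output to stderr so stdout holds only the summary
    if ! "$@" >&2; then
      failures=$((failures + 1))
      echo "run $i failed" >&2
    fi
    i=$((i + 1))
  done
  echo "$failures/$runs runs failed"
  [ "$failures" -eq 0 ]   # exit status 0 only if every run passed
}

# Example from www/ (assumes a local MySQL prepared as in the ci section):
#   count_failures 20 ./node_modules/.bin/mocha --harmony_destructuring
```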

Shippable / support, issue #2641