Closed: clippermadness closed this issue 6 years ago
@clippermadness we're investigating this.
@manishas Any update on this?
@clippermadness No update yet. It seems like something in our script handler is causing it to exit without marking the build successful, even though all steps have succeeded. How often does this happen? Is it pretty regular?
The next step might be for us to analyze the node right after this occurs, so maybe you could update here the next time you see it happen?
@trriplejay This happens every time in this project if I remove the pre_ci_boot section of shippable.yml. It is not intermittent.
@manishas @trriplejay ping :)
We haven't been able to find the root cause yet, but we did release v6.4.4. You could try changing your runtime version to this and see if it avoids the error.
The other workaround for now could be to just change your runtime version back to 6.1.4, since you know that version works. Then you at least wouldn't have to wait to pull the build image.
I notice that you're using Ruby version 2.4.1. This version used to be available directly in our older images, but our 6.1.4 through 6.4.4 images ship 2.4.3 instead, so if you were to specify that in your yml, you could avoid the Ruby download/install that is happening now in your build. Perhaps this change would also avoid the strange failures. If you specifically need 2.4.1, then you could consider changing your runtime version to 5.8.2, which has Ruby 2.4.1 pre-installed.
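For reference, a sketch of what that version pin might look like in shippable.yml (I'm assuming the Travis-style `language`/`ruby` keys here; check the Ruby section of the Shippable docs against your actual file):

```yaml
language: ruby

ruby:
  - 2.4.3   # pre-installed in the 6.1.4 through 6.4.4 images, so no download/install step
```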
let me know if any of these suggestions work for you. We haven't been able to reproduce the error ourselves yet, so it's hard to say when we'll have more information. Thanks for your patience!
Ok cool - I'll give those ideas a shot and see if they work.
Ok, I followed these steps and it's still failing. The only difference I can find between the logs of a build that succeeds and one that fails is as follows:
Successful with pre_ci_boot: https://app.shippable.com/bitbucket/thetalake/portal/runs/1162/1/console
Booting up CEXEC
Running CEXEC script
sudo docker rm -fv $CONTAINER_NAME c.exec.portal.1160.1
Failure without: https://app.shippable.com/bitbucket/thetalake/portal/runs/1160/1/console
Booting up CEXEC
Running CEXEC script
ERROR:script_runner - script_runner:Command failed : ssh-agent bash -c 'ssh-add /tmp/ssh/00_sub;ssh-add /tmp/ssh/01_deploy; cd /root && /root/5c0461c2-d85c-486e-987a-3f9e129b2bd4.sh'
Exception Invalid or no script tags received
ERROR:script_runner - script_runner:Command failed : ssh-agent bash -c 'ssh-add /tmp/ssh/00_sub;ssh-add /tmp/ssh/01_deploy; cd /root && /root/5c0461c2-d85c-486e-987a-3f9e129b2bd4.sh'
Exception Invalid or no script tags received
sudo docker rm -fv $CONTAINER_NAME c.exec.portal.1162.1
Have you tried setting your runtime version back to 6.1.4? Since that image version seems to work, that might be the best way to avoid pulling.
That error you mention is definitely related. Normally our script handler sets a flag once all commands have completed successfully to indicate the overall success of the job. That's the "script tag" that the error is referring to. For some reason, the tag isn't being set in this case, even though everything is working exactly as normal. I'm still unable to reproduce, but am continuing to investigate.
A couple more notes on this:
Switching back to 6.1.4 definitely works.
I also tried changing the underlying node in our subscription. We have been using a 14.04 node, but I changed that to 16.04. That didn't work with either 6.4.4 or 6.3.4: same error.
So at this point, this project builds using the 16.04 6.4.4 node with a pre_ci_boot section in shippable.yml that pulls the older 6.1.4 image.
If I get rid of the pre_ci_boot section and attempt to build on the 6.4.4 image, the build always fails with the above description.
While upgrading to runtime 6.5.4 we noticed the same issue: https://app.shippable.com/github/thestorefront/tsf-api/runs/5331/1/console The Console tab doesn't show any problem, while the downloaded logs print:
Booting up CEXEC
Running CEXEC script
ERROR:script_runner - script_runner:Command failed : ssh-agent bash -c 'ssh-add /tmp/ssh/00_sub;ssh-add /tmp/ssh/01_deploy; cd /root && /root/f056c1bb-3dc4-4a47-8e97-63bf3415f385.sh'
Exception Invalid or no script tags received
ERROR:script_runner - script_runner:Command failed : ssh-agent bash -c 'ssh-add /tmp/ssh/00_sub;ssh-add /tmp/ssh/01_deploy; cd /root && /root/f056c1bb-3dc4-4a47-8e97-63bf3415f385.sh'
Exception Invalid or no script tags received
ERROR:script_runner - script_runner:Command failed : ssh-agent bash -c 'ssh-add /tmp/ssh/00_sub;ssh-add /tmp/ssh/01_deploy; cd /root && /root/c2349b56-a431-4fe6-9cbf-53f367adc770.sh'
Exception Invalid or no script tags received
ERROR:script_runner - script_runner:Command failed : ssh-agent bash -c 'ssh-add /tmp/ssh/00_sub;ssh-add /tmp/ssh/01_deploy; cd /root && /root/c2349b56-a431-4fe6-9cbf-53f367adc770.sh'
Exception Invalid or no script tags received
Also, bumping to this version I had to add apt-get install libcurl4-openssl-dev in order to get libcurl.
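For anyone hitting the same thing, the extra install can go at the top of the ci steps; a minimal sketch, assuming the standard build/ci layout of shippable.yml (the bundle install line is just a placeholder for your existing steps):

```yaml
build:
  ci:
    - sudo apt-get update && sudo apt-get install -y libcurl4-openssl-dev
    - bundle install   # placeholder: your existing ci steps follow
```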
Another thing I noticed is that when rebuilding failed runs, we get an empty git_sync step and no build_ci.
https://app.shippable.com/github/thestorefront/tsf-api/runs/5335/1/console
~/src/github.com/thestorefront/tsf-api ~
fatal: Not a git repository (or any of the parent directories): .git
We have cache enabled, and resetting it makes every step run properly.
We have an issue with the cache too, but nothing appears to be successful. build_ci is not even executed; the process fails at the git_sync step with the message "this is not a git repo".
We’re working on fixing this. @rageshkrishna @ric03uec can look into the issue reported for git sync and build_ci not being executed...
Ping @ric03uec
@clippermadness @Bit-Doctor this has been fixed and will be available in the next release sometime early next week.
This error is happening because of an underlying bug in rvm (https://github.com/rvm/rvm/issues/4416) that was closed recently. The bug was resetting the bash TRAPs in a few of the Shippable scripts that are essential for their successful execution. Without the TRAP functions, the cleanup functions were not getting called, which resulted in failed builds without any actual errors.
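To illustrate the failure mode (a minimal sketch, not Shippable's actual script handler; the marker file name and contents are made up): a cleanup function registered with `trap` writes a success marker on EXIT, but if a sourced script resets the trap mid-run, the marker is never written, so the runner sees "Invalid or no script tags received" even though every step passed.

```shell
#!/bin/bash
rm -f /tmp/script_tag

# Cleanup handler that marks the run as successful. Conceptually, this is the
# "script tag" the script_runner error above is complaining about (the file
# name "script_tag" is hypothetical).
write_success_tag() {
  echo "success" > /tmp/script_tag
}
trap write_success_tag EXIT

# ... build steps all run and succeed ...
echo "build step 1 ok"
echo "build step 2 ok"

# A buggy sourced script (like the rvm bug linked above) resets the trap:
trap - EXIT

# The script now exits without write_success_tag ever running, so the marker
# is missing even though every step succeeded.
```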
We still haven't been able to test the rvm fix successfully (https://github.com/rvm/rvm/issues/4416#issuecomment-408830405), so we've added some custom logic to get around this issue, which should fix the builds that are failing for you.
I'll keep this issue open until we do the release, so you can verify everything is good on your end.
Fix verified using Shippable base image 6.7.4, Ruby 2.4.1, and Rails 5.2.0. Builds are now about 6 minutes faster without having to pull the old image. Thanks!
This is now fixed, closing.
https://app.shippable.com/bitbucket/thetalake/portal/runs/1099/1/console https://app.shippable.com/download/jobConsoles?jobId=5ae27395a74b0e0800aa7a07
This project was using a specific build image from Shippable and was succeeding:

pre_ci_boot:
  image_name: drydock/u16ruball
  image_tag: v6.1.4
When I removed that section, in an effort to make my builds run faster by using the default image on my node (v6.3.4), the builds started failing.
But every step of the build that I can see succeeded. What's happening?