balena-io / balena-cli

The official balena CLI tool.

balena cli push timeout #2065

Open · splitice opened this issue 4 years ago

splitice commented 4 years ago

I've been using git push in my CI process for years now, and recently opted to upgrade to balena push as suggested.

Everything was going fine until we noticed more failures than normal.

It's not that the build fails; rather, the balena push command ends mid-build and returns an error code.

The build itself continues on the remote cloud build server, however, and completes successfully.

pdcastro commented 4 years ago

Thank you for reporting this issue. Would you be able to share some additional details to help us reproduce and debug it?

Also, the --detached option might be worth a look:

$ balena push --help
...
OPTIONS
  -d, --detached
      When pushing to the cloud, this option will cause the build to start, then
      return execution back to the shell, with the status and release ID (if
      applicable).  When pushing to a local mode device, this option will cause
      the command to not tail application logs when the build has completed.
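
For example, a detached push from a CI job might look like this (myApp below is a placeholder application name):

$ balena push myApp --detached

The command returns once the build has started on the remote builders, so a long-lived connection is not needed for the duration of the build - at the cost of not streaming the build logs to the shell.
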
splitice commented 4 years ago

Thanks @pdcastro. In the background, I'm trying to work out what attribute of our build leads to this issue.

Unfortunately, with ~20-minute builds, an issue that occurs on roughly 1 in 7 runs, and a busy team who need a working CI, it's a project on the back burner.

The environment is Linux (GitHub Actions) and the balena-cli version is the latest stable at build time (https://github.com/HalleyAssist/push-to-balenacloud).

I'm currently testing a hypothesis that it occurs after a certain number of bytes have been transmitted, as it doesn't seem to occur during silent periods. I've reduced the output from our build scripts to verify this and will be running builds today and tomorrow.

splitice commented 4 years ago

@pdcastro I've managed to get 6 consecutive passing builds by piping the output of a large tar extract command to /dev/null, thereby reducing the amount of output. The output is still quite long (we see "Earlier logs truncated..." in the txt files on your end), but it's half of what it was.
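
For illustration, the change was along these lines (the archive name and paths are just examples):

Before:
RUN tar -xvf assets.tar -C /opt/assets

After:
RUN tar -xvf assets.tar -C /opt/assets > /dev/null 2>&1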

I'll need to do a few more builds to ensure that I'm not just rolling the dice correctly. There wouldn't by any chance be a max response body size limit on your end (or something similar)?

Headless mode isn't really suitable for us, as capturing the output of balena push / git push is the only way for us to get the build output. It's otherwise truncated on your end if retrieved via the txt file.

pdcastro commented 4 years ago

Thank you for sharing these results @splitice. I am not aware of a max response body size limit, other than the truncation you mentioned, which shouldn't cause the CLI to end mid-build. If the amount of logs was really large, I wonder if the CLI process might be reaching some Node.js limit, though again I am not aware of what that limit would be. Capturing the CLI output in debug mode (the DEBUG=1 env var or the --debug flag) might give us a clue.
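
For example, something like this would capture a debug log while keeping the live output (the log file name is arbitrary):

$ DEBUG=1 balena push myApp 2>&1 | tee balena-push-debug.log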

Also, a couple of suggestions:

Headless mode isn't really suitable for us, as capturing the output of balena push / git push is the only way for us to get the build output.

What about redirecting the output of your build commands / script to a text file saved on the image itself? A before/after example:

Dockerfile before:

...
RUN build-script.sh

Dockerfile after:

...
SHELL ["/bin/bash", "-c"]
RUN build-script.sh &> /tmp/image-build-output.txt

(I've selected bash as the shell so I could use &> redirection.)

Or, using the tee command to get both live output and a saved file:

...
SHELL ["/bin/bash", "-c"]
RUN build-script.sh |& tee /tmp/image-build-output.txt

Then you might be able to use headless mode without losing the full logs.
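
For instance, assuming the release later runs on a device, the saved file could be read back with balena ssh (the device UUID and service name below are placeholders, and I haven't verified this exact flow):

$ balena ssh <deviceUuid> my-service
# then, inside the service container:
$ cat /tmp/image-build-output.txt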

The environment is Linux (GitHub Actions) and the balena-cli version is the latest stable at build time

Interesting! Hopefully GitHub isn't introducing additional issues - like killing the CLI process / container because of their own resource usage limits. By the way, using the latest CLI build is probably a good thing generally, but perhaps not as good while trying to isolate an issue, as the CLI version may have changed compared to "previous observations" of the issue.

splitice commented 3 years ago

As we used GitHub both before (with git push) and after (with balena-cli), I wouldn't expect them to be at fault here. I should, however, test that when I have time.

As a test, I would suggest extracting a large (many-file) tar archive, as that's what I redirected to /dev/null to largely solve the issue.
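
A minimal way to reproduce that kind of output volume might be a Dockerfile step like the following (the file count and paths are arbitrary, and it assumes a base image that provides seq and xargs; the point is one log line per extracted file):

# create an archive of many small files, then extract it verbosely
RUN mkdir -p /tmp/many && cd /tmp/many && seq 1 50000 | xargs touch && tar -cf /tmp/many.tar -C /tmp many
RUN mkdir -p /tmp/out && tar -xvf /tmp/many.tar -C /tmp/out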