Open treygilliland opened 8 months ago
Hey @treygilliland, is there a specific builder that gives you the most trouble? The builder ID is usually printed at the start of the build logs.
Does it happen more with some projects than others? In the past we saw this with larger compose files with many services, and it would be interesting to know whether that is still a factor.
@klutchell it is a compose file with 5-6 services: 2 of them are ~300 MB, 2 of them are ~2 GB, and one is ~16 GB. Normally our builds make good use of the cache, so they stay around 3-5 minutes.
I don't know if it is a specific builder, but we are building for the NVIDIA Jetson Orin, if that helps. The builder ID for this particular failure is 4b2d7d0, but the build seems to fail on the GH action side and pass on the Balena dashboard.
It just happened again with 5 of our 6 build jobs; only 1 succeeded. This time it aborted very early on (while the cache images were still downloading). It may be related to running multiple builds at once for the same fleet, but it is hard to tell. The original build still shows the 'running...' status on the balena dashboard, even though the build triggered by the retry has already finished.
Hey @treygilliland, is this still happening with the latest release of the action?
Occasionally the deploy-to-balena-action fails with the error message:
Error: ECONNRESET: aborted
despite the build being marked as succeeded on the Balena dashboard. We run several build jobs, and at random one of them will flake with this error. Occasionally, one of these flakes also seems to cause all of the other jobs to flake. Here are the logs:
Our fix for this has always been to rerun the individual job that flaked, which eventually works most of the time. When it doesn't, we have to rerun the action as a whole.
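
For anyone who wants to automate that rerun instead of doing it by hand, here is a minimal sketch of the retry-with-backoff idea in Python. It is not how deploy-to-balena-action works internally. The names `retry_on_reset`, `attempts`, `base_delay`, and `sleep` are hypothetical, and the sketch assumes the flaky step surfaces as Python's `ConnectionResetError`, the exception Python raises for ECONNRESET.

```python
import time


def retry_on_reset(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() and retry it when the connection is reset.

    Hypothetical helper for illustration only. `operation` is any
    zero-argument callable, e.g. a function that triggers a deploy.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionResetError:
            # Out of attempts: let the caller see the original error.
            if attempt == attempts:
                raise
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

With the defaults, a step that resets twice and then succeeds is called three times, waiting 1 second and then 2 seconds in between. As noted above, a retry can start a second build while the first is still shown as running on the dashboard, so this only works around the symptom and does not address whatever causes the reset.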