Closed AndrewGable closed 5 months ago
Looks related: https://github.com/actions/runner-images/issues/7754
GitHub confirmed this was something on their side and they are looking into it
Been pretty quiet from GitHub support, I will bump them
@AndrewGable Whoops! This issue is 2 days overdue. Let's get this updated quick!
No update from GitHub support
@AndrewGable Whoops! This issue is 2 days overdue. Let's get this updated quick!
@AndrewGable 6 days overdue. This is scarier than being forced to listen to Vogon poetry!
We got a workaround from GitHub support, but I am not sure we are seeing the error anymore. Looking into it today.
GitHub says if we set --maxsockets=1 on npm install it should help, but I am not sure we want to do so with all the runners not failing.
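A minimal sketch of how the suggested workaround could look in a workflow step. The step name is hypothetical and the actual install step is not shown in this thread; only the --maxsockets=1 option comes from GitHub support's suggestion.

```yaml
steps:
  # Hypothetical step; the real workflow's install step is not shown in this thread.
  - name: Install node modules
    # Limit npm to one connection per origin, as GitHub support suggested.
    run: npm install --maxsockets=1
```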
@AndrewGable Whoops! This issue is 2 days overdue. Let's get this updated quick!
@AndrewGable Huh... This is 4 days overdue. Who can take care of this?
@AndrewGable Now this issue is 8 days overdue. Are you sure this should be a Daily? Feel free to change it!
@AndrewGable 10 days overdue. I'm getting more depressed than Marvin.
I'll look back into this
@AndrewGable Huh... This is 4 days overdue. Who can take care of this?
@AndrewGable 6 days overdue. This is scarier than being forced to listen to Vogon poetry!
@AndrewGable 10 days overdue. I'm getting more depressed than Marvin.
I think GitHub must have fixed it on their side, I haven't seen this happen in 2+ weeks.
Hi, not to be intrusive here, but it seems that this is a recurring issue here... have you considered giving FlyCI a try?
📣 @kgantchev! 📣 Hey, it seems we don’t have your contributor details yet! You'll only have to do this once, and this is how we'll hire you on Upwork. Please follow these steps:
Contributor details
Your Expensify account email: <REPLACE EMAIL HERE>
Upwork Profile Link: <REPLACE LINK HERE>
@kgantchev - Feel free to follow the proposal process, but no, we haven't.
@AndrewGable thanks for sharing the proposal guide. I've created a proposal based on that guide.
GitHub runners are failing at an unsustainably high rate (close to 50%). This appears to be an infrastructure issue, with a message indicating that the agent stopped responding:
The hosted runner encountered an error while running your job. (Error Type: Disconnect).
In addition to the runner failure ("disconnect"), the response time from GitHub support is too slow (up to 8 days to resolve the issue).
The root cause is an infrastructure issue on GitHub's side. A complicating factor is GitHub's support, which is exceedingly slow with response times as slow as 8 days.
A possible solution is to use FlyCI's macOS runners. FlyCI offers M2 runners ranging from 4 vCPUs to 8 vCPUs (macOS 13 and 14), with the largest being the flyci-macos-14-xlarge-m2 runner with 8 vCPUs and 14 GB RAM.
The FlyCI runners are highly reliable and are supported by a very responsive dev team. Support is available by e-mail and in the Discord server of FlyCI, with response rates that aim to always be below 24 hours.
The switch is simple:
Step 1: Install the FlyCI GitHub app and grant it permissions for this repo.
Step 2: Switch the relevant runner label to point to FlyCI's labels.
In this case, there are 3 workflow files that have the offending runner label:
An example of the change looks like this for testBuild:
   iOS:
     name: Build and deploy iOS for testing
     needs: [validateActor, getBranchRef]
     if: ${{ fromJSON(needs.validateActor.outputs.READY_TO_BUILD) }}
     env:
       PULL_REQUEST_NUMBER: ${{ github.event.number || github.event.inputs.PULL_REQUEST_NUMBER }}
       DEVELOPER_DIR: /Applications/Xcode_15.0.1.app/Contents/Developer
-    runs-on: macos-13-xlarge
+    runs-on: flyci-macos-14-xlarge-m2
Note: the solution uses an M2 runner/macOS 14 (8 vCPU and 14 GB RAM), which should also provide a performance boost of about 20% compared to the M1 runners.
Thanks for the proposal @kgantchev - I will consider this proposal, but probably will look at smaller solutions first remaining on GitHub Actions as we've standardized on GitHub runners and don't really want to splinter them across providers.
@AndrewGable Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!
Going to see if macos-13-large helps, I believe xl might have been deprecated. This will still use Intel CPUs as we don't want to use arm64.
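As a sketch, the label change being tried would look roughly like this. The job layout mirrors the iOS job quoted in the proposal above; everything other than the two runner labels is illustrative.

```yaml
jobs:
  iOS:
    name: Build and deploy iOS for testing
    # Previously: runs-on: macos-13-xlarge
    runs-on: macos-13-large
    # steps omitted; they are not shown in this thread
```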
Reviewing label has been removed, please complete the "BugZero Checklist".
The solution for this issue has been :rocket: deployed to production :rocket: in version 1.4.62-17 and is now subject to a 7-day regression period :calendar:. Here is the list of pull requests that resolve this issue:
If no regressions arise, payment will be issued on 2024-04-25. :confetti_ball:
Problem
About 50% of our iOS builds that use the macos-13-xlarge runner are being canceled with the error:
Solution
Fix it