Open knownasilya opened 8 years ago
My Strider instance HTTP server also starts timing out if multiple commit hooks come in at the same time and multiple projects are built. After a couple minutes it eventually seems to come back online (without a restart), but I'll see several 502s from my reverse proxy in the meantime if I try to load the Strider dashboard or send another commit webhook within those several minutes.
I'm not sure where the delay is, but I'm especially curious about what's blocking the HTTP thread. I thought most of the prepare phase would be delegated to the workers?
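One way to check whether something is blocking the HTTP thread is to measure event-loop lag, that is, how late a timer fires. The sketch below is not Strider code: `busyWait` is a hypothetical stand-in for whatever synchronous work might be running in the main process, and `measureLag` is an illustrative helper name.

```javascript
// Hypothetical stand-in for any synchronous call made in the main process.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else in this process can run, including HTTP handlers
  }
}

// Resolves with how many milliseconds late a 0 ms timer fired.
function measureLag(blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    busyWait(blockMs); // runs before the timer callback gets a chance
  });
}
```

If the lag reported inside the real server grows to tens of seconds while a job is in "prepare", that would suggest synchronous work on the main thread rather than slow workers. Node's built-in `perf_hooks.monitorEventLoopDelay()` gives a similar measurement without a hand-rolled timer.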
Did you enable concurrent builds? That should help with multiple projects.
@knownasilya yeah I'm at `CONCURRENT_JOBS=4`. Would that affect the Strider HTTP server though? I don't mind waiting for the jobs to complete, the problem is that some of the GitHub webhooks are being dropped due to timeouts.
That's weird, maybe the timeout isn't sufficient for your proxy? The webhooks respond back to GitHub almost instantly, once the job has been scheduled.
I verified it's not a problem with my reverse proxy by running `curl localhost:3000` immediately after a project begins the test/deploy cycle... I can actually reproduce it just by manually triggering one job through Test and Deploy in the UI and then immediately running `curl localhost:3000`. The request takes considerably longer if even one job is being prepared: requests to the Strider index usually take approximately 1-2 seconds, but if a job is being prepared the request takes approximately 30 seconds.
The `curl localhost:3000` will take 3-4 minutes if 3-4 jobs are being started (even with 8 concurrent workers), which is too long for GitHub/BitBucket webhooks.
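To put numbers on the reproduction above, a small probe can time repeated requests while jobs are being prepared. This is a minimal sketch using only Node's built-in `http` module; `timeRequest` and `probe` are illustrative names, and the 5-minute default timeout is an assumption chosen to outlast the 3-4 minute delays reported here.

```javascript
const http = require('http');

// Resolves with { status, ms } for one GET, or rejects on error or timeout.
function timeRequest(url, timeoutMs = 300000) {
  return new Promise((resolve, reject) => {
    const start = process.hrtime.bigint();
    // agent: false uses a fresh connection per request, so connection reuse
    // cannot hide a delay in accepting new connections.
    const req = http.get(url, { agent: false }, (res) => {
      res.resume(); // discard the body; only the timing matters
      res.on('end', () => {
        const ms = Number(process.hrtime.bigint() - start) / 1e6;
        resolve({ status: res.statusCode, ms });
      });
    });
    req.setTimeout(timeoutMs, () => req.destroy(new Error('timed out')));
    req.on('error', reject);
  });
}

// Probe the URL `count` times, waiting `intervalMs` between requests.
async function probe(url, count, intervalMs) {
  const samples = [];
  for (let i = 0; i < count; i++) {
    samples.push(await timeRequest(url));
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return samples;
}
```

Running something like `probe('http://localhost:3000/', 10, 1000).then(console.log)` right after triggering a job should show whether latency climbs from the usual 1-2 seconds toward the 30 seconds or more described above.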
I downgraded our server back to a much older version of Strider, and the problem is resolved. Not sure what Strider commits introduced this problem, but the old version we're running again now ( https://github.com/Strider-CD/strider/commit/84a6b878f0b1b3d3528d3f5f19251353f07b4ea7 ) works great.
I've updated the simple-runner with additional debug statements, so if you have time to investigate in the future, please do, using `DEBUG=strider*` to see if there is a runner error. You'll have to update the simple-runner in the plugins.
I've noticed this happening sometimes:
I'm not sure why it halts in "prepare". That seems to be the culprit here, and maybe an error isn't being handled correctly.
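If the halt really is an unhandled error or a callback that never fires, one generic guard is to wrap the phase so it always settles. The sketch below is an assumption about the failure mode, not how simple-runner works: `runPhase` is a hypothetical Node-style function that takes a `cb(err, result)` callback, and `withTimeout` is an illustrative name.

```javascript
// Wrap a callback-style phase so it always resolves or rejects.
// runPhase is a hypothetical stand-in, not part of Strider's API.
function withTimeout(runPhase, ms, name) {
  return new Promise((resolve, reject) => {
    let settled = false;
    const finish = (err, result) => {
      if (settled) return; // ignore late or duplicate callbacks
      settled = true;
      clearTimeout(timer);
      if (err) reject(err);
      else resolve(result);
    };
    const timer = setTimeout(
      () => finish(new Error(`${name} phase timed out after ${ms} ms`)),
      ms
    );
    try {
      runPhase(finish);
    } catch (err) {
      finish(err); // synchronous throws are reported instead of lost
    }
  });
}
```

With a guard like this, a phase that stalls would surface as a "timed out" error in the `DEBUG=strider*` output instead of leaving the job stuck in "prepare".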