apify / crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
https://crawlee.dev
Apache License 2.0
15.53k stars, 665 forks

Race conditions in CI/CD #2417

Open barjin opened 7 months ago

barjin commented 7 months ago

We currently use workflow dispatch from the release job to update the Docker build settings (here).

The issue is that the workflow dispatch doesn't wait for the dispatched workflow to finish, which creates a race condition with the next job in the original workflow (version_docs).

release(job) ------------------------> version_docs
              \-(workflow-dispatch)--> crawlee-docker-image-bump

Both version_docs and crawlee-docker-image-bump push to the master branch in this repo, which causes the race: git rejects a push to a branch that has received new commits since the last pull, so whichever of the two jobs pushes second fails.

For better reproducibility and less hassle, perhaps we could use something like https://github.com/marketplace/actions/trigger-workflow-and-wait to wait until the dispatched workflow is done, and only then run the version_docs job?
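To make the "wait until it's done" idea concrete, here is a rough sketch of the polling loop such an action would need. This is not the linked action's code. `waitForWorkflowRun` and `getRun` are hypothetical names, and `getRun` is assumed to already know which run to query and to return the `status` and `conclusion` fields from GitHub's "get a workflow run" REST response.

```typescript
// Fields read from GitHub's "get a workflow run" REST response.
interface WorkflowRun {
    status: string; // e.g. 'queued', 'in_progress', 'completed'
    conclusion: string | null; // e.g. 'success', 'failure'; null until the run completes
}

// Hypothetical helper: polls until the dispatched run completes, then fails loudly
// unless it succeeded, so the follow-up job never starts on top of a broken bump.
async function waitForWorkflowRun(
    getRun: () => Promise<WorkflowRun>,
    maxPolls = 60,
    sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
    intervalMs = 10000,
): Promise<void> {
    for (let poll = 1; poll <= maxPolls; poll++) {
        const run = await getRun();
        if (run.status === 'completed') {
            if (run.conclusion !== 'success') {
                throw new Error(`Dispatched workflow finished with conclusion: ${run.conclusion}`);
            }
            return;
        }
        // Not finished yet: wait before asking again.
        await sleep(intervalMs);
    }
    throw new Error(`Dispatched workflow did not complete within ${maxPolls} polls`);
}
```

With something like this in the release job, version_docs would only start once crawlee-docker-image-bump has finished pushing, so the two jobs would no longer push to master at the same time.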

cc @vladfrangu

vladfrangu commented 7 months ago

Ooooof, good catch. I'd say the best bet is to merge the versioned docs into the release flow, but this can definitely be an issue in the future too... For simplicity, I think we can just use the retry step we use for deploys to make it try git pull and git push several times.
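Roughly, the retry idea looks like the sketch below. This is not the actual step from our workflows. `pushWithRetry` and `run` are hypothetical names, `run` is assumed to execute a shell command and reject on a non-zero exit code, and using `--rebase` on the pull is my assumption as well.

```typescript
// Hypothetical command runner: resolves on exit code 0, rejects otherwise.
type RunCommand = (command: string) => Promise<void>;

// Hypothetical helper: pull the latest commits, then push, retrying when the
// push is rejected because the other job pushed in between.
async function pushWithRetry(
    run: RunCommand,
    branch = 'master',
    maxAttempts = 5,
    sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<number> {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            // Replay local commits on top of whatever landed on the branch meanwhile.
            await run(`git pull --rebase origin ${branch}`);
            await run(`git push origin ${branch}`);
            return attempt; // number of attempts it took
        } catch (error) {
            if (attempt === maxAttempts) throw error;
            // Wait a little longer after each failed round before retrying.
            await sleep(1000 * attempt);
        }
    }
    throw new Error('maxAttempts must be at least 1');
}
```

This doesn't remove the race, it only makes the losing job recover from it, and a rebase conflict would still fail every attempt. That should be rare here as long as the two jobs keep touching different files.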

barjin commented 7 months ago

Yeah, unfortunately, we need version_docs to run conditionally (only on major / minor releases, not patch). The retry step we use for deploys sounds, although still a bit hacky, like what we need right now. Do you have a link for that, pls?

vladfrangu commented 6 months ago

It's the same step we use in, I think, the Docker CI on the docker repo to check whether the module versions are published to npm yet. I can take a look at PR-ing this fix this week 👀

barjin commented 6 months ago

I mean, if you know where to look, it would be nice if you could 🙏🏽 No worries if you don't find the time, it's quite low-prio and I'll get around to it sooner or later :)