esbenp / pdf-bot

🤖 A Node queue API for generating PDFs using headless Chrome. Comes with a CLI, S3 storage and webhooks for notifying subscribers about generated PDFs
MIT License
2.63k stars 142 forks

pdf-bot limited to one machine rendering pdf's #18

Open danielwestendorf opened 6 years ago

danielwestendorf commented 6 years ago

I'm looking for feedback from @esbenp before I dig into a PR for this.

Goal:

I'd like to adapt pdf-bot to be a scalable PDF-rendering microservice that can have resources added or removed on demand to handle workload fluctuations.

Problem:

Because of pdf-bot's PostgreSQL database wide queue locking, only one machine can render pdf's for the given API endpoint at a time.

Because PG is a shared database, it's possible to scale the workload horizontally across many machines in parallel. To accomplish this, we would need to change the queue locking mechanism to work on a per-job basis, and adapt the generation commands (shift:all comes to mind) to support this.
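To make per-job locking concrete, here is a rough sketch of how a worker might claim a single job atomically. Everything named here is an assumption for illustration: the `jobs` table, its `processing_started_at` and `completed_at` timestamp columns, and the `claimNextJob` helper are hypothetical and not part of pdf-bot today. The `client` argument is anything with a node-postgres-style `query(text, values)` method, and `FOR UPDATE SKIP LOCKED` needs PostgreSQL 9.5 or newer.

```javascript
// Sketch only: the `jobs` table, its columns, and claimNextJob are hypothetical.
// One statement both picks a claimable row and marks it as started, so two
// workers running this concurrently should not end up with the same job.
const CLAIM_SQL = `
  UPDATE jobs
     SET processing_started_at = now()
   WHERE id = (
     SELECT id
       FROM jobs
      WHERE completed_at IS NULL
        AND (processing_started_at IS NULL
             OR processing_started_at <
                now() - ($1::double precision * interval '1 second'))
      ORDER BY id
      LIMIT 1
      FOR UPDATE SKIP LOCKED  -- rows locked by another worker are skipped
   )
  RETURNING id`;

// Resolves to the claimed job id, or null when nothing is claimable.
async function claimNextJob(client, staleAfterSeconds) {
  const result = await client.query(CLAIM_SQL, [staleAfterSeconds]);
  return result.rows.length > 0 ? result.rows[0].id : null;
}
```

Each worker would call `claimNextJob` in a loop and set `completed_at` when it finishes. The `staleAfterSeconds` cutoff is one possible way to let another worker pick up a job whose original worker crashed.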

There are a few concerns here:

1) This would require a database migration of some sort to support it.
2) Process crashes, unhandled errors, etc. could result in jobs never being processed if implemented poorly.
3) ?

Proposed implementation:

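| id | processing_started_at      | completed_at               |
|----|----------------------------|----------------------------|
| 1  | 2018-01-08 17:31:17.825153 | 2018-01-08 17:31:48.925153 |
| 2  | 2018-01-08 17:31:17.825153 | null                       |
| 3  | 2018-01-08 17:31:47.925153 | null                       |
| 4  | 2018-01-08 17:31:48.925153 | null                       |
| 5  | null                       | null                       |
| 6  | null                       | null                       |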
6 null null

Given this sample data, jobs 2, 5, and 6 would be eligible for the next generation worker to start processing, while jobs 3 and 4 are assumed to be currently processing.
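The sample data does not spell out why job 2 is eligible while jobs 3 and 4 are not. One reading is a staleness timeout: a job that was started but not completed within some window is treated as abandoned. The sketch below applies that rule to the sample rows. The `eligibleJobIds` helper, the 30-second timeout, and the chosen "now" are all assumptions for illustration, and the timestamps are truncated to milliseconds and treated as UTC.

```javascript
// Sketch only: eligibleJobIds and the 30-second timeout are hypothetical.
const jobs = [
  { id: 1, processingStartedAt: '2018-01-08T17:31:17.825Z', completedAt: '2018-01-08T17:31:48.925Z' },
  { id: 2, processingStartedAt: '2018-01-08T17:31:17.825Z', completedAt: null },
  { id: 3, processingStartedAt: '2018-01-08T17:31:47.925Z', completedAt: null },
  { id: 4, processingStartedAt: '2018-01-08T17:31:48.925Z', completedAt: null },
  { id: 5, processingStartedAt: null, completedAt: null },
  { id: 6, processingStartedAt: null, completedAt: null },
];

// A job is eligible if it is not completed and either was never started
// or was started more than staleAfterMs milliseconds before `now`.
function eligibleJobIds(jobs, now, staleAfterMs) {
  return jobs
    .filter((job) => {
      if (job.completedAt !== null) return false;        // already done
      if (job.processingStartedAt === null) return true; // never started
      return now - Date.parse(job.processingStartedAt) > staleAfterMs;
    })
    .map((job) => job.id);
}

// Assumed "now": just after job 4 started.
const now = Date.parse('2018-01-08T17:31:49.000Z');
console.log(eligibleJobIds(jobs, now, 30 * 1000)); // [ 2, 5, 6 ]
```

With a longer timeout, say 60 seconds, job 2 would still count as in progress and only jobs 5 and 6 would be eligible, so the timeout would need to be longer than the slowest expected render.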

If this all sounds like too big of an overhaul, I'd be open to other suggestions. I'd also be willing to add this support to a new Redis database adapter instead.