JustinBeckwith / linkinator

🐿 Scurry around your site and find all those broken links.
MIT License
1.02k stars 79 forks

Feature Request: concurrency per host #527

Open ManuelRauber opened 1 year ago

ManuelRauber commented 1 year ago

Hi there!

I've just switched from broken-link-checker to linkinator and I'm missing one little thing: a way to set the maximum concurrency per host.

Right now, I'm using Hugo and the Docsy theme to generate GitHub-hosted documentation, so I have a lot of links going to GitHub.

Unfortunately, a lot of them respond with 429. While linkinator retries them, after a certain number of attempts they eventually fail:

https://github.com/boundfoxstudios/community-project/edit/develop/docs/content/game-design-document/gameplay/player/index.md (from http://localhost:1313/game-design-document/gameplay/player/) -- reason: BROKEN http status: 429

It would be nice to be able to limit how many concurrent requests are made to a single host.

JustinBeckwith commented 1 year ago

This is something I've considered, but I really wonder out loud if the complexity would be worth it (as compared to the existing --concurrency property). How would you expect to use something like this from the command line? It would almost lead to needing to define these things in a string like:

$ linkinator website.com/page --host-concurrency "github.com 100" --host-concurrency "espn.com 100"

Even with that, it's unclear how the per-host concurrency would interact with the top-level concurrency 😵 It's unclear to me that the code complexity and the config complexity are really worth it.

ManuelRauber commented 1 year ago

@JustinBeckwith

Oh, I wouldn't expect to set the maximum concurrency for each host individually. Just one setting that applies to all hosts would be enough for my use case.

$ linkinator website.com/page --concurrency 50 --host-concurrency 4

In this case, I'd expect no more than 50 concurrent requests overall and no more than 4 to any single host.

Would that be easier for the implementation?

JustinBeckwith commented 1 year ago

That absolutely would make things easier :) There's still some complexity in managing host-level concurrency along with top-level concurrency, but wth, I'm at least ok giving it a shot and seeing what happens.
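
For illustration only, here is a minimal sketch of how a single per-host cap could sit alongside the global cap, matching the `--concurrency 50 --host-concurrency 4` semantics described above. It is not linkinator's implementation: the `HostAwareLimiter` class and its method names are hypothetical, and `--host-concurrency` is only the flag proposed in this thread. A request starts only when both budgets have room, and a request queued for a saturated host does not block queued requests for other hosts.

```typescript
// Hypothetical two-level limiter: one global cap plus one cap applied to every host.
class HostAwareLimiter {
  private active = 0;
  private readonly perHost = new Map<string, number>();
  private readonly waiting: Array<{ host: string; start: () => void }> = [];

  constructor(
    private readonly concurrency: number, // global cap (like --concurrency)
    private readonly hostConcurrency: number, // per-host cap (proposed flag)
  ) {}

  // True when both the global and the per-host budget have room.
  private canStart(host: string): boolean {
    return (
      this.active < this.concurrency &&
      (this.perHost.get(host) ?? 0) < this.hostConcurrency
    );
  }

  private acquire(host: string): void {
    this.active++;
    this.perHost.set(host, (this.perHost.get(host) ?? 0) + 1);
  }

  private release(host: string): void {
    this.active--;
    const left = (this.perHost.get(host) ?? 1) - 1;
    if (left === 0) this.perHost.delete(host);
    else this.perHost.set(host, left);
    this.drain();
  }

  // Start every queued task that fits now, skipping tasks whose host is saturated.
  private drain(): void {
    let i = 0;
    while (i < this.waiting.length && this.active < this.concurrency) {
      const next = this.waiting[i];
      if (next && this.canStart(next.host)) {
        this.waiting.splice(i, 1);
        this.acquire(next.host);
        next.start();
      } else {
        i++;
      }
    }
  }

  // Run `task` for `url` once both limits allow it.
  async run<T>(url: string, task: () => Promise<T>): Promise<T> {
    const host = new URL(url).host;
    if (this.canStart(host)) {
      this.acquire(host);
    } else {
      await new Promise<void>((resolve) =>
        this.waiting.push({ host, start: resolve }),
      );
    }
    try {
      return await task();
    } finally {
      this.release(host);
    }
  }
}
```

With this sketch, `new HostAwareLimiter(50, 4)` would correspond to the example command above: each link check would go through `limiter.run(url, () => checkLink(url))`, where `checkLink` stands in for whatever function actually performs the request. The global cap always wins, so the per-host setting can only make things stricter, never looser.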