c4software / python-sitemap

Mini website crawler to make sitemap from a website.
GNU General Public License v3.0
362 stars 110 forks source link

Added a rate limiter for load reduction on the website #52

Open Bash- opened 5 years ago

Bash- commented 5 years ago

We found that websites found the scraper too resource consuming. Therefore I added this configurable rate limiter, to be able to decrease the number of requests per time period.

c4software commented 5 years ago

Hi,

It's a nice addition, however ratelimit seems not present in the default library. It's an additional library (https://pypi.org/project/ratelimit/) ?

Bash- commented 5 years ago

Hi,

It's a nice addition, however ratelimit seems not present in the default library. It's an additional library (https://pypi.org/project/ratelimit/) ?

Thank you. Yes that is indeed the library which is needed

Garrett-R commented 4 years ago

Coincidentally, I'm adding a requirements.txt file in this PR. If the decision is made that that is desirable and the PR is merged, then you could add a line to that file:

ratelimit==2.2.1

Without this, I reckon this PR can't be merged, otherwise, it won't work for people who don't happen to have ratelimite pre-installed.

Bash- commented 4 years ago

@Garrett-R I think it still would be a nice addition @c4software Shall we include the ratelimiter?

c4software commented 4 years ago

Hi,

Ratelimiting is a good addition, but i'm not a big fan of a tierce library. Not because its a tierce library, but I really like the idea of a « bare metal » tool.

What do you think @Garrett-R @Bash- having to rely to a tierce library is not a problem for you ?

Garrett-R commented 4 years ago

Yeah, it's an interesting point and wasn't sure what you'd think of introducing a requirements.txt file. There's definitely some benefit to having no dependencies. Going from 0 to 1 dependencies is a big difference (while going from 1 to 2 is not). I think it's really up to you and your vision for this already successful project.

A couple factors I'd be weighing if I were you:

On the point about how tough the package is to use for folks, I actually think we should put this on PYPI (made issue here). In that case folks would just have to do:

pip install python-sitemap

and the extra dependencies will automatically be handled.

I'd probably cast my vote for having dependencies, but I can definitely see benefits to both approaches, so I support whatever you think is best.

Garrett-R commented 5 months ago

You know, in light of the xz backdoor, I have a new appreciation for avoiding dependencies...

Maybe close this one?