JimmXinu / FanFicFare

FanFicFare is a tool for making eBooks from stories on fanfiction and other web sites.

Reduce polling for XF (SpaceBattles/SufficientVelocity/etc) sites #222

Closed Xon closed 7 years ago

Xon commented 7 years ago

I've had a case of someone using FFF to hit SpaceBattles roughly 30,000 times a day for updates.

It looks like you aren't caching HTTP redirects from the thread id to the 'canonical' url, and it looks possible that your app can send multiple simultaneous requests for the same content.

It appears to be the same content being requested in a tight loop, over and over again. SpaceBattles's load balancer health checks hit a vastly cheaper monitoring point fewer times per day.

I'm aware you can't enforce this on the users of your software, or even force people to update, but some sane defaults or warnings about being overly aggressive would be appreciated.

JimmXinu commented 7 years ago

I agree; that level of traffic from a single user is unreasonable.

I don't cache HTTP redirects because the 'canonical' URL can change. I'm open to suggestions if there's a better way.

It shouldn't be sending multiple requests for the same content--unless it's coming from different users on the same system (eg, the FFF web service), or possibly different URLs that redirect to the same URL. I can investigate that in more detail if you can provide a concrete example.

FFF specifically does not do automated polling for updates. And FFF internally caches pages it's already fetched in any run. But there are users who have modified or wrapped it for their own purposes.
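As a rough illustration of per-run page caching, here is a minimal sketch. It is not FFF's actual code: `RunCache` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever function really performs the HTTP request.

```python
class RunCache:
    """Remembers every page fetched during one run (hypothetical helper)."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable taking a URL and returning the page body
        self._pages = {}      # url -> body, kept only for the life of this object

    def get(self, url):
        # Hit the network only the first time a URL is requested.
        if url not in self._pages:
            self._pages[url] = self._fetch(url)
        return self._pages[url]
```

A cache like this only helps within a single process; a wrapper script that restarts the tool starts with an empty cache each time.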

My usual approach is to put a setting in the default configuration that adds sleeps between fetches. For some sites I also put in hard-coded sleeps. (I don't have sleeps between fetch and redirect fetch because those are handled inside a library.) I will put some delays in the next version.
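A delay between fetches could look something like the sketch below. The class name, the `min_interval` parameter, and its 2-second default are assumptions made for illustration, not FFF's real setting or value.

```python
import time


class ThrottledFetcher:
    """Enforces a minimum gap between consecutive fetches (hypothetical helper)."""

    def __init__(self, fetch, min_interval=2.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch                # callable taking a URL
        self._min_interval = min_interval  # seconds; assumed default
        self._clock = clock                # injectable so it can be tested
        self._sleep = sleep
        self._last = None                  # start time of the previous fetch

    def get(self, url):
        if self._last is not None:
            # Sleep only for whatever part of the interval has not yet elapsed.
            wait = self._min_interval - (self._clock() - self._last)
            if wait > 0:
                self._sleep(wait)
        self._last = self._clock()
        return self._fetch(url)
```

Because the wait is measured from the previous fetch, time already spent parsing a page counts toward the interval, so a slow run is not slowed further.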

Xon commented 7 years ago

The feedback I got from the user after being notified was:

> there was a script that was erroneously restarting itself after failing attempts to download posts in story-threads.

Is there any retry logic that could cause this (or retry logic with a small bug in it; it happens)?
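For reference, retry logic that cannot loop forever might be sketched as follows. `fetch_with_retries`, `max_attempts`, and `base_delay` are hypothetical names and values, not anything taken from FFF or from the user's script.

```python
import time


def fetch_with_retries(fetch, url, max_attempts=3, base_delay=5.0,
                       sleep=time.sleep):
    """Call fetch(url), retrying failures with exponential backoff.

    After max_attempts failures the last exception is re-raised, so a
    persistent error ends the run instead of becoming a request loop.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception:  # a real caller would catch only network errors
            if attempt == max_attempts:
                raise
            # Waits base_delay, then 2x, then 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

The important property is the hard cap: whatever calls this function should treat the final exception as a reason to stop, not as a reason to restart.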

> I don't cache HTTP redirects because the 'canonical' URL can change. I'm open to suggestions if there's a better way.

My suggestion is to just store the URL you get redirected to (assuming it is a valid thread). The actual thread ID (i.e., the number after the '.') is what matters. The human-readable name is only there to look pretty, and it changes relatively rarely.

There are cases where a thread is a 'redirect' whose only job is to push you to a different thread ID. In that case you should move to the new thread permanently, because by default these redirect threads expire.
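The suggestion above could be sketched roughly like this. The URL shape (`/threads/<name>.<id>/` or `/threads/<id>/`) is assumed from the description, and `thread_id`, `CanonicalUrlStore`, and the `forum.example` host are hypothetical names used only for illustration.

```python
import re

# Assumed shape: .../threads/<optional-name>.<id>/ or .../threads/<id>/
_THREAD_RE = re.compile(r"/threads/(?:[^/]*\.)?(\d+)(?:[/?#]|$)")


def thread_id(url):
    """Return the numeric thread ID in a thread URL, or None if there is none."""
    match = _THREAD_RE.search(url)
    return int(match.group(1)) if match else None


class CanonicalUrlStore:
    """Maps thread IDs to the last URL the site redirected to."""

    def __init__(self):
        self._by_id = {}

    def remember(self, requested_url, final_url):
        new_id = thread_id(final_url)
        if new_id is None:
            return  # did not land on a valid thread; store nothing
        self._by_id[new_id] = final_url
        old_id = thread_id(requested_url)
        if old_id is not None and old_id != new_id:
            # Redirect thread: point the old ID at the new thread for good.
            self._by_id[old_id] = final_url

    def lookup(self, url):
        """Return the stored canonical URL, or the URL unchanged if unknown."""
        return self._by_id.get(thread_id(url), url)
```

Keying on the ID rather than the full URL means a renamed thread still resolves from the store, and the stored URL is simply overwritten the next time the site redirects somewhere new.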

JimmXinu commented 7 years ago

Changes made to use 'canonical' URLs to reduce redirects, plus a couple of others to reduce base_xenforo fetches.