XMLTV / xmltv

Utilities to obtain, generate, and post-process TV listings data in XMLTV format
GNU General Public License v2.0

Timeout for server requests #145

Closed qdacsvx closed 2 years ago

qdacsvx commented 2 years ago

A web scraper benefits from a timeout on each server request, in case the server lags badly or never responds. The default value could be 100 seconds. On a server timeout, an error message such as "Could not fetch [object]. Request timed out." would be printed, and the request would be treated as a failure, possibly a transient one.

Normal users can't address this type of error, so it should be non-fatal.

If data loss occurs due to a channel listing or a programme info page failing to download, the grabber should (eventually) exit with an error code.
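
The behaviour requested above could be sketched roughly as follows. This is a minimal Python illustration, not XMLTV code: the names `fetch`, `grab`, `exit_code` and `DEFAULT_TIMEOUT` are hypothetical, and the 100-second default is the value suggested in this comment. Only `urllib.request.urlopen` and its `timeout` parameter are real library features.

```python
import socket
import sys
import urllib.error
import urllib.request

# Assumption: the 100 s default proposed in the comment, not XMLTV's actual value.
DEFAULT_TIMEOUT = 100


def fetch(url, timeout=DEFAULT_TIMEOUT, opener=urllib.request.urlopen):
    """Return the response body, or None if the request timed out."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as err:
        # A timeout while connecting arrives wrapped in URLError;
        # any other URL error is re-raised unchanged.
        if not isinstance(err.reason, (socket.timeout, TimeoutError)):
            raise
    except (socket.timeout, TimeoutError):
        # A timeout while reading the body is raised directly.
        pass
    print(f"Could not fetch {url}. Request timed out.", file=sys.stderr)
    return None


def grab(urls, opener=urllib.request.urlopen):
    """Fetch every URL, continuing past timeouts; return (pages, failures)."""
    pages, failures = {}, []
    for url in urls:
        body = fetch(url, opener=opener)
        if body is None:
            failures.append(url)  # non-fatal: remember it and carry on
        else:
            pages[url] = body
    return pages, failures


def exit_code(failures):
    """Non-zero when any page was lost, so the caller can detect data loss."""
    return 1 if failures else 0
```

A caller would finish with something like `sys.exit(exit_code(failures))`, so that a single timeout does not abort the run but the loss is still reported at the end.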

honir commented 2 years ago

The default timeout value for page requests is 180 seconds.

The frequency of timeouts depends on two things: (i) the quality of your internet connection, (ii) the performance of the website. I would guess that 95% of timeouts are caused by a poor internet connection (for example, I get no "retry" events on uk_tvguide).

It is up to the individual grabber to determine the action to take on a request timeout. The grabber would need to be context-aware to deal with a timeout: for example, a failure to read the list of TV channels has to be fatal, whereas a failure to retrieve a programme's supplementary description could be considered non-fatal.
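
One way to picture this context-aware handling is a small policy table keyed by the kind of request. The sketch below is purely illustrative: the request kinds, `FATAL_ON_FAILURE`, `FatalFetchError` and `fetch_with_policy` are hypothetical names and do not come from any XMLTV grabber.

```python
class FatalFetchError(Exception):
    """Raised when a page the grab cannot proceed without fails to download."""


# Hypothetical request kinds, and whether losing each one should abort the run.
FATAL_ON_FAILURE = {
    "channel_list": True,        # without the channel list nothing can be grabbed
    "programme_details": False,  # supplementary description only
}


def fetch_with_policy(kind, fetch, warnings):
    """Call fetch(); on timeout either abort or record a warning, depending on kind."""
    try:
        return fetch()
    except TimeoutError as err:
        if FATAL_ON_FAILURE[kind]:
            raise FatalFetchError(f"{kind}: {err}") from err
        warnings.append(f"{kind}: {err}")
        return None
```

Here `fetch` is any zero-argument callable that raises `TimeoutError` when the request times out, which keeps the policy separate from the HTTP library in use.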

I venture that telling a "normal user" the fetch "failed - try again" is easier to understand than "some programmes could not be retrieved" (think: which ones? how many? did the grab actually work?).

The end result is probably the same either way: the user will simply try the fetch again.

qdacsvx commented 2 years ago

Good to know that there is a timeout for page requests. I assume this applies to all server requests - both getting the programme listings page for a given channel and getting the programme info page for a given programme.

Is there a test for request timeout? I've occasionally found it necessary to use an external timeout killer with a timeout of 356 seconds (per channel). This was necessary because the server sometimes delayed responding for minutes, which I could observe in debug mode. This was in an older release a few years ago.
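
An external "timeout killer" of the kind described here can be approximated with Python's `subprocess.run` and its `timeout` argument. This is a sketch only: `run_with_deadline` is a hypothetical helper, and the command to run (whatever invocation fetches one channel) is not given in this thread, so it is left to the caller.

```python
import subprocess
import sys


def run_with_deadline(command, deadline_seconds):
    """Run command; return its exit code, or None if it was killed at the deadline."""
    try:
        completed = subprocess.run(command, timeout=deadline_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process and waits for it
        # before raising TimeoutExpired.
        print(f"Killed after {deadline_seconds} s: {command}", file=sys.stderr)
        return None
    return completed.returncode
```

With the figure mentioned above, each per-channel run would be wrapped as `run_with_deadline(command, 356)`, where `command` is the argument list for that channel.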

I agree that failing to download the list of all tv channels is a fatal failure, however it's only necessary to obtain the list of channels during configuration. After configuration, when a grabber runs normally, the grabber can use the list of known good channels in the configuration without consulting the server.

honir commented 2 years ago

> After configuration, when a grabber runs normally, the grabber can use the list of known good channels in the configuration without consulting the server.

Experience has shown that's not always a good idea. What happens when a channel changes its name or its number, so that the user's config file is out of date? The grabber wouldn't know and would try to fetch programmes for the now defunct channel, eventually barfing with a fetch failure.

By checking whether the user's requested channels still exist, a simple warning can be printed and the failure avoided.
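
The check described here amounts to comparing the configured channels against the list the site currently offers. A minimal sketch, assuming channels are identified by plain string ids (the function name `usable_channels` and the warning text are hypothetical):

```python
import sys


def usable_channels(configured, available):
    """Return configured channel ids still offered by the site, warning about the rest."""
    available = set(available)
    kept = []
    for channel_id in configured:
        if channel_id in available:
            kept.append(channel_id)
        else:
            # Warn instead of failing later when the fetch for this channel breaks.
            print(f"Warning: channel {channel_id} no longer exists; skipping it.",
                  file=sys.stderr)
    return kept
```

The grabber would then fetch programmes only for the returned ids, preserving the order from the user's configuration.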

honir commented 2 years ago

Closed - too general. If you have an issue with a specific grabber, then open a separate issue for that grabber only.