fossar / selfoss

multipurpose rss reader, live stream, mashup, aggregation web application
https://selfoss.aditu.de
GNU General Public License v3.0

[Feature Request] Configurable Update Frequency/Cache per feed #1419

Open deathbybandaid opened 1 year ago

deathbybandaid commented 1 year ago

It would be really nice to have the ability to set a time frequency for how often to allow a feed to be updated.

For my use case, I use Huginn to create custom feeds for YouTube, and Huginn checks for new channel content every "X" amount of time. I use a looping bash script that runs `php /path/to/cliupdate.php` (controlled by systemd as a service). This works great.

I also use selfoss to watch GitHub releases for various projects so I know when there are updates and can read the release notes. The above looping script would then query GitHub's feeds far more often than it should.

What I propose is the ability to set -1 for feeds that may update as often as selfoss is asked to, and a configurable duration in seconds or minutes (e.g. `180s` or `2m`) that tells selfoss: do NOT query this feed unless that duration has passed. This would need to utilize a timestamp saved in the database for when we last updated a specific feed.
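A minimal sketch of the check proposed above, written in Python purely for illustration (selfoss itself is PHP). The names `UNLIMITED`, `is_due`, and `min_interval` are hypothetical and do not exist in selfoss; the sketch only assumes that a per-feed timestamp of the last update is available.

```python
import time
from typing import Optional

UNLIMITED = -1  # hypothetical sentinel from the proposal: no per-feed throttling


def is_due(last_update: Optional[float], min_interval: int,
           now: Optional[float] = None) -> bool:
    """Return True if the feed may be queried now.

    last_update  -- Unix timestamp of the last update, or None if never updated
    min_interval -- minimum number of seconds between updates, or UNLIMITED
    """
    if now is None:
        now = time.time()
    if min_interval == UNLIMITED or last_update is None:
        return True
    return now - last_update >= min_interval
```

With this shape, a feed set to `-1` is fetched on every run of the update script, while a feed set to `180` is skipped until three minutes have passed since its stored timestamp.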

There may be benefit to also being able to manage the cache time from the web interface as well.

jtojnar commented 1 year ago

> This would need to utilize a timestamp saved in the database for when we last updated a specific feed.

We store that in the `lastupdate` column of the `sources` table and use it to avoid updating a source within 20 seconds of the last update attempt:

https://github.com/fossar/selfoss/blob/296f8c403f89bd4a69f1589cddb5a782a4586706/src/helpers/ContentLoader.php#L102-L106

Interestingly, I just noticed that, since `lastupdate` is set to the current time by the `updateSource` method, the window is floating. As a result, if attempts to update a particular source come less than 20 seconds apart, it will never be updated again. :laughing:
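The floating window can be illustrated with a small simulation. This is a sketch in Python, not the selfoss code: it assumes, as the comment above describes, that the stored timestamp is moved forward on every attempt, including attempts that end up being skipped. The function name and the `refresh_on_skip` flag are hypothetical.

```python
WINDOW = 20  # seconds, the window mentioned above


def simulate(attempt_interval: int, attempts: int, refresh_on_skip: bool) -> int:
    """Count how many update attempts actually fetch the source."""
    last_update = 0  # assume the source was last touched at t = 0
    fetched = 0
    for i in range(1, attempts + 1):
        now = i * attempt_interval
        if now - last_update >= WINDOW:
            fetched += 1
            last_update = now
        elif refresh_on_skip:
            # Floating window: a skipped attempt still pushes the timestamp forward.
            last_update = now
    return fetched
```

Under these assumptions, attempting every 10 seconds with `refresh_on_skip=True` never fetches the source, whereas with `refresh_on_skip=False` every second attempt gets through.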

> What I propose is the ability to set -1 for feeds that may update as often as selfoss is asked to, and a configurable duration in seconds or minutes (e.g. `180s` or `2m`) that tells selfoss: do NOT query this feed unless that duration has passed.

I think this would be a useful feature but I struggle with creating a nice user interface for it. Especially since we would want to implement request caching (e.g. using the ETag header) and the TTL metadata supported by RSS, and any such feature would need to interact with those.

See also issue #750 and pull request #829. Maybe for now, instead of an infinite loop, create a systemd timer for ~30 minutes if you are worried about hammering the sites?
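For readers unfamiliar with ETag-based request caching, here is a rough sketch of the general HTTP mechanism using only the Python standard library. It is not how selfoss does or would implement it; `conditional_headers` and `fetch_if_changed` are hypothetical names, and storing the ETag between runs is left out.

```python
import urllib.request
from typing import Optional, Tuple
from urllib.error import HTTPError


def conditional_headers(etag: Optional[str]) -> dict:
    """Build request headers that let the server answer 304 Not Modified."""
    return {"If-None-Match": etag} if etag else {}


def fetch_if_changed(url: str, etag: Optional[str]) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (body, new_etag); body is None when the feed is unchanged."""
    request = urllib.request.Request(url, headers=conditional_headers(etag))
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read(), response.headers.get("ETag")
    except HTTPError as error:
        if error.code == 304:  # urllib surfaces 304 as an HTTPError
            return None, etag
        raise
```

An unchanged feed then costs only a small header exchange, which is why a per-feed interval setting would have to be designed with this in mind rather than on top of it.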

heull001 commented 1 year ago

> It would be really nice to have the ability to set a time frequency for how often to allow a feed to be updated.

+1

I would like this feature very much too. Also, it would be handy to be able to set the time for tags as well, if that is not too complicated. Of course, it would then have to be decided what to do when a feed's tags have conflicting times; always using the shortest period would make sense in my opinion. And a time set directly on a feed should take priority.
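The precedence rule described above could be sketched as follows. This is an illustration in Python only; `effective_interval` is a hypothetical helper, and `None` stands for "no interval configured".

```python
from typing import Iterable, Optional


def effective_interval(feed_interval: Optional[int],
                       tag_intervals: Iterable[int]) -> Optional[int]:
    """Pick the update interval (in seconds) for one feed.

    A value set on the feed itself wins; otherwise the shortest
    interval among the feed's tags is used. Returns None when
    nothing is configured.
    """
    if feed_interval is not None:
        return feed_interval
    return min(tag_intervals, default=None)
```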

jtojnar commented 1 year ago

The implementation would not be very hard but we would need to think it through so that selfoss does not become confusing. We want to avoid situations where a user changes the update frequency, forgets about it, and is then confused about why a feed is not updating. Each new feature also needs to be considered with other potential future features, like request caching or TTL metadata, in mind.

I would probably avoid tag update frequencies precisely because it would be pretty confusing, as you mention.

And not to forget, selfoss is aiming to be somewhat minimalist so each new feature should come with a sufficient motivation. I mean, reducing traffic is a nice goal – but unless you are updating the feeds continuously, it really should not be that bad (for RSS-based feeds, the requests are already cached). Maybe consider reducing the global update frequency instead? Unless you use selfoss to live-project news on a wall or something, or hoping to catch accidentally published articles, I struggle to find a reason to update the feeds more frequently than hourly.

heull001 commented 1 year ago

> I would probably avoid tag update frequencies precisely because it would be pretty confusing, as you mention.

That's a good argument. Tag-based frequency would have been a nice-to-have, but I don't really need it, so I'm okay with this decision. Feed-based frequency is clearly more important in my opinion.

> Maybe consider reducing the global update frequency instead? Unless you use selfoss to live-project news on a wall or something, or hoping to catch accidentally published articles, I struggle to find a reason to update the feeds more frequently than hourly.

Sure, but about 50 percent of my feeds don't need to be updated more than once or twice a day. I think many other users also have a number of feeds that don't need to be updated as often as others, so I see potential here for significant savings in processing power and bandwidth with little effort.