evdb opened this issue 10 years ago
Hi @evdb,
Happy to discuss these limits with you.
The 60 day limit is how long unread items are stored for. After 60 days they are automatically marked as read. There is also a limit of 500 entries per feed.
The reasons for these limits are performance and cost.
There are two separate scaling issues that both of these limits are meant to address.
The obvious solution to those scaling issues would be to buy more hardware, but then we run into the second factor: cost.
The primary database server costs $1,808/month. Adding more machines of this type is prohibitively expensive: Feedbin's only source of revenue is paying customers, and right now another $1,808/month would be a significant expense.
Maybe you could explain a bit more about your use case. What issues are you running into with the current limits?
Interesting constraints. I'll give this more thought, and look through your existing schema and see if I can come up with an approach that lets me have more unread and you have lower DB bills :)
I tend to subscribe to many feeds, and then read each entire feed in oldest-first order. However, there are many feeds that I've not got round to reading or skimming for more than 60 days. The behaviour that prompted this issue was that I'd look at feedbin.me in the morning and see something that looked interesting in the 'Unread' list (which for me is sorted oldest first), and then when I returned in the afternoon it was gone.
The way I read feeds is definitely oldest entries first, and only rarely do I leave an entry unread (I either read it or skip it; either way it should be marked as read). The only time I'll have an unread entry with read neighbors is if I've decided to leave it as unread to prompt me to do something with it later (I read both using Reeder on my phone and using a browser on the feedbin.me website).
For my use case a data model like this might be more efficient than your current one, at least in the amount of data stored, if not in the computation needed to work with it (presented as a JSON document, but easily represented using tables and rows):
```json
{
  "feeds": {
    "foo.blog.com/feed.rss": {
      "all_read_before": "2013-04-17 12:34:56",
      "all_unread_after": "2013-04-20 12:34:56",
      "unread_entries": [ "foo12", "foo13", "foo23" ]
    },
    ...
  }
}
```
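To make the tables-and-rows version concrete, here is a minimal Python sketch of the same per-subscriber state. All names here are hypothetical, just the JSON document above restated as a row type, not Feedbin's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical row type for the JSON document above; field names are
# illustrative, not Feedbin's actual schema.
@dataclass
class SubscriptionState:
    feed_url: str
    all_read_before: datetime    # every entry at or before this is read
    all_unread_after: datetime   # every entry after this is unread
    unread_entries: list[str] = field(default_factory=list)  # entry IDs between the markers
```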
The timestamps for `all_read_before` and `all_unread_after`, and the pseudo-IDs in `unread_entries`, would need changing to something more compatible with your current data model.
Here is a pretty picture (each purple horizontal line is a feed, being read in a particular way):
The advantages are that:

- Most read/unread state is captured implicitly by the `all_read_before` and `all_unread_after` markers.
- The unread count for a feed can be calculated by counting the entries newer than `all_unread_after` and adding the number of entries in `unread_entries`. If the value used in `all_unread_after` is snapped to the timestamp of an entry then this becomes cacheable up until the point a new entry is added to the feed (in which case the cache could be trivially updated, or just re-calculated).
- Marking a whole feed as read is trivial (set `all_read_before` and `all_unread_after` to the latest entry timestamp, clear `unread_entries`).

Disadvantages are that:

- Building the unread list is more work: you would walk entries newer than `all_read_before` in timestamp order and gather up entries as needed to populate the list.
- Calculating the total unread count is no longer a single `select count(*)` query on one table; it would be at best the sum of a series of cache hits plus a count(*) operation, at worst two queries per feed plus the read count.

But perhaps the biggest win is that this model would completely separate the feed entries from the subscriber data, as there is no longer any need to update the unread data when new entries are scraped.
As such, the two datastores could be separated. The subscriber data could possibly even be archived to a file on disk and deleted from the database after some period of inactivity, then loaded back into the database when the user returns to the site. I know that I usually read in small chunks a couple of times a day, so my data could certainly be ditched from the database for large periods of time. A document database like MongoDB would be well suited to this (one document per user) and has querying methods that would be suitable.
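A rough sketch of that archive/restore cycle, under the same assumptions as the earlier code (hypothetical names and paths; a real version would need locking and error handling):

```python
import json
from pathlib import Path

def archive_user(user_id: int, states: list[SubscriptionState], archive_dir: Path) -> None:
    """Serialise a dormant user's per-feed state to disk, after which
    the corresponding rows could be deleted from the hot database."""
    doc = {
        s.feed_url: {
            "all_read_before": s.all_read_before.isoformat(),
            "all_unread_after": s.all_unread_after.isoformat(),
            "unread_entries": s.unread_entries,
        }
        for s in states
    }
    (archive_dir / f"{user_id}.json").write_text(json.dumps(doc))

def restore_user(user_id: int, archive_dir: Path) -> dict:
    """Reload the archived document when the user next shows up,
    ready to be re-inserted into the database."""
    return json.loads((archive_dir / f"{user_id}.json").read_text())
```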
Sorry to be so long-winded, and to go so far off-topic from the title of this ticket. I've also probably been very simplistic; there are lots of edge cases I've not considered. It is an interesting problem though :)
Unread entries do get deleted as people mark items as read, but the vast majority of items stay unread.
Hmm, if this is the case then perhaps `unread_entries` should be changed to `read_entries` for items between the markers. The way of calculating the unread count would need to change a little, but would use the same number of queries, I believe.
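For instance, a hedged sketch of the inverted calculation, assuming a variant of the earlier `SubscriptionState` where a `read_entries` field replaces `unread_entries`:

```python
def unread_count_inverted(state, entry_times):
    """Unread count when read_entries is stored between the markers.

    Entries between all_read_before and all_unread_after are unread
    unless listed in read_entries; entries after all_unread_after are
    always unread. Still two per-feed counts, so the query count is
    unchanged; only the subtraction is new.
    """
    between = sum(1 for t in entry_times
                  if state.all_read_before < t <= state.all_unread_after)
    newer = sum(1 for t in entry_times if t > state.all_unread_after)
    return (between - len(state.read_entries)) + newer
```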
Hi @evdb,
Thanks for taking the time to flesh out this idea, it definitely looks interesting.
Before going much further with considering a new model, I was wondering: how many unread items do you typically have?
I've currently got 1050 unread, but it has been as high as 1500. I know that I should cull my reading list :)
Not sure if this is exactly the same request (I think I'm asking for the converse), but I would love to have Feedbin automagically mark as read items in certain feeds that have been sitting longer than, e.g., 2 or 3 days. I have several feeds that are voluminous but the info in them quickly becomes stale. Culling out the older posts automatically would speed things up for me considerably.
> I would love to have Feedbin automagically mark as read items in certain feeds that have been sitting longer than, e.g., 2 or 3 days. I have several feeds that are voluminous but the info in them quickly becomes stale
This is a few years old, so wanted to give it a bump. This would be super-valuable.
I believe that entries in feeds expire after 60 days. It would be great if this timescale could be configured in the settings - I'd want it much longer. The limit could also be displayed more visibly on the site (I only found reference to it in other sections).