feedbin / support


Feed entry expiration should be configurable, and publicised #419

Open evdb opened 10 years ago

evdb commented 10 years ago

I believe that entries in feeds expire after 60 days. It would be great if this timescale could be configured in the settings - I'd want it much longer. It could also be more visibly displayed on the site (I only found reference to it in other sections).

benubois commented 10 years ago

Hi @evdb,

Happy to discuss these limits with you.

The 60 day limit is how long unread items are stored for. After 60 days they are automatically marked as read. There is also a limit of 500 entries per feed.
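
As a rough illustration of the two limits described above, here is a minimal Python sketch. It is not Feedbin's actual code: the function name `apply_limits`, the entry shape, measuring age from the published date, and dropping the oldest entries once a feed exceeds 500 are all assumptions made for the example.

```python
from datetime import datetime, timedelta

UNREAD_TTL = timedelta(days=60)   # unread items older than this get marked as read
MAX_ENTRIES_PER_FEED = 500        # per-feed cap mentioned in the comment


def apply_limits(entries, now):
    """Return the entries of one feed after applying both limits.

    `entries` is a list of dicts with 'published' (datetime) and 'read' (bool).
    Assumption: when a feed is over the cap, the oldest entries are the ones dropped.
    """
    newest_first = sorted(entries, key=lambda e: e["published"], reverse=True)
    kept = newest_first[:MAX_ENTRIES_PER_FEED]
    # An entry ends up read if it already was, or if it is older than the TTL.
    return [
        {**e, "read": e["read"] or (now - e["published"]) > UNREAD_TTL}
        for e in kept
    ]
```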

The reasons for these limits are performance and cost.

There are two separate scaling issues that both these limits are meant to address.

The solution to these scaling issues would be to buy more hardware, but then we run into the second factor: cost.

The primary database server costs 1,808 USD/month. To keep adding this type of machine is prohibitively expensive. Feedbin's only source of revenue is paying customers and right now $1,808 would be a significant expense.

Maybe you could explain a bit more about your use case. What issues are you running into with the current limits?

evdb commented 10 years ago

Interesting constraints. I'll give this more thought, and look through your existing schema and see if I can come up with an approach that lets me have more unread and you have lower DB bills :)

evdb commented 10 years ago

I tend to subscribe to many feeds, and then read the entire feed in oldest first order. However there are many feeds that I've not got round to reading / skimming for more than 60 days. The behaviour that prompted this issue was that I'd look at feedbin.me in the morning and see something that looked interesting in the 'Unread' list (which for me is sorted oldest first) and then when I returned in the afternoon it was gone.

The way I read feeds is definitely oldest entries first, and only rarely do I skip an entry (I either read it, or skip it, either way it should be marked as read). The only time I'll have an unread entry with read neighbors is if I've decided to leave it as unread to prompt me to do something with it later (I read both using Reeder on my phone and using a browser on the feedbin.me website).

For my use case a data model like this might be more efficient than your current one - at least in the amount of data stored, if not computation needed to work with (presented as a JSON document, but easily represented using tables and rows):

{
  "feeds": {
    "foo.blog.com/feed.rss": {
      "all_read_before": "2013-04-17 12:34:56",
      "all_unread_after": "2013-04-20 12:34:56",
      "unread_entries": [ "foo12", "foo13", "foo23" ]
    },
    ...
  }
}

The timestamps for all_read_before and all_unread_after, and the pseudo-IDs for unread_entries, would need changing to something more compatible with your current data model.
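
To make the idea concrete, here is a minimal Python sketch of how read state could be derived from those three fields. It reflects one reading of the model: everything published before all_read_before is read, everything after all_unread_after is unread, and between the two markers only the listed IDs are unread. The names `FeedState`, `is_unread` and `unread_count`, and the handling of entries exactly on a marker, are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class FeedState:
    """Per-subscriber state for one feed, mirroring the JSON document above."""
    all_read_before: datetime
    all_unread_after: datetime
    unread_entries: set = field(default_factory=set)


def is_unread(state, entry_id, published):
    if published < state.all_read_before:
        return False                      # older than the read marker: read
    if published > state.all_unread_after:
        return True                       # newer than the unread marker: unread
    # Between the markers (inclusive), only explicitly listed entries are unread.
    return entry_id in state.unread_entries


def unread_count(state, entries):
    """`entries` is an iterable of (entry_id, published) pairs for the feed."""
    return sum(is_unread(state, entry_id, ts) for entry_id, ts in entries)
```

Note that nothing in the state needs to change when a new entry is scraped: an entry newer than all_unread_after is unread by definition.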

Here is a pretty picture (each purple horizontal line is a feed, being read in a particular way):

[image: 2014_01_22_10_50_40]

The advantages are that:

Disadvantages are that:

But perhaps the biggest win is that this model would completely separate the feed entries from the subscriber data, as there would no longer be any need to update the unread data when new entries are scraped.

As such the two datastores could be separated. Possibly the subscriber data could even be archived to a file on disk and deleted from the database after some period of inactivity. It could then be loaded back into the database when the user returns to the site. I know that I usually read in small chunks a couple of times a day, so I could certainly be ditched from the database for large periods of time. A document database like MongoDB would be well suited to this (one document per user) and has querying methods that would be suitable.
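
A minimal Python sketch of that archive-and-restore cycle, using a plain dict as a stand-in for the database and one JSON file per user. The function names, the file layout and the dict-as-database are all hypothetical, and the inactivity check that would trigger archiving is left out.

```python
import json
from pathlib import Path


def archive_user(db, user_id, archive_dir):
    """Write one user's subscriber document to disk and drop it from the store."""
    path = Path(archive_dir) / f"{user_id}.json"
    path.write_text(json.dumps(db[user_id]))
    del db[user_id]
    return path


def restore_user(db, user_id, archive_dir):
    """Load an archived document back into the store when the user returns."""
    path = Path(archive_dir) / f"{user_id}.json"
    db[user_id] = json.loads(path.read_text())
    path.unlink()  # the archive copy is no longer needed once restored
```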

Sorry to be so long winded, and to go off-topic so much for the title of this ticket. I've also probably been very simplistic and there are lots of edge cases that I've not considered. It is an interesting problem though :)

evdb commented 10 years ago

> Unread entries do get deleted as people mark items as read, but the vast majority of items stay unread.

Hmm, if this is the case perhaps the unread_entries should be changed to read_entries for items between the markers. The way of calculating the unread count would need to change a little, but would use the same number of queries, I believe.
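
A rough Python sketch of the unread count under that variant, where the list between the markers holds read entries and is therefore short when most items stay unread. The function name and argument shapes are assumptions for illustration, and entries exactly on a marker are treated as lying between the markers.

```python
from datetime import datetime


def unread_count_with_read_list(entries, all_read_before, all_unread_after, read_entries):
    """Count unread entries when the per-feed list stores *read* IDs.

    `entries` is an iterable of (entry_id, published) pairs for one feed.
    """
    count = 0
    for entry_id, published in entries:
        if published < all_read_before:
            continue  # everything before the read marker is read
        # After the unread marker everything is unread; between the markers
        # an entry is unread unless it is explicitly listed as read.
        if published > all_unread_after or entry_id not in read_entries:
            count += 1
    return count
```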

benubois commented 10 years ago

Hi @evdb,

Thanks for taking the time to flesh out this idea, it definitely looks interesting.

Before going much further with considering a new model I was wondering how many unread items you typically have?

evdb commented 10 years ago

I've currently got 1050 unread, but it has been as high as 1500. I know that I should cull my reading list :)

thenerdlawyer commented 9 years ago

Not sure if this is exactly the same request (I think I'm asking for the converse), but I would love to have Feedbin automagically mark as read items in certain feeds that have been sitting longer than, e.g., 2 or 3 days. I have several feeds that are voluminous but the info in them quickly becomes stale. Culling out the older posts automatically would speed things up for me considerably.
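
Something like the following Python sketch is what I have in mind: a per-feed maximum age, with feeds that have no setting left alone. The function name, the settings dict and the entry tuple shape are all made up for illustration; this is not an existing Feedbin option.

```python
from datetime import datetime, timedelta


def stale_entry_ids(entries, max_age_by_feed, now):
    """Return IDs of unread entries that have outlived their feed's max age.

    `entries` is an iterable of (entry_id, feed_url, published) tuples.
    `max_age_by_feed` maps feed_url -> timedelta; feeds not in it never expire here.
    """
    stale = []
    for entry_id, feed_url, published in entries:
        max_age = max_age_by_feed.get(feed_url)
        if max_age is not None and now - published > max_age:
            stale.append(entry_id)
    return stale
```

The returned IDs would then be marked as read by whatever routine already handles the 60-day limit.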

coffeemug commented 4 years ago

> I would love to have Feedbin automagically mark as read items in certain feeds that have been sitting longer than, e.g., 2 or 3 days. I have several feeds that are voluminous but the info in them quickly becomes stale

This is a few years old, so wanted to give it a bump. This would be super-valuable.