feedbin / feedbin

A nice place to read on the web.
https://feedbin.com
MIT License
3.41k stars, 275 forks

Feedbin does not follow HTTP 301 redirects #100

Open bdesham opened 10 years ago

bdesham commented 10 years ago

I recently shuffled around some domain names and servers; one result was that my old Atom feed address started issuing an HTTP 301 redirect to a new URL. Other feed readers seem to have followed the redirect and no longer request the old URL at all, but Feedbin is still requesting that URL (which gives it a 301 every time). Feedbin should probably update users’ subscriptions so that feed URLs that issue redirects are replaced with the new target URLs.
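To make the request concrete, here is a minimal Python sketch of the difference between following a redirect and updating the stored URL. It is not Feedbin's code: `resolve_feed_url`, the `fetch` callable, and the example URLs are hypothetical. `fetch` is assumed to perform a single request without following redirects and to return the status code plus the `Location` header, if any.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical fetcher type: performs ONE request for a URL, does not follow
# redirects, and returns (status_code, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]

PERMANENT = {301, 308}       # "update your links" redirects
TEMPORARY = {302, 303, 307}  # "keep using the old URL" redirects


def resolve_feed_url(url: str, fetch: Fetch, max_hops: int = 5) -> Tuple[str, str]:
    """Follow redirects starting at `url`.

    Returns (final_url, stored_url):
      final_url  - where the feed content was actually served from
      stored_url - the URL a reader could remember for next time; it only
                   advances while every hop so far was a permanent redirect
    """
    current = url
    stored = url
    all_permanent = True
    for _ in range(max_hops):
        status, location = fetch(current)
        if location and status in PERMANENT:
            current = urljoin(current, location)  # Location may be relative
            if all_permanent:
                stored = current
        elif location and status in TEMPORARY:
            current = urljoin(current, location)
            all_permanent = False
        else:
            # Not a redirect: this is the final response.
            return current, stored
    raise RuntimeError(f"too many redirects starting from {url}")
```

With a 301 from the old address to the new one, both returned values are the new URL. With a 302, the content still comes from the new URL but the stored URL stays where it was.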

danielcompton commented 7 years ago

This just caught me out too.

rik commented 6 years ago

https://github.com/feedbin/support/issues/579 is a duplicate of this.

I'm missing content I've subscribed to because of this. I noticed today that a friend's blog post was not showing up. But I have no idea how many feeds I've lost because redirections are no longer in place on previous hosts.

tedder commented 5 years ago

I have a feed that is being grabbed ~450 times by feedbin per day even though it's a 301 redirect. I'm going to turn it into a 404, maybe feedbin will stop in that case, though the feed will die to feedbin readers.

benubois commented 5 years ago

Feedbin does follow redirects but does not update stored feed urls.

Please familiarize yourself with the medium before you declare this as a bug.

https://www.w3.org/Provider/Style/URI

On Dec 25, 2018, at 5:43 PM, Ted Timmons notifications@github.com wrote:

> I have a feed that is being grabbed ~450 times by feedbin per day even though it's a 301 redirect. I'm going to turn it into a 404, maybe feedbin will stop in that case, though the feed will die to feedbin readers.


tedder commented 5 years ago

Thanks for the condescension. If everyone actually followed that recommendation Brewster Kahle would be bored.

It's up to you if you want to consider it a bug, but I suspect it leads to more than my feeds dying.

benubois commented 5 years ago

Sorry! Not trying to be condescending. Since you’re talking about knowingly breaking your urls it seemed appropriate.

On Dec 25, 2018, at 8:04 PM, Ted Timmons notifications@github.com wrote:

> Thanks for the condescension. If everyone actually followed that recommendation Brewster Kahle would be bored.
>
> It's up to you if you want to consider it a bug, but I suspect it leads to more than my feeds dying.


rik commented 5 years ago

> Feedbin does follow redirects but does not update stored feed urls.

@benubois: Is there any reason not to update the feed urls? As users, we don't have control over all the feeds we subscribe to and it's painful to lose content :(

brendanlong commented 5 years ago

There's a tradeoff here between accepting a permanent redirect too quickly and losing content vs. accepting it too slowly and losing content. Through misconfiguration, malicious behavior, or semi-malicious behavior (e.g. DNS expiring and someone else temporarily turning your site into a redirect to ads), there are several cases where a "permanent redirect" should be treated as temporary.

That said, I'm guessing at some point between a week and a month after first seeing the redirect, it's probably relatively safe to assume that it's actually permanent and update the links.
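A rough Python sketch of that "wait before trusting it" idea follows. Everything here is illustrative: the 14-day `GRACE_PERIOD` is just one value inside the week-to-month range guessed above, and `RedirectObservation`, `observe`, and `should_promote` are hypothetical names, not anything Feedbin implements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumed value; the suggestion above is "between a week and a month".
GRACE_PERIOD = timedelta(days=14)


@dataclass(frozen=True)
class RedirectObservation:
    target: str           # where the 301 currently points
    first_seen: datetime  # when this exact target was first observed
    last_seen: datetime   # most recent fetch that returned this target


def observe(prev: Optional[RedirectObservation], target: str,
            now: datetime) -> RedirectObservation:
    """Record one 301 response. The clock restarts if the target changed.

    If a fetch stops redirecting (e.g. returns 200 again), the caller is
    expected to discard the observation by storing None instead.
    """
    if prev is None or prev.target != target:
        return RedirectObservation(target, first_seen=now, last_seen=now)
    return RedirectObservation(target, prev.first_seen, last_seen=now)


def should_promote(obs: Optional[RedirectObservation], now: datetime,
                   grace: timedelta = GRACE_PERIOD) -> bool:
    """True once the same target has been seen for at least `grace`."""
    return obs is not None and now - obs.first_seen >= grace
```

A redirect that briefly points at an ad page and then flips back never accumulates enough time to be promoted, which covers the expired-DNS case, though a long-lived hijack would still get through.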

This is probably a non-trivial feature though, since you need to handle the old content of the feed (at a previous URL) still somehow being linked to the new URL, but only for users who were redirected. For example, if I sign up for example.com/rss-1, and you sign up for example.com/rss-2, and then rss-1 gets permanently redirected to rss-2, my feeds should contain the content of rss-1 and rss-2, but yours should only contain the contents of rss-2. Note that if you just act like I had always been subscribed to rss-2, then I may lose access to unread articles that were posted to rss-1 but not rss-2, and if you just add everything from rss-1 to rss-2, then you add an easily-exploitable way for people to inject articles into feeds for other sites.

I think Feedly handles this by surfacing feed change info in the UI but not automatically making the change. I know they have a list of broken feeds in the settings somewhere, and they may show feeds that are redirecting. This makes it so feeds don't just inexplicably stop updating, but adds a little more work for users. The benefit of this is that the user is the one making all of the changes so they can't complain when rss-1 is deleted, since they would be the ones to delete it.

There's probably something fancier you could do, like make each subscription have a primary feed and then a list of previous feeds to include content from. I wouldn't be surprised if @benubois doesn't want to deal with that complexity though.
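As a sketch of that "primary feed plus previous feeds" shape, assuming entries can be identified by an ID that is shared across feeds (which may not hold for Feedbin, where article IDs are reportedly derived from the feed URL). `Subscription`, `follow_redirect`, `entries`, and the in-memory `store` are all hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Subscription:
    primary_feed: str
    previous_feeds: List[str] = field(default_factory=list)

    def follow_redirect(self, new_feed: str) -> None:
        """Move this one subscription to `new_feed`, remembering the old feed."""
        if new_feed == self.primary_feed:
            return
        self.previous_feeds.append(self.primary_feed)
        self.primary_feed = new_feed

    def entries(self, store: Dict[str, List[str]]) -> List[str]:
        """Entry IDs visible to this subscriber: old feeds first, then the
        primary feed, with duplicates dropped."""
        seen = set()
        result = []
        for feed in [*self.previous_feeds, self.primary_feed]:
            for entry_id in store.get(feed, []):
                if entry_id not in seen:
                    seen.add(entry_id)
                    result.append(entry_id)
        return result


# Shared store: feed URL -> entry IDs. Feeds stay separate, so nothing from
# rss-1 is ever written into rss-2.
store = {
    "example.com/rss-1": ["a", "b"],
    "example.com/rss-2": ["b", "c"],
}
mine = Subscription("example.com/rss-1")
yours = Subscription("example.com/rss-2")
mine.follow_redirect("example.com/rss-2")
```

Here `mine` sees entries from both feeds while `yours` sees only rss-2, because the merge happens per subscription rather than in the shared feed data.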

benubois commented 5 years ago

Yes! I would like feeds to continue working no matter what the publishers do to them.

However, it is complicated. Both for the reasons @brendanlong mentions and some early naive decisions I made about Feedbin’s schema (feed urls have a unique constraint in the database and are used to generate unique article ids). Feeds are also a shared resource between users, so it’s not easy to let users edit feed attributes without duplicating a lot of data.

I do think that resubscribing to a feed at its new location is a viable work-around and avoids potential issues that would come from automatically updating feeds.

There is an upcoming feature that will flag feeds that are failing to update due to HTTP errors (4xx, 5xx, etc.) and parse errors.

rik commented 5 years ago

I'd be happy to manually decide "yes please, subscribe me to this new feed" when a feed 301s.

bdesham commented 2 years ago

@benubois You mentioned a feature that will flag feeds that are returning 4xx or 5xx errors. Has that been implemented?

olivierlacan commented 1 year ago

The instructions for debugging feeds don't take redirects into account: they omit cURL's -L flag (follow redirects): https://feedbin.com/help/debugging-feeds/

The debug command shown should likely be:

curl -Lsv "$FEED_URL" 2> headers.txt | xmllint --format - > feed.xml

@benubois can you confirm that Feedbin still never updates a stored feed URL when its 301 redirect or CNAME record changes? My own blog's feed uses a CNAME that I had to update because my provider (Feedpress) changed their custom domain endpoint. Feedbin never notified me (or, I presume, anyone subscribed through Feedbin) that anything was wrong with the feed.

Since I have at least 12 subscribers using Feedbin, I can assume they'll never receive new posts from me since Feedbin simply stopped updating my feed silently. That's not great.

Will the upcoming feature you hinted at 5 years ago be implemented? Or was it shelved?

> I do think that resubscribing to a feed at its new location is a viable work-around and avoids potential issues that would come from automatically updating feeds.

This assumes subscribers have some way to know that a feed using a CNAME has a new destination, which is not the case. Currently, any feed provider changing their endpoints (or telling customers to manually switch CNAME destinations) would lead to subscribers no longer seeing updates.

It's unreasonable to expect even technically savvy folks to:

benubois commented 1 year ago

Hi @olivierlacan,

301 redirects have always been followed and are now cached.

I'm not clear on the CNAME part of your question. If a CNAME changes then it will be resolved to its new location whenever the DNS cache expires according to the record's TTL.

If you have a question about a broken feed, I'd be happy to look into it if you could provide the url.

Thanks for the updated debug curl!

Ben