Ranchero-Software / NetNewsWire

RSS reader for macOS and iOS.
https://netnewswire.com/
MIT License

Sync to Feedly not working properly Mac <> Mac #3779

Open dnanian opened 1 year ago

dnanian commented 1 year ago

I've got two computers logged into NetNewsWire, both Macs, both running Ventura. They're using the same Feedly account.

My MacBookPro can mark stuff read, and it reflects properly on the desktop Mac.

However, changes on the desktop Mac never appear on the MacBookPro.

I've tried logging out and back into Feedly, but it doesn't seem to make a difference...it's very weird.

Any way to debug this somehow?

vincode-io commented 1 year ago

Have you tried removing the Feedly account on the desktop Mac? It might be getting an error when trying to send the read state to Feedly. If you have, could you check to see if the read state from your desktop Mac is making it to Feedly by verifying using the Feedly web interface?

brentsimmons commented 1 year ago

Are you still seeing this bug?

dnanian commented 1 year ago

I'm re-verifying it today (although I assume so, given the version hasn't changed). I did try resetting both devices and still saw issues...and then cross-checked with an old version of Reeder which didn't have the problem...

dnanian commented 1 year ago

Well - I noted one thing: the laptop was using the most recent beta, and the desktop didn't have the beta checkbox checked, so it was using production.

However, as I was trying to test things further, I started getting an almost immediate error - "Forbidden". It looks like Feedly has put Cloudflare's site check code in front of their API and you can't access it...or the entire site may be down.

dnanian commented 1 year ago

So, @brentsimmons - the site wasn't down. I checked with their support, and got this as a reply, which I think you might be interested in (and perhaps should be split into a second report):

There is a bug in NetNewsWire, where it sometimes tries to mark-as-read a very large list of articles. In this case, it tried to mark 19,332 articles as read (the limit is 2,000). It's unclear to me why it does this. This causes Feedly servers to return an error, but NNW retries the same request in a loop anyway, multiple times a second... After too many errors, our abuse-prevention systems kick in and ban the IP address for a few hours. This will cause all access to Feedly apps to be blocked. This explains why the problem you ran into was temporary, and also why there was no reported outage... Unfortunately I don't have a good solution or workaround to this, except to not use NNW...

brentsimmons commented 1 year ago

Sheesh! That’s super-good to know. That’s a thing we can fix. :)

vincode-io commented 1 year ago

The Feedly sync in NNW only sends 300 entries at a time for marking article statuses. I verified this in the code and by using Proxyman to inspect the service calls. The Feedly sync also doesn't automatically retry immediately. The statuses are requeued and will be tried again in 2 minutes, which is the standard for all NNW syncing accounts. Given this, the Feedly support message is very confusing.

Most likely the support person at Feedly misread or miswrote something. NNW would have tried to send the status for all of these articles at the same time, but in 64 different service calls, all sent at once. NNW probably sent too many requests at the same time and triggered Feedly's request-flooding protections.

It might be a good idea to contact Feedly support to find out what constitutes "API flooding". Also, is that 2,000-article marking limit for a given period of time or per API call? With more of this (currently undocumented) information, FeedlyAPICaller could easily be updated to send more articles per request and/or to only send so many articles every 2 minutes.
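
For reference, here is a rough sketch of the batching behavior described above. This is not the actual FeedlyAPICaller code; the helper names and the auth-header format are illustrative assumptions, and only the 300-entry batch size and the 2-minute requeue come from this thread.

```swift
import Foundation

// Simplified sketch (not the actual FeedlyAPICaller code) of how read statuses
// get chunked before being sent to the Feedly markers endpoint.
struct MarkEntriesAsRead: Encodable {
    let action = "markAsRead"
    let type = "entries"
    let entryIds: [String]
}

func sendReadStatuses(entryIDs: [String], accessToken: String, batchSize: Int = 300) {
    let url = URL(string: "https://cloud.feedly.com/v3/markers")!
    // Split the queued IDs into batches of `batchSize` (300 in NNW today).
    let batches = stride(from: 0, to: entryIDs.count, by: batchSize).map {
        Array(entryIDs[$0 ..< min($0 + batchSize, entryIDs.count)])
    }
    for batch in batches {
        var request = URLRequest(url: url)
        request.httpMethod = "POST"
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        request.setValue("OAuth \(accessToken)", forHTTPHeaderField: "Authorization")  // header shape is illustrative
        request.httpBody = try? JSONEncoder().encode(MarkEntriesAsRead(entryIds: batch))
        URLSession.shared.dataTask(with: request) { _, _, _ in
            // On failure, the statuses are requeued and retried on the next
            // sync pass (~2 minutes later) rather than retried immediately.
        }.resume()
    }
}
```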

@dnanian To get you working again, I recommend that you remove your Feedly accounts from both of your Macs. Then mark as many articles as read as you need to through Feedly's web interface. You shouldn't need to do this part again unless you have to mark a huge number of articles all at once. Finally, add the Feedly accounts back to NNW. Removing and re-adding the Feedly accounts will stop NNW from trying to send the status for the 19k+ articles it has queued up.

dnanian commented 1 year ago

I'm already up and running again - I'm guessing the "mark as read" happened, maybe, because during the period between the report and the re-test I had been using Reeder - so NNW was behind. What's weird, though, is that those messages would already have been marked as read - there's no way I marked thousands of messages read at once...

I reached out to the platform engineer who wrote that reply with your response and will let you know what they say.

dnanian commented 1 year ago

I've received a reply from them. I have an email address, but I don't want to add that or their name to the case - I'll email Brent and you can decide what to do with the info (they said they'd be happy to discuss it):

I was mistaken in some of my conclusions: NNW doesn't retry exactly the same request. The number of entries increases by 1-2 between requests. Also, there are "clusters" of requests separated by a few seconds. I suspect NNW has a backlog of articles marked as read from previous sessions. These articles could not be marked as read on Feedly, most likely because NNW already passes too large a list. But every time an article is read, NNW adds it to the list, then tries to mark the whole batch as read? That list will always be too large to be processed, so these requests generate API errors. If there are too many errors in a row, it can lead to a temporary IP ban.

Looking at recent logs, NNW generated 28,000 such API errors. For some customers, NNW sends close to 50,000 articles at a time... So I would say you're not the only person affected by this issue.

In any case, I attached a list of requests from that day: the server returns HTTP/400 errors for a while, then switches to HTTP/429. Here are the relevant app logs for a few of the requests:

requestId: 7a5cd084aea541a1-EWR
2023/03/10 @ 08:08:07.366 markersV3ModuleAction.POST: action error [too many entries] (400 BAD_REQUEST)
2023/03/10 @ 08:08:07.336 large number of markAsRead entry ids: 19220

requestId: 7a5cd0812be641a1-EWR
2023/03/10 @ 08:08:06.826 markersV3ModuleAction.POST: action error [too many entries] (400 BAD_REQUEST)
2023/03/10 @ 08:08:06.789 large number of markAsRead entry ids: 19219

requestId: 7a5cd0772b5f41a1-EWR
2023/03/10 @ 08:08:05.255 markersV3ModuleAction.POST: action error [too many entries] (400 BAD_REQUEST)
2023/03/10 @ 08:08:05.191 large number of markAsRead entry ids: 19218

requestId: 7a5cd05e3b2b41a1-EWR
2023/03/10 @ 08:08:01.220 markersV3ModuleAction.POST: action error [too many entries] (400 BAD_REQUEST)
2023/03/10 @ 08:08:01.177 large number of markAsRead entry ids: 19217

requestId: 7a5cd058fc6b41a1-EWR
2023/03/10 @ 08:08:00.391 markersV3ModuleAction.POST: action error [too many entries] (400 BAD_REQUEST)
2023/03/10 @ 08:08:00.359 large number of markAsRead entry ids: 19216

requestId: 7a5cd051db6341a1-EWR
2023/03/10 @ 08:07:59.239 markersV3ModuleAction.POST: action error [too many entries] (400 BAD_REQUEST)
2023/03/10 @ 08:07:59.210 large number of markAsRead entry ids: 19215

requestId: 7a5ccefdcdd041a1-EWR
2023/03/10 @ 08:07:04.854 markersV3ModuleAction.POST: action error [too many entries] (400 BAD_REQUEST)
2023/03/10 @ 08:07:04.829 large number of markAsRead entry ids: 19214

requestId: 7a5ccefdcdd041a1-EWR
2023/03/10 @ 08:07:02.145 markersV3ModuleAction.POST: action error [too many entries] (400 BAD_REQUEST)
2023/03/10 @ 08:07:02.107 large number of markAsRead entry ids: 19213

requestId: 7a5cce8c6ca541a1-EWR
2023/03/10 @ 08:06:46.699 markersV3ModuleAction.POST: action error [too many entries] (400 BAD_REQUEST)
2023/03/10 @ 08:06:46.672 large number of markAsRead entry ids: 19212

As you can see, the number of entries increases by 1 every time.

As for the limits: you can mark roughly 2,000 articles as read per day. After that, the API may ignore the requests. The mark-as-read API itself has a limit of 1,000 entry ids per call. These limits are indeed undocumented, sorry about that. It is not useful to mark articles older than 31 days as read: they are considered as read by the API anyway.

I don't know how the NNW app is structured, so my recommendations may not be relevant. I would start by removing entries older than 31 days. After that if the backlog still has more than 1,000 entries, maybe process it by small increments? E.g. every time a new entry is read, pass 5 entries from the backlog, in order to stay under the "1,000 entries marked as read per day" limit? Eventually, the backlog should disappear.

I will try to see if there's a way to add some custom logic on the server to handle these huge requests.
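
For illustration, the drip-feed approach suggested in that reply might look roughly like the sketch below in app code. The type and function names are hypothetical; only the 31-day cutoff and the 5-entry increment come from the reply itself.

```swift
import Foundation

// Hypothetical sketch of the suggested backlog handling: drop stale entries,
// then drain the remaining backlog a few entries at a time.
struct PendingReadStatus {
    let entryID: String
    let crawled: Date   // Feedly's "crawled" date (see the clarification at the end of the thread)
}

func nextMarkAsReadBatch(backlog: [PendingReadStatus], newlyReadID: String) -> [String] {
    let cutoff = Calendar.current.date(byAdding: .day, value: -31, to: Date())!
    // Anything older than 31 days is already treated as read by the API, so drop it.
    let stillRelevant = backlog.filter { $0.crawled >= cutoff }
    // Send the newly read entry plus a small slice of the backlog (5 here), so the
    // daily mark-as-read volume stays low and the backlog drains over time.
    return [newlyReadID] + stillRelevant.prefix(5).map { $0.entryID }
}
```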

vincode-io commented 1 year ago

@dnanian You might want to point your contact to this issue so that they can comment on it directly.

vincode-io commented 1 year ago

markersV3ModuleAction.POST must accumulate requests and bundle them. It is easy to see that NNW requests to the markers endpoint only contain 300 entryIds per request. If Feedly support feels this is otherwise, it would be beneficial to get an actual markers POST HTTP request with body and header so that we could make sure it came from NNW.

I can't make any sense out of the 2,000 articles per day limit on marking articles as read. I could see limiting the total number of API calls per time period, but not the total number of articles marked. Is it OK to send 2,000 API requests that mark articles as read one at a time, but not to send 3 API requests that mark 2,001 articles as read? The 2,000 API calls are clearly more resource intensive for the server.

How is an RSS reader supposed to handle this situation? If a user marks 2500 articles as read in one shot, does the reader only sync 2000 articles and then just be out of sync with other readers and the Feedly website itself?

This suggestion would keep an RSS reader out of sync for over 9 years, assuming the user never exceeds the limits again:

if the backlog still has more than 1,000 entries, maybe process it by small increments? E.g. every time a new entry is read, pass 5 entries from the backlog, in order to stay under the "1,000 entries marked as read per day" limit?

(I'm assuming that the author intended to write 2,000 instead of 1,000 here.)

The only way an RSS reader could stay in sync is to disable marking articles as read once the user has exceeded 2,000 read articles that day. There is no good user experience I can imagine where that would be acceptable.

I would like to know if the Feedly app itself is constrained in this way. It looks like it doesn't have a way to bulk-mark articles except by marking older than 1 day or 1 week. It then uses the markers API with a categoryId of category/global.all and an asOf timestamp parameter. What happens when this exceeds 2,000 articles? I can't figure it out by snooping the Feedly app's traffic when using the free tier, as it is limited to 100 feeds.

If using the markers endpoint with categories is exempt from any article limits, NNW could remove all other bulk-marking functions except for the "Mark Older Than" ones on Feedly accounts. "Mark Older Than" would have to be reimplemented to make this work, though...
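
If that route were viable, a category-level request would presumably look something like the sketch below. The field names are assumptions based on the traffic described above, not verified against Feedly's documentation.

```swift
import Foundation

// Illustrative only: roughly what a category-based "Mark Older Than" call might
// look like (category/global.all plus an asOf timestamp). Field names are assumed.
struct MarkCategoriesAsRead: Encodable {
    let action = "markAsRead"
    let type = "categories"
    let categoryIds: [String]
    let asOf: Int   // milliseconds since the epoch; entries crawled before this are marked read
}

let oneWeekAgo = Date().addingTimeInterval(-7 * 24 * 60 * 60)
let body = MarkCategoriesAsRead(
    categoryIds: ["user/<userId>/category/global.all"],
    asOf: Int(oneWeekAgo.timeIntervalSince1970 * 1000)
)
```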

dcfeedly commented 1 year ago

Hi, Feedly dev here.

If Feedly support feels this is otherwise, it would be beneficial to get an actual markers POST HTTP request with body and header so that we could make sure it came from NNW.

Unfortunately we do not log the full request. But I can assure you that these requests are coming from NNW: the authorization header doesn't lie...

I will try to push a debug build to log some of these requests.

I can't make any sense out of the 2,000 articles per day limit on marking articles as read.

Sorry, my explanations are confusing. This daily limit is "fuzzy" because it's driven by cached object size. It's not so much that the API will prevent you. But if there are too many individual entries marked as read for a given day, some objects don't fit in the cache anymore, causing some APIs to become very slow (because they have to reload from the DB every time), apps become unresponsive, customers run into timeout issues etc. Bad experience.

The only hard limit is: 1,000 entry ids per "mark entries as read" request. If you exceed this limit the API call will fail with an HTTP/400 error.
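
A client-side guard for that hard limit could be as simple as the following sketch (constant and function names are hypothetical):

```swift
// Never let a single markers request carry more than 1,000 entry ids,
// otherwise Feedly responds with HTTP 400.
let feedlyEntryIDsPerRequestLimit = 1_000

func chunkedForFeedly(_ entryIDs: [String]) -> [[String]] {
    stride(from: 0, to: entryIDs.count, by: feedlyEntryIDsPerRequestLimit).map {
        Array(entryIDs[$0 ..< min($0 + feedlyEntryIDsPerRequestLimit, entryIDs.count)])
    }
}
```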

How is an RSS reader supposed to handle this situation? If a user marks 2500 articles as read in one shot, does the reader only sync 2000 articles and then just be out of sync with other readers and the Feedly website itself?

The answer is, as you suggested, to mark the feed as read or mark the category as read. Feed markers are much more efficient: it's a single marker per feed, which can be used to quickly determine if an entry is read or unread. It eliminates the need to mark each individual entry as read, which is a giant pain for the apps and for the server. The API even supports undo.
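
A feed-level marker, as suggested here, would presumably look like this on the wire. Again, the field names are assumptions modeled on the entry-level shape earlier in the thread, not taken from Feedly's docs.

```swift
import Foundation

// Sketch of a feed-level marker request: one call marks everything in the
// listed feeds crawled at or before the asOf timestamp as read.
struct MarkFeedsAsRead: Encodable {
    let action = "markAsRead"
    let type = "feeds"
    let feedIds: [String]
    let asOf: Int   // milliseconds since the epoch
}

let body = MarkFeedsAsRead(
    feedIds: ["feed/https://example.com/feed.xml"],
    asOf: Int(Date().timeIntervalSince1970 * 1000)
)
```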

When I originally talked with the NNW devs, they mentioned that using feed markers would be hard to do? So I didn't push too hard.

Anyway, I'm also working on a server patch to try to unblock other customers and be "smarter" about accepting these giant lists of read articles. But even if I can get my changes approved and deployed, it would be beneficial for everyone to use feed markers...

brentsimmons commented 1 year ago

Thanks so much, @dcfeedly. We may have a bunch of questions to follow up here — and I’ll start with one. When you wrote…

I would start by removing entries older than 31 days.

…what is the date to look at? The Feedly crawled date or published date? (Or something else?)

dcfeedly commented 1 year ago

The Feedly crawled date or published date? (Or something else?)

Use crawled; that's what Feedly uses to sort articles for feeds. published is the date reported by the feed publishers, and it's not always reliable.