Ranchero-Software / NetNewsWire

RSS reader for macOS and iOS.
https://netnewswire.com/
MIT License
8.41k stars, 531 forks

iCloud sync too slow (initially, or after adding lots of feeds) #3039

Closed brentsimmons closed 3 years ago

brentsimmons commented 3 years ago

People talk about waiting hours for it to sync. We need to greatly improve the experience.

We have to consider not syncing article content as a possible solution. This could mean mismatched unread counts, yes, but I’ll pay that price for much faster syncing.

(Hopefully we can still sync article content! Not doing so is the fallback position.)

vincode-io commented 3 years ago

I'll think some more on this, but I don't think this will fix our problem.

The problem is that if the user is being throttled, it doesn't matter what kind of record we are requesting. Content and status records will both be throttled. So requesting only the status records will still have iCloud telling us to wait X seconds per fetch, which can sometimes be minutes. Right now I can't think of a way to defer getting the status records and not go out of sync.

I'm not even sure that removing the content syncing will help. What causes you to get throttled is completely undocumented. Is it based on number of records requested or number of MB's? If it is based on records, removing the content probably won't help since most records are status records. If it is based on MB's then it could help people with large unread counts.
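For reference, a minimal sketch of how a client might pick the wait time when a fetch is throttled. It assumes the server-suggested delay is passed in by the caller (in CloudKit it is exposed as `CKError.retryAfterSeconds`). The function name and the capped exponential fallback are hypothetical, not NetNewsWire's actual code.

```swift
import Foundation

/// Seconds to wait before retrying a throttled request.
/// `serverSuggested` stands in for the delay reported by the server
/// (in CloudKit, `CKError.retryAfterSeconds`). The fallback policy used
/// when no delay is reported is an assumption made for this sketch.
func retryDelay(serverSuggested: TimeInterval?,
                attempt: Int,
                maxFallback: TimeInterval = 300) -> TimeInterval {
    // Honor the server's delay whenever it provides one.
    if let delay = serverSuggested {
        return delay
    }
    // Otherwise back off exponentially (1s, 2s, 4s, ...) up to a cap.
    return min(pow(2.0, Double(attempt)), maxFallback)
}
```

Whatever kind of record is being requested, the caller would have to wait this long before the next fetch, which is why dropping one record type does not obviously shorten the total wait.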

vincode-io commented 3 years ago

This article addresses the problem, but I can't figure out how to make the author's hack work with NetNewsWire. The application they talk about only needs the most current data, since it is a chat app.

https://www.justtact.com/blog/three-methods-of-retrieving-records-from-cloudkit

jaanus commented 3 years ago

I have a few thoughts on the matter. As a fellow CloudKit developer, I follow this syncing topic around NNW with great interest, because as we have all found, CloudKit is opaque, and every bit of knowledge helps.

My post linked above only talks about the read side. It does not talk about the write side. It’s not clear to me in NNW - on which side is the throttling happening? Read or write? I have to admit, I have tortured CloudKit quite a bit, and I have never seen throttling on the read side. I have probably not tortured it enough, or have tortured it in the wrong way. Maybe the nature of my records is different enough that this doesn’t happen as much.

“What causes you to get throttled is completely undocumented. Is it based on number of records requested or number of MB's?”

I would very much like to know this as well. Another variable could be the number of CloudKit requests - is it a few large batch requests, or many tiny requests?

I remember I once did a test with uploading 10K records (in batches of 200 or even more) and downloading them shortly after. The records were tiny, and there was no throttling.
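A rough sketch of the batching step in that kind of test, assuming the records are represented by plain identifiers. The `chunked(into:)` helper is hypothetical; each resulting batch would be handed to one CloudKit modify operation.

```swift
import Foundation

extension Array {
    /// Split the array into consecutive batches of at most `size` elements.
    func chunked(into size: Int) -> [[Element]] {
        precondition(size > 0, "batch size must be positive")
        return stride(from: 0, to: count, by: size).map { start in
            // `Swift.min` avoids clashing with the Sequence `min()` method.
            Array(self[start..<Swift.min(start + size, count)])
        }
    }
}

// Hypothetical record identifiers standing in for 10K tiny records.
let recordIDs = (0..<10_000).map { "record-\($0)" }
let batches = recordIDs.chunked(into: 200)
```

With 10K records and batches of 200, that is 50 requests rather than 10,000, which matters if throttling is driven by request count.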

The only documentation about CloudKit resource limits that I am aware of is on their web services page.

Here’s another fun fact. CloudKit does not enforce the asset size limit with the native APIs. The above page states that the asset size limit is 50MB. What happens when you try to move an asset bigger than that with the native API?

Nothing. No limit is enforced. CloudKit happily moves assets of any size for you, as long as they stay within the user’s iCloud storage limit. If you hit that (or get close - I didn’t test with single byte precision), the operation is rejected and, yes, you do get throttled after that on the upload side. (Another fun fact if you are using a shared database and CK resource sharing: throttling seems to be per zone. If you have a shared zone where you do something to get throttled, other operations against the same zone, even by other users, also get throttled for a while. Well, I am not sure if it is per zone or per database, but we definitely did observe throttling being applied to several users due to what one of them did to a zone.)

So, the stated asset size limit is 50MB, but it does not apply with the native APIs. Does it then apply with the web APIs? Maybe. I haven’t tested that.

Okay. That was a bit of a digression. But why I brought the assets up: I started to think about the NNW problem, and I can say that CloudKit handles assets very well. So, one solution to this sync problem could be:

When you find that you need to sync a large amount of state, you don’t think of it as individual CloudKit record changes. Instead, you bundle up these changes in a file (JSON, plist, SQL dump, whatever) and move it through CloudKit to the other clients as a single asset. The client on the other side can then examine the asset and apply all the contained state after doing just a single CloudKit request to get the asset.

This would fix the perceived performance problem, and you don’t have to fight with the throttling. But you do trade one set of problems for another - the state contained in the asset wouldn’t be reflected in CloudKit state. (At least immediately. I would imagine you can do some kind of “trickle-upload” over a longer period if you want to have them there.) I haven’t examined the NNW data model to suggest how practical this would be.
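A minimal sketch of the bundling idea, assuming JSON as the file format. The `ArticleStatus` model, the function names, and the `StatusBundle` / `payload` record and field names are all made up for illustration; NNW's real data model may look quite different.

```swift
import Foundation

// Hypothetical status model, not NetNewsWire's actual one.
struct ArticleStatus: Codable, Equatable {
    let articleID: String
    var isRead: Bool
    var isStarred: Bool
}

/// Write many status changes into one JSON file that can travel as a single asset.
func writeBundle(_ statuses: [ArticleStatus], to directory: URL) throws -> URL {
    let url = directory.appendingPathComponent("status-bundle-\(UUID().uuidString).json")
    let data = try JSONEncoder().encode(statuses)
    try data.write(to: url, options: .atomic)
    return url
}

/// Read a downloaded bundle back into status values to apply locally.
func readBundle(at url: URL) throws -> [ArticleStatus] {
    let data = try Data(contentsOf: url)
    return try JSONDecoder().decode([ArticleStatus].self, from: data)
}

#if canImport(CloudKit)
import CloudKit

/// Attach the bundle file to a record as an asset (record type and key are assumptions).
func makeBundleRecord(fileURL: URL) -> CKRecord {
    let record = CKRecord(recordType: "StatusBundle")
    record["payload"] = CKAsset(fileURL: fileURL)
    return record
}
#endif
```

The receiving client would fetch that one record, read the asset's file, and apply the decoded statuses, instead of fetching each status record separately.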

jaanus commented 3 years ago

Another thought. When people talk about sync, I think it is reasonable to assume that they would like to sync several devices that are under their control and physically close to one another. (Even if not physically close, it could still be e.g. that you have a work computer, home computer, and iPhone that you take to both locations. The iPhone can then serve as the carrier of sync info.) For such situations, Apple provides Multipeer Connectivity. I think it was originally built for games. I am puzzled by how little utility/productivity apps use it. It works well across all Apple platforms and various network configurations.

This might be relevant for NNW because I imagine the scenario is that you have several of your devices in front of you, and you would just like to have them synced so they all show you the same numbers/content. MC might provide a good platform for this where the devices running NNW can discover each other, and the current state they are in, and then decide what data and how to move based on that.
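To make the "decide what data to move" step concrete, here is a toy sketch that assumes each device can share a summary of when each article's status last changed. The `StatusSummary` type and `idsToSend` function are hypothetical, and the Multipeer Connectivity transport itself is left out.

```swift
import Foundation

/// Hypothetical per-device summary: article ID -> time its status last changed.
typealias StatusSummary = [String: Date]

/// Article IDs whose status this device should send to the peer:
/// entries the peer does not have, or entries where the local copy is newer.
func idsToSend(local: StatusSummary, peer: StatusSummary) -> Set<String> {
    var result = Set<String>()
    for (id, localDate) in local {
        if let peerDate = peer[id], peerDate >= localDate {
            continue // peer is already up to date for this article
        }
        result.insert(id)
    }
    return result
}
```

Each device would run this against the other's summary, so only the differences cross the local connection.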

vincode-io commented 3 years ago

@jaanus Both of these solutions bypass CloudKit's change tracking. They might be used for an initial sync, but wouldn't every sync after that perform much worse than CKFetchRecordZoneChangesOperation?

jaanus commented 3 years ago

Yes, you would definitely need to figure out how to use either of these solutions together with actual CloudKit records and daily operations. CKFRZCO works really well for ongoing daily use with a reasonable number of changes. If you use the asset or MC, it will mean that some state is expressed as CloudKit records, and some isn’t. Either you work the state back into CloudKit eventually, or you have some other method to resolve that.
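As a toy illustration of why token-based change fetching is cheap for daily use, here is a sketch of a change log queried with a token. This is a simplified model written for this thread, not how CloudKit or CKFetchRecordZoneChangesOperation is actually implemented, and `ChangeLog` is a made-up type.

```swift
import Foundation

/// Toy stand-in for server-side change tracking (not CloudKit's real implementation).
struct ChangeLog {
    /// Record IDs in the order they were changed.
    private(set) var changes: [String] = []

    mutating func record(_ id: String) {
        changes.append(id)
    }

    /// Return only the changes made after `token`, plus a new token to store.
    /// A nil token means "first sync", which returns everything.
    func changes(since token: Int?) -> (ids: [String], newToken: Int) {
        let start = token ?? 0
        return (Array(changes[start...]), changes.count)
    }
}
```

After the first full fetch, each later sync only transfers what changed since the stored token. State moved via an asset or MC never enters this log, so it has to be reconciled some other way.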

brentsimmons commented 3 years ago

We may find ways to speed this up, or make it seem faster — but the upshot is that NetNewsWire is fast and iCloud is slow.