drewmccormack / ensembles

A synchronization framework for Core Data.
MIT License
1.63k stars 132 forks source link

Ensembles slowing down device dramatically #278

Open emilebennett opened 5 years ago

emilebennett commented 5 years ago

We’re just troubleshooting a little problem at the moment and I thought see if I can get some help here on Issues.

So, we’ve been getting lots of reports over the past few month of our app (which uses Ensembles) slowing down dramatically for long term users, to the point where the main thread is blocked for minutes on end. We’ve been looking into it and have discovered that as soon as iCloud sync (Ensembles) is turned off, things speed right up again.

We set up a data generator that creates a a years worth of data, forcing a sync after each piece of data creation. This mimics well how long term users would be using the app.

Here’s what we’ve found: After running the data importer and relaunching the app, you get endless printouts of [CDERevisionManager CheckModelVersionOfStoredModificationEvents] and a big pause with the UI slow. At this point our sync indicator is up and takes ages to complete sync - up to 5 minutes.

This is the behaviour for a few runs of the app, with sync taking a long time to complete. then it miraculously ‘sorts itself out’, sometimes after leaving the app till the next day, other times just leaving it open for a good 10 minutes or so, sync is super quick again, and all is good again.

Users are never getting to the point where Ensembles 'sorts itself out' as they're not going to sit with the app open and frozen for 10 minutes, so they're just abandoning the app, which isn't great!

We’re having trouble finding the source or reason for this happening. Knowing Ensembles as you do, is anything I’m describing here ringing any bells for you?

Cheers!

drewmccormack commented 5 years ago

I did reply to both the emails you sent to me personally, but it seems they didn't get to you for some reason. Here is the original reply:

Ensembles should never do anything extensive on the main thread. I guess it could be busy in the background, causing general slow down, but is should not block the main thread.

Can you explain what the test you do is exactly? Are you importing a years worth of data in one hit, and then turning on sync (ie leeching)? I’m not clear exactly what you mean by “after running the data importer and relaunching the app”.

Something you can try is using the verbose logging (SetCurrentLogLevel) to see what Ensembles is actually doing. Maybe there are clues there. You can also send me the log. Perhaps I can see something.

emilebennett commented 5 years ago

Hey Drew, thanks for getting back to me on here...dunno what happened with the emails!

For the test, we're simulating a years worth of data. So, an expense is added, sync is kicked off. When sync completes we add another expense. We do this around 1000 times to create a data set and change table that resembles a years worth of use. I'll run the data generator again now with Verbose logging on, and see if I can recreate the issue. It'll take a few hours but I'll post up the results as soon as I have them!

Cheers

drewmccormack commented 5 years ago

So you do 1000 saves? That will indeed take a long time to "catch up". It has to run through each save, check their dependencies etc.

In general, Ensembles avoids this scenario by rebasing. At about 200 saves, it compacts old data, to avoid having to deal with too many save events.

(Rebasing is a feature of Ensembles 2 only).

emilebennett commented 5 years ago

I agree it’s a lot to do at once, however this is an issue that is being reported very frequently - essentially ensembles locks up the users app and it’s unusable. Is it possible the rebase isn’t occurring for people? And is there a reason it could accidentally occur on the main thread, thus locking it? Just trying to rule out all possibilities. And do you have any thoughts on the [CDERevisionManager CheckModelVersionOfStoredModificationEvents] error? This occurs when the app is in ‘slow down’, an error for each database entry, and doesn’t occur one the app is running at proper speed.

-- Latest news Be sure to check out my multi-award winning app Pennies [http://www.getpennies.com/] - Beautifully simple money tracking app. -- Emile Bennett Product Design & Technology Consultant -- Brighton, UK — +44 (0) 7782 273 994 www.emilebennett.com [http://www.emilebennett.com/] -- LinkedIn [http://www.linkedin.com/in/emilebennett]

via Newton Mail [https://cloudmagic.com/k/d/mailapp?ct=dx&cv=9.8.421&pv=10.13.6&source=email_footer_2] On Tue, Sep 4, 2018 at 13:10, Drew McCormack notifications@github.com wrote: So you do 1000 saves? That will indeed take a long time to "catch up". It has to run through each save, check their dependencies etc.

In general, Ensembles avoids this scenario by rebasing. At about 200 saves, it compacts old data, to avoid having to deal with too many save events.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub [https://github.com/drewmccormack/ensembles/issues/278#issuecomment-418327917] , or mute the thread [https://github.com/notifications/unsubscribe-auth/ABHYV3ohp2oNfvdKpLkoJFkCceiYfnFuks5uXl-jgaJpZM4WYr_b] .

emilebennett commented 5 years ago

I should add, this is an issue being reported by users who've been using Pennies for over a year, but also just a couple of months. SO, really anyone who is using it regularly, maybe 5-10 entries per day

drewmccormack commented 5 years ago

Are you using E2? Are you perhaps suppressing the rebasing? (That is one of the merge options.)

Note that just doing 1000 saves in a loop is not representative of standard usage. The rebasing takes place during merges, so to do a reasonable test, you would do maybe a few saves, then a merge, then some more saves, then a merge, etc. If you do that, you are closer to real behavior, and E2 should get a chance to do the rebases.

drewmccormack commented 5 years ago

I think the model check error is normal. It should recover from that. I believe it is probably generated as it goes through all the save events.

I think the bigger question is why there is no rebasing. Is this an issue you have seen in the past? I thought you had been using the framework for quite a while.

emilebennett commented 5 years ago

We're actually pulling from ensembles-next branch, which should be E2? and we're using schema version 2...

screen shot 2018-09-04 at 14 05 53
emilebennett commented 5 years ago

We've been seeing this problem for around a year, but recently I've been getting a lot more reports of it. We'd thought maybe it had something to do with people migrating over from previous versions of pennies (and previous model schema) but we've been on this latest schema for long enough now that the problem is clearly not that.

Only the past few weeks the penny dropped that we were looking for the problem in the wrong place - we thought it was our internal calculations that were slowing things down, but could never reproduce the problem. I then got a report from a user that he turned off iCloud sync (therefor ensembles) and the problem instantly went away. From this I was able to reproduce the problem myself. But, as I say, if I leave the device for a good 5-10 minutes when it's locked up, it does sort itself out.

So, perhaps it is something to do with rebasing. Could it be that the rebase isn't being triggered, and when it finally does run the dataset is so big that the rebase takes a long time and locks up the app?

drewmccormack commented 5 years ago

It's possible, but unless you are deliberately suppressing it, I don't know why the rebase would not run.

Anyway, it is largely speculation. Best to generate the issue, and run with the verbose logging on. You can even make a build for beta testers, and have the log go to file (there is a function in the log header to set the destination), and then email the file to you guys. Would tell a lot.

drewmccormack commented 5 years ago

One other thing to consider is just that you have a lot of data, and that leeching a new device is taking a long time. How many rows of data do these users have in their Core Data database? When you leech, ensembles has to merge two databases, and that can be expensive. Each client device has to repeat this, because it is a decentralized sync.

emilebennett commented 5 years ago

It's not to do with them leeching to a new device, we handle that separately with a separate UI and progress.

The users I've been talking with only use one device, they just use iCloud to backup. Really it's not a lot of data, a few of months of a few entries a day in a budgeting app, surely that should be handled comfortably?

I'll get a log while the problem is persisting. Hopefully looking at that we can see a bit more whats happening...

drewmccormack commented 5 years ago

Something else I can think of that could cause a deadlock on the main thread are extensions. Do you have any? Do any use Ensembles data?

emilebennett commented 5 years ago

Hey Drew, sorry for my delayed reply! We do have have a Today and Watch extension, but they are read only

drewmccormack commented 5 years ago

I'm just thinking of file coordinators and file presenters. Those things can lock up pretty easily.

emilebennett commented 5 years ago

I've got some log files for you Drew. There's 3 logs, numbered in the order they occurred, and labelled with what was happening in each log. As I've explained, the data generator adds a years worth of expenses, one at a time, syncing after each addition (and waiting for completion before adding the next item) just as it would running in the real world. Once this process is complete I then quit the app and launched it again, copying out the logs.

It would be great if you could scan over then and let me know if you see anything in there that jumps out at you as being problematic!

Cheers Archive.zip

drewmccormack commented 5 years ago

OK, it is pretty clear from the logs that the time is mostly spent on rebasing, and I think I understand why the test takes so long, and why this should not be an issue for your customers.

A rebase can only happen on the first merge that an ensemble object carries out. This means that if you do 1000 changes, with saves, and merges, there will be no rebasing, because only the first merge is considered for rebasing.

When you relaunch the app, the ensemble object is new, and at that point, it will rebase, which is exactly what we see.

So a rebase does take several minutes, with a years worth of data, but this should not generally happen for your customers, because the app will quit and launch probably daily, if not more. This means that rebases will happen more often than once in 1000 saves.

To simulate this in a test would involve doing about 200 saves, then launching, then doing another 200 saves, then launching etc.

If you do that, I think each rebase will be much shorter, and less noticeable.

Drew

On 5 Sep 2018, at 13:36, emilebennett notifications@github.com wrote:

I've got some log files for you Drew. There's 3 logs, numbered in the order they occurred, and labelled with what was happening in each log. As I've explained, the data generator adds a years worth of expenses, one at a time, syncing after each addition (and waiting for completion before adding the next item) just as it would running in the real world. Once this process is complete I then quit the app and launched it again, copying out the logs.

It would be great if you could scan over then and let me know if you see anything in there that jumps out at you as being problematic!

Cheers Archive.zip https://github.com/drewmccormack/ensembles/files/2352566/Archive.zip — You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/drewmccormack/ensembles/issues/278#issuecomment-418698805, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEuACf8oKI94kJWRaLhp5aY7fqyQ2lfks5uX7c0gaJpZM4WYr_b.

emilebennett commented 5 years ago

That makes sense Drew. However, it is happening for some users, so there's a case where the rebasing isn't happening. Perhaps they never quit the app, just background/foreground, for a few weeks/months, so the rebase at around 200 entries doesn't occur. At somepoint the app quits, a rebase is triggered on the next launch, but it's for a much larger change set, so the slow down is in fact caused by the app locking when it finally attempts to rebase.

I imagine what happens then is the user backgrounds or quits the app before the rebase has completed (because it feels 'locked up' or slow') and never actually gets to complete the rebase...so the app remains slow.

Does this sound like a reasonable explanation?

If so, would you recommend a strategy for manually forcing a rebase every x changes, or some system to make sure that the change set doesn't get so large a rebase would cause such a perceivable slowdown of the app.

Cheers

drewmccormack commented 5 years ago

It does seem a bit unlikely to me that someone could go months without ever restarting the app, but I guess it is possible. Still, it may be wrong to assume what you are seeing with rebases is the same issue that your customers are seeing.

In any case, you can indeed force rebases, and it should not hurt. It is one of the merge options (CDEMergeOptionsForceRebase). Just pass it into this:

So a strategy could be to store the last forced rebase date in the NSUserDefaults. If the time elapsed is more than a month, you would do a merge forcing the rebase, and then store the new rebase date.

An alternative would be to count saves, and rebase every x saves. But I think the date option would work best.

Drew

On 5 Sep 2018, at 15:03, emilebennett notifications@github.com wrote:

That makes sense Drew. However, it is happening for some users, so there's a case where the rebasing isn't happening. Perhaps they never quit the app, just background/foreground, for a few weeks/months, so the rebase at around 200 entries doesn't occur. At somepoint the app quits, a rebase is triggered on the next launch, but it's for a much larger change set, so the slow down is in fact caused by the app locking when it finally attempts to rebase.

I imagine what happens then is the user backgrounds or quits the app before the rebase has completed (because it feels 'locked up' or slow') and never actually gets to complete the rebase...so the app remains slow.

Does this sound like a reasonable explanation?

If so, would you recommend a strategy for manually forcing a rebase every x changes, or some system to make sure that the change set doesn't get so large a rebase would cause such a perceivable slowdown of the app.

Cheers

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/drewmccormack/ensembles/issues/278#issuecomment-418722388, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEuAGyVs2mRdhoAZsZZ5hDPyYPs7wCiks5uX8ujgaJpZM4WYr_b.