Crash on ListMonitor - Index out of range

27deabril commented 6 years ago

Please have a look at the attached screenshot with the error. I would guess the error relates to the highly asynchronous nature of our app, so the objects are actually changing in one thread while another thread transitions between the guard and the actual array access in the return.

I have added a couple of prints to try to get some additional info, but they haven't been helpful in that specific case, as it seems the number of objects changed between the last print and the return. According to the variables panel, in that case we are trying to access itemIndex 29. According to the printing description of the sectionObjects variable, there were only 26 objects at the time (although the other prints show the number was previously checked as 31).

We don't have a definite way of reproducing it, but this crash has been happening here about once a day in production. In development, we noticed it often happens when we run the app in a device or simulator which hasn't been used in a few days. This could make sense, as after some days without opening the app, it would be natural, in our case, to have a set of new objects to be loaded, and maybe it is exactly this parallel loading of items in different threads which results in the crash. In fact, when the problem happened just now and I took this screenshot, I was able to see there were two threads in that same area of the code.

We are using 4.2.3, but I don't think this code has changed in 5+.

I don't believe synchronizing the whole code to avoid these situations would be in your plans, and I'm also aware it could turn into a problematic solution. But do you think there could be some different approach in the code to try to avoid this? I am not 100% on whether it would actually behave differently, but I was wondering if maybe writing the same code inline, instead of using the guard, could remove this gap. Something like:

return (itemIndex >= 0 && itemIndex < section.numberOfObjects) ? ObjectType.cs_fromRaw(object: section.objects![itemIndex] as! NSManagedObject) : nil

We will be experimenting in the next few days with the inline code, so I can soon follow up with a feedback on whether the crash is still happening or not, in case this would be an acceptable solution.

27deabril commented 6 years ago

As I had mentioned, we have been testing with the inline version of the code. In these last 3 days, we got errors in that area of code only twice. Neither of them were "index out of range" errors, but instead SIGABRT ("pointer being freed was not allocated..."), as in the screenshot.

Also important to notice is that these errors happened in cases that were a bit more "extreme" - in both occurrences, the device had more than 600 objects in the list at the time (as in this new screenshot). Previously, it happened for very small amounts as well (in the first screenshot, there were only 26 objects, for example).

Not being a 100% effective fix, it is at least looking like a more solid approach for our specific scenario.

Please let me know if you would like any additional info.

JohnEstropia commented 6 years ago

In fact, when the problem happened just now and I took this screenshot, I was able to see there were two threads in that same area of the code.

You already found the issue in your code. ListMonitors are designed only to be used on the queue/thread of the DataStack it was created from, specifically the main thread. You have to take care that accessors of the ListMonitor follow this practice, including any observers.

I don't believe synchronizing the whole code to avoid these situations would be in your plans, and I'm also aware it could turn into a problematic solution.

ListMonitors already do, you can verify this that all notifications and events spawned from them are in the main thread. In fact, NSFetchedResultsControllers, which ListMonitors use behind the scenes already manages most of the synchronizations.

I was wondering if maybe writing the same code inline, instead of using the guard, could remove this gap.

If your app behaves differently when doing those two methods, you have a serious threading issue in your app. I really encourage you to prevent non-main thread accesses in your code. In fact if more people report misuse like this I might just shove an assert(Thread.isMainThread) in those accessors.

Neither of them were "index out of range" errors, but instead SIGABRT ("pointer being freed was not allocated..."), as in the screenshot.

Because your issue is not wrong indexes, it's wrong timing accessing does indexes. Try to manage your threading first, then let me know of your observations.

JohnEstropia / CoreStore

Crash on ListMonitor - Index out of range #259