editfmah / sharkorm

Shark ORM for iOS/macOS/tvOS/watchOS
http://sharkorm.com

Multiple Threads access #30

Closed Parkyprg closed 7 years ago

Parkyprg commented 8 years ago

In one of my applications I am accessing/updating db objects from multiple threads, and for some reason it simply freezes. I am trying to narrow it down, but since it is multi-threaded, that seems to be difficult.

Are there any special steps I need to take in order to work with the objects from multiple threads?

editfmah commented 8 years ago

Nope, nothing to do at all.

In fact, the unit tests contain lots of examples of up to 50 threads all reading/writing/deleting at the same time.

There is a critical section around write operations to a database, but it will not block reads. Transactions are also partitioned by thread, so unless there is a huge operation within a large transaction, you should not see much in the way of blocking.
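The arrangement described above (one critical section for writes, unlocked reads, per-thread transactions) can be sketched in C with POSIX threads. This is a minimal illustration under stated assumptions, not SharkORM's actual implementation: `write_row`, `begin_transaction` and the other names are hypothetical, and an atomic counter stands in for the database file.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sketch, not SharkORM code: one process-wide lock serialises
   writes, reads never take it, and transaction state is per thread. */
static pthread_mutex_t write_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int row_count = 0;        /* stand-in for the database file */
static _Thread_local int tx_depth = 0;  /* each thread gets its own copy */

void begin_transaction(void) { tx_depth++; }
void commit_transaction(void) { assert(tx_depth > 0); tx_depth--; }
bool in_transaction(void) { return tx_depth > 0; }

void write_row(void) {
    pthread_mutex_lock(&write_lock);    /* the single critical section */
    atomic_fetch_add(&row_count, 1);
    pthread_mutex_unlock(&write_lock);
}

int read_row_count(void) { return atomic_load(&row_count); } /* no lock taken */

static void *writer(void *arg) {
    int n = *(int *)arg;
    begin_transaction();                /* only this thread sees it open */
    for (int i = 0; i < n; i++) write_row();
    commit_transaction();
    return NULL;
}

/* Runs `threads` writers (at most 64), each doing `writes_each` writes;
   returns the final row count. */
int run_writers(int threads, int writes_each) {
    pthread_t tids[64];
    assert(threads <= 64);
    for (int i = 0; i < threads; i++)
        pthread_create(&tids[i], NULL, writer, &writes_each);
    for (int i = 0; i < threads; i++)
        pthread_join(tids[i], NULL);
    return read_row_count();
}

static void *probe(void *out) { *(bool *)out = in_transaction(); return NULL; }

/* True if another thread can see this thread's open transaction (it should not). */
bool transaction_leaks_to_other_thread(void) {
    bool seen = true;
    pthread_t t;
    begin_transaction();
    pthread_create(&t, NULL, probe, &seen);
    pthread_join(t, NULL);
    commit_transaction();
    return seen;
}
```

With 50 writer threads of 100 writes each, the counter should end at 5000, and a transaction opened on one thread is invisible to the others because `tx_depth` is thread-local.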

Any additional information you could provide would be helpful. Is there anything special about what you are persisting? e.g. large BLOB objects, or very wide objects?

Parkyprg commented 8 years ago

I still cannot find the cause, but I suspect it is somehow related to the framework (or at least to the way I am using it).

As background: I am sending a few server requests from different threads to fetch some data (using AFNetworking). When the results come back, I insert/update them in the database with SharkORM.

The problem is that the app simply freezes (as if it had entered an endless loop), but not always at the same step and, most frustratingly, only a few times in every 100 runs or so. If I pause it from Xcode, I see different threads stuck in SharkORM methods. For example, right now I have the following:
Thread 1: frozen in SRKQuery whereWithFormat, at "self.whereClause = format;". In the stack trace I see this object was initialized from a removeAll method of SRKResultSet.
Thread 4: frozen in SRKObject setPropertyBoolIMP, while setting a boolean value on a field.

I will try to note where it happens each time, but unfortunately I don't know how to reproduce it reliably.

Any thoughts are highly appreciated.

PS: it's not a temporary block; it freezes forever, no matter what I do. I am storing only simple values (NSNumber, NSString, BOOL, double) and a few relations between tables.

Is it possible that I am hitting a deadlock or something similar (for example, referencing A from B and also B from A)?

Parkyprg commented 8 years ago

I don't think it is a deadlock, because in that case it would freeze at the same place/moment every time.

editfmah commented 8 years ago

Hi, sorry for not getting back to you yesterday. I would agree. Deadlocks always show themselves as threads waiting on a semaphore, and as far as I can remember we only have the one critical section, around write operations.

You have done quite a lot of investigation already. We also use AFNetworking in the same kind of way. We modified AFNetworking to always use background threads rather than everything piling onto the main thread, but that would be the only difference.

Are you seeing any errors on the delegate? Locked SQL file, index corruption, anything like that?

editfmah commented 8 years ago

Also, I'd never expect a freeze in your thread 4 example. I'd look at what priority those dispatched items had, and whether GCD suspended them in favour of higher-priority blocks. But I'm really scraping the barrel for ideas there.

Parkyprg commented 8 years ago

Thank you for your reply. I can now reproduce it every time, which should make things easier (in theory :)). There are 2 relevant threads, always the same ones:
Thread 1: stuck at [SRKQuery setOrderBy]
Thread 5: stuck setting a BOOL property; it runs on a concurrent queue.

I am attaching 2 images with what I think might be relevant.

[Screenshots: thread-1, thread-5]

editfmah commented 8 years ago

Thanks for the info, it's very weird, as setPropertyIMP operates on a single object only and shares nothing with other objects or the ORM. There is quite literally no reason why it would ever block there.

What created the block and dispatched it? Was it an AFNetworking response block? Could you try dispatching one of those blocks on a different queue:

```objc
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
    // loop through doing your update here.
});
```

I appreciate that breaking your app apart to make part of it async is normally a nightmare, but might I suggest using an NSLock to hold a semaphore until the dispatched block has completed.

I'm just interested to see whether it starts to work if you break out of this dispatch queue.

It might give us at least a direction to head in.

P.S. Does this happen on an entirely repeatable basis, or is it intermittent?
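The "hold a semaphore until the dispatched block has completed" suggestion can be sketched portably in C with POSIX threads: the update loop runs on another thread, and the caller blocks on a condition variable until the worker signals that it is done. In GCD itself the usual tool for this is `dispatch_semaphore_create` / `dispatch_semaphore_signal` / `dispatch_semaphore_wait` rather than NSLock. The `completion` type, `run_update_and_wait`, and the counting loop are hypothetical stand-ins for the real update code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* One-shot completion signal: a portable stand-in for waiting on a
   semaphore until a dispatched block has finished. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
} completion;

void completion_init(completion *c) {
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

void completion_signal(completion *c) {
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

void completion_wait(completion *c) {
    pthread_mutex_lock(&c->lock);
    while (!c->done)                    /* loop guards against spurious wakeups */
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

typedef struct {
    completion done;
    int updates;                        /* hypothetical result of the update loop */
} job;

static void *background_update(void *arg) {
    job *j = arg;
    for (int i = 0; i < 1000; i++) j->updates++;  /* "loop through doing your update here" */
    completion_signal(&j->done);
    return NULL;
}

/* Runs the update loop on a separate thread and blocks until it has completed. */
int run_update_and_wait(void) {
    job j = { .updates = 0 };
    pthread_t t;
    completion_init(&j.done);
    if (pthread_create(&t, NULL, background_update, &j) != 0) return -1;
    completion_wait(&j.done);           /* caller blocks here until the worker signals */
    pthread_join(t, NULL);              /* worker may still be unlocking; join before `j` goes out of scope */
    pthread_cond_destroy(&j.done.cond);
    pthread_mutex_destroy(&j.done.lock);
    return j.updates;
}
```

Because the worker sets `done` under the mutex only after finishing its loop, the caller is guaranteed to see all 1000 updates once `completion_wait` returns.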

editfmah commented 8 years ago

Hi @Parkyprg,

Is there anything specific you would like me to look at, or a scenario I could set up that you think would give the best possible chance of replicating your issue? I have a couple of free hours today that I can dedicate to trying to work out this issue.

Thanks Adrian

Parkyprg commented 7 years ago

Hi Adrian,

I managed to reproduce the problem in a separate project. Could you give me an email address where I can send you the source files?

Thank you, Cristi

sharkorm commented 7 years ago

Hi,

devs@db-access.org.

Looking forward to fixing it!


Parkyprg commented 7 years ago

Email sent. Forgot to mention it here :)

Parkyprg commented 7 years ago

I was wondering if you were able to reproduce the problem with the test project that I sent.

editfmah commented 7 years ago

I'm just looking now, but so far I am unable to reproduce it. Does this happen on devices or the simulator, or both?


Parkyprg commented 7 years ago

Mostly on device. Try adding many contacts, so that it spends more time on that particular job.

editfmah commented 7 years ago

Hi, I'm just trying to get it onto a phone, but in the simulator we added 1000 contacts to really stretch it.

Parkyprg commented 7 years ago

On the simulator I believe the time spent processing the contacts is too short. Try adding some random contacts on a device with the second project that I sent you.

editfmah commented 7 years ago

Okay, I will try that when our testing team gets here. I'd prefer not to create loads of addresses on my personal phone. Also, the app actually crashes on iOS 10, because it tries to access the contacts before it asks for permission. I'm just going to add the permission manually now.

UPDATE: The permission for Contacts is not present in iOS 10 (I think it crashes before it adds the entitlement), so I can't add it manually. I'll wait for an iOS 9 device and get back to you soon.

editfmah commented 7 years ago

Right, I've managed to recreate the hang (it took a lot of trying). However, in this particular case it was not using Shark at all; it had been blocked by two threads both creating a run loop for the Contacts API at the same time. I will keep looking to see if I can pin it down further.

[Screenshot, 2016-09-15 09:24:39]
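The hang found here came from two threads creating the same run loop at the same moment. One generic way to make that kind of one-time setup safe is POSIX `pthread_once`, which runs an initialiser exactly once and makes every other caller wait until it has finished. This is a sketch in C with a hypothetical `create_resource` standing in for the shared object; it is not the Contacts API and not a fix that was applied in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared resource that must be created exactly once. */
static atomic_int creations = 0;
static int *shared_resource = NULL;
static pthread_once_t resource_once = PTHREAD_ONCE_INIT;

static void create_resource(void) {
    static int storage = 42;            /* stand-in for an expensive object */
    atomic_fetch_add(&creations, 1);
    shared_resource = &storage;
}

int *get_resource(void) {
    /* Concurrent callers block here until create_resource has returned,
       so no thread ever sees a half-built resource. */
    pthread_once(&resource_once, create_resource);
    return shared_resource;
}

static void *worker(void *arg) {
    (void)arg;
    get_resource();
    return NULL;
}

/* Starts n threads (n <= 32) that all request the resource at once;
   returns how many times it was actually created. */
int race_for_resource(int n) {
    pthread_t t[32];
    assert(n <= 32);
    for (int i = 0; i < n; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < n; i++) pthread_join(t[i], NULL);
    return atomic_load(&creations);
}
```

However many threads race for it, the initialiser should run once, and every caller gets the same fully constructed object.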

editfmah commented 7 years ago

Also, what device/iOS version are you using? I'll try to match that at this end if I can, for a better and more accurate test. I'm currently using an iPhone 6 running iOS 9, with 1000 contacts.

Parkyprg commented 7 years ago

I am using the same device: an iPhone 6 with iOS 9.3.5.

editfmah commented 7 years ago

Hi, I'm going to look at this again today. Is there any further information or investigation on this issue?

Parkyprg commented 7 years ago

Thank you for looking into this. However, there is no new information. It still freezes.

I am not 100% sure this is caused by the framework, but every time it happens, 2 threads are locked in SharkORM methods.

Parkyprg commented 7 years ago

Hi,

Were you able to find anything? Did you at least manage to reproduce it on a test device?

editfmah commented 7 years ago

I've only been testing for about half an hour, but so far it has not locked up once. When I did manage to get it to freeze last time, it happened not to be doing anything in Shark. Another time, it froze on the line:

idx++;

That really has to be symptomatic of the coordinating thread locking. I don't know how iOS/OS X manages its thread hierarchy, but I would have thought there is a base-level run loop (thread) that controls all the others; I would need to ask someone on SO who is more knowledgeable.

As a general rule, Shark does nothing on background threads of its own; it merely executes its code in-line, as the developer intends. We do use performSelectorInBackground: if you're performing async queries, but that is the only exception.

So, although I have so far been unable to identify the culprit, I'm struggling to find any reason to believe it is the ORM. I do, however, maintain a completely open mind.

I will do some more testing over the next hour or so, possibly on some older / single-core devices.

Thanks Adrian

lbwxly commented 7 years ago

Any update on this issue? I also sometimes get a freeze at the following line in the SRKObject class:

This is called from the main thread.

editfmah commented 7 years ago

Hi,

I looked at this again recently. In the other instance, the code would stop in the middle of nowhere, in places that could not possibly ever pause. Yours, however, might be different.

So, a few questions. Does your hang ever resume, or is it a transient pause?

Are there other DB operations happening in the background? (They would have to be huge to cause any sort of delay.)

Is this call normally made within any kind of transaction?

I will also look at that method now to see if there is anything I can spot.

lbwxly commented 7 years ago

It pauses at this line every time; from the stack trace we can see the thread is waiting for something. It is not a transient pause. I don't think there is any background access, since it occurs just as our app launches.

editfmah commented 7 years ago

Is this Swift code by any chance? Does it make it to the call to description? Because I think I've just spotted a "while" that should be an "if". I don't suppose you happen to have a screenshot of the trace, do you?

lbwxly commented 7 years ago

My app is built with Swift. It is not frequent; I will take a screenshot the next time it occurs.

editfmah commented 7 years ago

Excellent, I look forward to seeing it. Hopefully, there will be something to go on.

lbwxly commented 7 years ago
[Screenshot, 2016-10-18 4:44:26 PM]
lbwxly commented 7 years ago

The screenshot.

editfmah commented 7 years ago

Thank you for the screenshot. I've been investigating the stop point, although on the face of it that call should hardly cause any kind of issue. So I've been looking around to see how others have gone about debugging this particular condition (it seems to happen quite a bit with Unity and a couple of other big frameworks).

They all seem to have something in common: on the other threads (mostly network calls), a block is dispatched from a background thread to be executed on the main queue. The previous poster had the same pattern. For some reason, this has the effect of locking the main thread.

There is some talk that it happens when the blocks inadvertently reference a static variable, and that there could be an issue when that variable is pushed onto the stack for that specific block.

If you can get it to pause again, would you mind sending the output of 'bt all' from the debugger? That may prove useful in tracking this down further.

I will also check whether there are any open SQLite issues where calls from threads can destabilize parent-thread locking, or anything in that vein.

lbwxly commented 7 years ago
lbwxly commented 7 years ago

It happened again.

editfmah commented 7 years ago

Thanks, I think I may have found the bug from the information you provided. I will check it out now and post a fix. By rights it should just crash, but it doesn't; it just hangs instead.

lbwxly commented 7 years ago

So it seems two threads access systemEntityRelationships at the same time?

editfmah commented 7 years ago

Yep, when the static is accessed it seems to block the main thread. I'm removing the statics and putting them into a class. Those items are a bit of a throwback to the original code, from when the framework was just C and C++.

editfmah commented 7 years ago

Okay, doing this right is going to take some time. By the looks of it, the ETA for a fix is tomorrow, subject to passing all the tests.

Parkyprg commented 7 years ago

Thank you, Adrian. I will try this fix too. Can you release it on CocoaPods as well?

editfmah commented 7 years ago

Just done it; v2.0.9 should be showing. Thanks @Parkyprg for being so patient. It was only when the second report came in that I could even see the pattern.

It may not fix it (considering how much I struggled to replicate it), but I'd be really surprised if it didn't :)

Parkyprg commented 7 years ago

:) Thank you.

editfmah commented 7 years ago

Is this looking good now? It has caused a performance drop in the ORM, but that will be dealt with separately.

lbwxly commented 7 years ago

I just got the latest code. It looks good for now; if we find any issues, we will let you know. By the way, I have made some changes on my local branch (printing the SQL log based on a setting, a KVO-related bug fix, and some additional convenience methods). Can I send a pull request?

editfmah commented 7 years ago

Yes please do.

lbwxly commented 7 years ago

I think your change may not fix the problem. The problem is two threads accessing/modifying the systemEntityRelationships list, but your change wraps the getter for that member and adds a lock in the getter. That won't prevent two threads from accessing/modifying the list after they have obtained the list object through the thread-safe wrapper.
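The distinction being drawn here can be sketched in C with POSIX threads: a lock inside the getter only protects handing out the pointer, so a caller that then appends to the list does so unlocked; the safer shape keeps the whole read-modify-write inside the lock. The array, `add_relationship`, and the other names are hypothetical stand-ins for a shared list like systemEntityRelationships, not SharkORM's actual data structure.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a shared relationship list. */
#define MAX_RELS 4096
static const char *relationships[MAX_RELS];
static int relationship_count = 0;
static pthread_mutex_t rel_lock = PTHREAD_MUTEX_INITIALIZER;

/* Unsafe shape (described above): a getter that locks, reads the pointer,
   unlocks and returns it. A caller doing list[count++] = x afterwards runs
   with no lock held, so two threads can race on the same slot. */

/* Safer shape: the whole read-modify-write happens while the lock is held. */
int add_relationship(const char *name) {
    int added = 0;
    pthread_mutex_lock(&rel_lock);
    if (relationship_count < MAX_RELS) {
        relationships[relationship_count++] = name;
        added = 1;
    }
    pthread_mutex_unlock(&rel_lock);
    return added;
}

int relationship_total(void) {
    pthread_mutex_lock(&rel_lock);
    int n = relationship_count;
    pthread_mutex_unlock(&rel_lock);
    return n;
}

static void *adder(void *name) {
    for (int i = 0; i < 500; i++) add_relationship(name);
    return NULL;
}

/* Starts n threads (n <= 8), each adding 500 entries; returns the final count. */
int add_from_threads(int n) {
    pthread_t t[8];
    assert(n <= 8);
    for (int i = 0; i < n; i++) pthread_create(&t[i], NULL, adder, "rel");
    for (int i = 0; i < n; i++) pthread_join(t[i], NULL);
    return relationship_total();
}
```

With every append under the lock, 8 threads of 500 appends each should always produce exactly 4000 entries; with the unsafe getter shape, lost or overwritten entries would be possible.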

lbwxly commented 7 years ago

The problem still exists. I ran bt all when it occurred and got the same result as before. I'm not sure whether the problem is accessing [Class description] from different threads.

lbwxly commented 7 years ago

I guess the problem is in the code [r.sourceClass description].

editfmah commented 7 years ago

Those calls to get the systemEntityRelationships create a copy of that dictionary, so that should make it thread-safe (from an iteration point of view). Now, there might be two things accessing the objects within the dictionary, which will be the same objects, but as they are being used read-only there should never have been a problem accessing them as a single static either. I only wrapped a lock around them to ensure two copies were not being made at the same time. I strongly believe this to be a problem with Apple's implementation of their block dispatcher, which seems to lock threads when creating/copying stack objects.
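The copy-on-read approach described here can be sketched in C with POSIX threads: readers hold the lock only long enough to copy the container, then iterate over their private copy while writers carry on. As noted above, this protects iteration, not the elements themselves; an Objective-C dictionary copy is shallow, so the objects inside are the same instances and are only safe while used read-only. In this sketch the elements are ints copied by value, and `registry_add` / `registry_snapshot` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared registry guarded by a single lock. */
#define CAP 64
static int registry[CAP];
static size_t registry_len = 0;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

void registry_add(int value) {
    pthread_mutex_lock(&registry_lock);
    if (registry_len < CAP) registry[registry_len++] = value;
    pthread_mutex_unlock(&registry_lock);
}

/* Returns a heap copy the caller owns and must free(); *len receives its
   length. Later writers cannot change the copy while the caller iterates. */
int *registry_snapshot(size_t *len) {
    assert(len != NULL);
    pthread_mutex_lock(&registry_lock);  /* also stops two copies being made at once */
    *len = registry_len;
    int *copy = malloc((registry_len ? registry_len : 1) * sizeof *copy);
    if (copy) memcpy(copy, registry, registry_len * sizeof *copy);
    pthread_mutex_unlock(&registry_lock);
    return copy;
}
```

A snapshot taken before a later `registry_add` keeps its original length and contents, which is what makes iterating over it safe without holding the lock.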

I had other plans for how I was going to deal with all these statics (an event registry, a backing store, and a re-implementation of the caching) that would largely do away with large swathes of this code, but I was unsure whether it would actually be faster (more performant) in the long run. It would enable good memory savings and some CPU gains on some operations, but retrieving a property's value may well become slower, which would be a more consistent performance concern.

As for description, that is simple string manipulation and work on the heap, and there is a heap per thread anyway, so this should never clash.

editfmah commented 7 years ago

Anyhow, I will look at this again, because those changes made the ORM much slower. If it is not the static objects, then they should go back or be re-implemented differently, as this is one scenario where OO is the worst possible solution.