gonzalezreal / Groot

From JSON to Core Data and back.
Other
534 stars 61 forks source link

Fixes an issue that caused attributes to be updated even when their values are equal #50

Closed Rogiel closed 9 years ago

Rogiel commented 9 years ago

This fixes an issue that caused CoreData to issue a update query to every object fetched by Groot even if their values have not changed since the last fetch.

gonzalezreal commented 9 years ago

Can you elaborate and be more specific on the use case you're trying to fix? It seems there's some performance loss, specially when serializing relationships, and I would like to know if it is justified.

Rogiel commented 9 years ago

If you call setValue(:forKey:) on a NSManagedObject it will cause the object to become "dirty" and a call to NSManagedObjectContext.save() will cause all the entities to be store in the database, even if no changes have actually happened.

Heres an example in Swift:

let dict = [
    "eventid": 1,
    "name": "Test event"
]
try GRTJSONSerialization.objectWithEntityName("Event", fromJSONDictionary: dict, inContext: managedObjectContext)

// first try
NSLog("has changes? %@", (managedObjectContext.hasChanges ? "Yes" : "No"))
try managedObjectContext.save()
NSLog("has changes? %@", (managedObjectContext.hasChanges ? "Yes" : "No"))

// second try, using the same JSON dictionary
try GRTJSONSerialization.objectWithEntityName("Event", fromJSONDictionary: dict, inContext: managedObjectContext)
NSLog("has changes? %@", (managedObjectContext.hasChanges ? "Yes" : "No"))
try managedObjectContext.save()
NSLog("has changes? %@", (managedObjectContext.hasChanges ? "Yes" : "No"))

Here's the output without the checks:

has changes? Yes
has changes? No
has changes? Yes
has changes? No

As you can see, for the same JSON dictionary a double save is issued. In fact, this could be considered as an optimisation because without it, if importing a relatively large JSON file could trigger a save to several objects (if not all).

Now, with the checks I have proposed, the output is as expected:

has changes? Yes
has changes? No
has changes? No
has changes? No

Only a single save operation is performed for the JSON!

Further debugging on the issue can be seen by listening on the NSManagedObjectContextDidSaveNotification in the notification's userInfo["changes"].

EDIT: Btw, sorry. Now that I read my pull request I realised how bad it was, no explanation on the issue at all.

gonzalezreal commented 9 years ago

I'm still concerned about the performance. In the relationship case, whole NSSets are being compared with isEqual. In all of my use cases I won't receive the same data more than once. Does it make sense to put that logic in the managed object subclass?

Rogiel commented 9 years ago

That is a possibility, but I still see this as a bug: a KVO notification is still sent for every single field on the JSON, had it been changed or not. I do agree that in the case of a large Set this could be an issue however I believe forcing the entire tree of objects to be persisted unnecessarily to the database looks even worse.

IMO the correct behaviour is to perform the check: if a value has not changed, a change notification should not be dispatched or a save operation performed. But since there could be a large NSSet that is changing most of the time, I believe it could still offer a solution that bypasses the check in the relationship case with an extra boolean userInfo key named "JSONIgnoreEqualityCheck". This way it is possible to offer the correct behaviour most will expect with the possibility of optimization, if required.

What do you think of this solution?

gonzalezreal commented 9 years ago

I still think that the equality check is not necessary for most of the use cases and adding an extra user info key to bypass it is not a good solution.

What about the cost of comparing transformable or binary attributes for equality? This could potentially have a lot of impact in performance.

If you still want to go that way and implement it inside Groot, one possible solution could be creating stricter versions (using the equality check) of the uniquing serialization strategies and let the user opt-in to use them. It does not make sense to do the equality check in the insertion-only serialization strategy.

In case you were not aware of the serialization strategies: Groot uses different serialization strategies based on the presence of identityAttributes and on the number of identity attributes defined. This was implemented with performance in mind: insertion-only is faster than uniquing, which is faster than composite uniquing.

Anyway, I still believe that the equality check should be done (when needed) in the NSManagedObject subclass by overriding the property setters. Other libraries providing a similar functionality are not doing it.

gonzalezreal commented 9 years ago

Feel free to re-open if you want to discuss any other solution.