eclipse-sirius / sirius-emf-json

JSON-based EMF Resource implementation - part of Eclipse Sirius
https://eclipse.dev/sirius/sirius-web.html
Eclipse Public License 2.0

[PERFO] Handling of cross-document refs has an important impact on performance #19

Open cbrun opened 1 year ago

cbrun commented 1 year ago

Starting from a decent-size model, which has no inter-document cross-references but quite a few internal references. EObjects: 3894. References with values: 3121.

And using: String json = JsonResourceImpl.toJson(r, Collections.EMPTY_MAP);

Serialization of my model takes 77606 ms (about 77 s), which seems like a lot compared to serialization/deserialization using BinaryResource, which takes about 400 ms.

String size:7126813 time: 77606 ms.

After profiling in tracing mode, in my case it all comes down to org.eclipse.sirius.emfjson.utils.GsonEObjectSerializer.docKindMany(EObject, EReference) and the following code, which, I assume, tries to detect whether there is a cross-document reference to serialize:

        Iterator<? extends InternalEObject> it = internalEList.iterator();
        while (referenceType != GsonEObjectSerializer.SKIP && referenceType != GsonEObjectSerializer.CROSS_DOC && it.hasNext()) {
            InternalEObject internalEObject = it.next();
            if (internalEObject.eIsProxy()) {
                referenceType = GsonEObjectSerializer.CROSS_DOC;
            } else {
                Resource resource = internalEObject.eResource();
                if (resource != this.helper.getResource() && resource != null) {
                    referenceType = GsonEObjectSerializer.CROSS_DOC;
                }
            }
        }

A brute-force approach of commenting out this code (whose check should not be needed in my case, since the model has no cross-document references) leads to

 String size:7126813 time: 746 ms.

which is about 100 times faster, and gives the exact same result.

cbrun commented 1 year ago

I guess some logic in org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl.EObjectOutputStream.saveEObject(InternalEObject, Check) could be reused, as it seems to handle the cross-document reference case correctly while keeping good performance.

cbrun commented 1 year ago

After a slightly deeper analysis: the calling code resolves the reference's list value and iterates over it, and the sub-method docKindMany(EObject, EReference) iterates over the same list again on each pass, leading to n*n complexity, with n being the size of my list.
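
To make the n*n behaviour concrete, here is a minimal sketch in plain Java (no EMF involved). All names in it are hypothetical stand-ins, not the real sirius-emf-json code: hasCrossDocValue only mimics the shape of docKindMany (scan the list, stop early on a "foreign" value), and a negative integer plays the role of a cross-document value. With only internal values the early exit never fires, so each of the n calls visits all n elements.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the serializer; none of these names come from sirius-emf-json. */
class QuadraticScanSketch {
    /** Number of list elements visited by the classification scans. */
    static long visits = 0;

    /** Mimics the shape of docKindMany: scans the list, stopping early on a "foreign" (negative) value. */
    static boolean hasCrossDocValue(List<Integer> values) {
        for (int value : values) {
            visits++;
            if (value < 0) {
                return true;
            }
        }
        return false;
    }

    /** Mimics the caller: iterates over the values and re-classifies the whole list for each one. */
    static long visitsWhenClassifyingPerValue(int n) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(i); // every value is "internal", so the scan never exits early
        }
        visits = 0;
        for (Integer ignored : values) {
            hasCrossDocValue(values);
        }
        return visits;
    }
}
```

For a list of 100 internal values, visitsWhenClassifyingPerValue(100) counts 100 * 100 element visits, which matches the worst case for a model like mine where no reference leaves the document.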

pcdavid commented 2 weeks ago

What's even stranger is that not only is docKindMany(EObject, EReference) invoked on every iteration of the loop, but its result does not even depend on the current iteration's value!

serializeMultipleNonContainmentEReference is using the result of docKindMany(EObject, EReference) to decide how to serialize the reference towards each value, but for a given reference, it will make the same choice for every value, whether it is internal or not.
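
Since the result is the same for every value of a given reference, one possible shape for the cleanup is to compute it once before the loop. The following is only a sketch under assumptions, not the actual change: the Value class, the Kind enum, the classifications counter and the "#id" / "ext:id" output strings are all hypothetical and only illustrate hoisting the loop-invariant call.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of hoisting the per-reference classification out of the value loop. */
class HoistedDocKindSketch {
    enum Kind { SAME_DOC, CROSS_DOC }

    /** Hypothetical value: an id plus a flag saying whether it lives in another resource. */
    static final class Value {
        final String id;
        final boolean external;

        Value(String id, boolean external) {
            this.id = id;
            this.external = external;
        }
    }

    /** Counts how many times the list was classified. */
    static int classifications = 0;

    /** Stand-in for docKindMany: CROSS_DOC as soon as one value is external. */
    static Kind docKind(List<Value> values) {
        classifications++;
        for (Value v : values) {
            if (v.external) {
                return Kind.CROSS_DOC;
            }
        }
        return Kind.SAME_DOC;
    }

    /** Classifies the reference once, then applies the same choice to every value. */
    static List<String> serialize(List<Value> values) {
        Kind kind = docKind(values); // loop-invariant, so computed before the loop
        List<String> out = new ArrayList<>();
        for (Value v : values) {
            // Made-up output format, only to show that the choice is per reference.
            out.add(kind == Kind.CROSS_DOC ? "ext:" + v.id : "#" + v.id);
        }
        return out;
    }
}
```

With this shape the list is scanned once for classification and once for output, so the cost per reference is linear in the number of values, and the output is unchanged because the same kind was being applied to every value anyway.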

I'll propose a PR to clean this up.