digitalbazaar / jsonld.js

A JSON-LD Processor and API implementation in JavaScript
https://json-ld.org/
Other
1.66k stars 195 forks source link

Test performance of active context clone fix. #414

Open davidlehn opened 3 years ago

davidlehn commented 3 years ago

A scoped empty context bug was found, tests added, and patch proposed: https://github.com/digitalbazaar/jsonld.js/issues/394

The patch was applied here: https://github.com/digitalbazaar/jsonld.js/pull/411

The patch clones the active context. This could in theory lead to performance issues. Tests are needed to check if this is an issue. If so, mitigations should be explored.

grenik commented 3 months ago

I can confirm the performance issue.

We have a relatively large context file (2.6K lines) with dozens of local and type-scoped contexts and hundreds of explicitly unmapped properties (mapped to null or having "@id": null). We use all of these to model our data source which is an OData v4 service with many EDM entities, properties and - more importantly - navigational properties between entities (see $expand in OData). Recently we needed to process several GBs of data instead of MBs and realised that it is not possible in a reasonable time: even after some days we were still at 20%...

The node profiler showed that up to 90% of the time is taken by api.clone:

image

After applying some optimisation to the context, which are acceptable in our use case (removing unmapped properties which can safely be removed, pulling out unique properties to the global context, removing residual empty local contexts etc) I was able to reduce the processing time to an hour, but it is still very slow and api.clone is still taking quite much time

davidlehn commented 3 months ago

It's been a while so I'm forgetting some of the details with that issue. Do you have a simplified example available? Please something that's not GB of data. :-) Maybe just an example of the pattern that causes issues would be fine.

Last time I was looking at related performance issues, I was thinking we should look into one of the immutable data structure libs. I suspect there may be some optimization wins if the data structures are mostly shared where the clone calls are done. I haven't had time to explore that though as our use cases have not hit serious performance issues yet. But always good and fun to make things faster.

At some point I hope to return to the benchmarking code from a long while ago. Having a scalable generated example that follows the pattern causing you issues would be good to add. Other implementations probably have similar issues.

grenik commented 2 months ago

Do you have a simplified example available? Please something that's not GB of data. :-)

Unfortunately, it's company internal so I can't share it

Maybe just an example of the pattern that causes issues would be fine.

All you need to have is a combination of:

Valid real-life examples

Feel free to use them as templates, don't forget to make them large :-)

Type-scoped contexts

Context:

{
  "global property": "global mapping",
  "Type1": {
    "@id": "http://example.com/Type1",
    "@context": {
      "global property": "overridden mapping 1",
    }
  },
  "Type2": {
    "@id": "http://example.com/Type2",
    "@context": {
      "global property": "overridden mapping 1",
    }
  },
}

Data that triggers cloning:

[
  {
    "@type": "Type1",
    "global property": "will be mapped to overridden mapping 1"
  },
  {
    "@type": "Type2",
    "global property": "will be mapped to overridden mapping 2"
  }
]

Property-scoped contexts

I didn't test it, but it should also work

Context:

{
  "global property": "global mapping",
  "property 1": {
    "@id": "http://example.com/property1",
    "@context": {
      "local property 1.1": "http://example.com/property1.1",
    }
  },
  "property 2": {
    "@id": "http://example.com/property2",
    "@context": {
      "local property 2.1": "http://example.com/property2.1",
    }
  }
}

Data that should trigger cloning:

[
  {
    "global property": "global mapping",
    "property 1": {
      "global property": "global mapping",
      "property 1.1": "some value"
    }
  },
  {
    "global property": "global mapping",
    "property 2": {
      "global property": "global mapping",    
      "property 2.1": "some other value"
    }
  }
]

Recursively nested local contexts

The resulting JSON-LD context should simply be large and complex to clone, you can use any available JSON-LD syntax to make it large. This may look artificial, but sometimes people are forced to create really ugly things as workarounds:

{
  "unmapped property": {
    "@id": null,
    "@language": "as the property is not mapped, anything can be here without side effects, e.g. I use @language to leave some comments",
    "@context": {
      "unmapped property": {
        "@id": null,
        "@language": "if a property is mapped and I still want to leave a comment, I cannot use @language anymore as it would change the value's language, so I create a local property-scoped context with an unmapped property (which I usually call @context) and leave a comment in its @language",
        "@context": {
           "unmapped property": {
             // here we go again - it can also have @context
          }
        }
      }
    }
  }
}