digitalbazaar / jsonld.js

A JSON-LD Processor and API implementation in JavaScript
https://json-ld.org/
Other
1.66k stars 195 forks source link

Allow uncached context - tag: 'static' is hard coded in resolver #500

Closed F-Node-Karlsruhe closed 1 year ago

F-Node-Karlsruhe commented 1 year ago

For a db based document loader it is not possible to do changes to the context after it was first fetched, as it gets cached in the document loader.

The context resolver https://github.com/digitalbazaar/jsonld.js/blob/20e2286d198ce7d376320306f9df3667f48a4544/lib/ContextResolver.js has the tag flag hard coded and set to 'static' what disallows personal configuration (passing a tag has no effect)

this._cacheResolvedContext({key, resolved, tag: 'static'});

Disclaimer: We are aware that a context should not change, at least not frequently. That's why changing the context in the db is only allowed with strict constraints. Nevertheless one should give the user the option to not cache the context within the resolver. Possible workaround: implement own document loader from scratch -> not preferred

Edit: this._cacheResolvedContext({key: url, resolved, tag: remoteDoc.tag}); exists, but has no effect

dlongley commented 1 year ago

The line quoted (which I believe can be seen here: https://github.com/digitalbazaar/jsonld.js/blob/v8.1.0/lib/ContextResolver.js#L70-L76), appears to cache an inline context (not remote) context. The key used in the cache is a JSON stringification of the context object itself, meaning, it is a content-identifier -- and it would change if the context changed.

When a remote context (or really, just a context identified by a URL) is fetched, the code path goes through:

https://github.com/digitalbazaar/jsonld.js/blob/v8.1.0/lib/ContextResolver.js#L45-L51

and then:

https://github.com/digitalbazaar/jsonld.js/blob/v8.1.0/lib/ContextResolver.js#L127

Where the remote document that represents the context may optionally provide a custom tag:

this._cacheResolvedContext({key: url, resolved, tag: remoteDoc.tag});

In other words, if you use a URL to refer to a context (whether you ultimately load the context from the Web, a database, local memory, whatever source you want), then whatever the document loader specifies in the returned remote document will be used when caching. If a context is specified inline, i.e., no URL has been given by which to refer to it, then all we can do is stringify it and use that as the cache key. But, as noted above, if such a context is different in any way from a previously cached context, the cached value will not be used.

Perhaps it is the case that there is some combination of inline and remote contexts that are creating the problem you're seeing, where an inline context refers to a remote context that doesn't use the static tag -- and the cache code isn't smart enough to separate those to allow for changes to that inner context. But that's just a guess.

Could you provide a simple, runnable test case that would allow someone to help confirm and fix the issue?

F-Node-Karlsruhe commented 1 year ago

Thanks for the fast reply :)

I dug a little deeper and found out that the documents do not actually get stored in this.sharedCache but rather in the this.perOpCachewhich is not configurable afaik. and gets called/set every request. So a change during operation has no effect even with a not 'static' tag because the resolved document is cached anyway.

Is there a way to avoid the perOpCache for certain documents or even disable it completely while using the sharedCache only? The trick would be to not provide a url but just the context itself?

No runnable example yet, but i will provide one if the issue turns out more difficult than what i just described.

Edit: the url which is used as a key in the cache does not get changed, just the content.

dlongley commented 1 year ago

Is there a way to avoid the perOpCache for certain documents or even disable it completely while using the sharedCache only?

Hmm, I don't think so. The perOpCache ensures that there's a consistent view of every context during an operation. It would be problematic for contexts to be quantum values that could change each time it was accessed during the same isolated operation. It ensures that, for example, when running a JSON-LD compaction operation, the context values used in the initial and underlying expand transformation will be the same as those used in the following compact transformation -- if the same context URLs are referenced of course. Otherwise, you can get indeterminate, confusing, or unexpected results.

This is very similar to how databases work with properties like "read isolation", ensuring that there's local consistency and dependable invariants for a given query / operation / transaction.

F-Node-Karlsruhe commented 1 year ago

True that. I guess i will need to write a custom workaround because shutting down the whole service in order to get rid of the perOpCache when a context gets altered is not a viable option :D

dlongley commented 1 year ago

@F-Node-Karlsruhe,

What kind of problem is actually caused? It seems like you ought to treat an operation that started prior to the context being updated as one that will also finish before that update is "seen". In other words, in any partitioned system there are going to be these sorts of situations -- why is whatever outcome you're seeing being considered a bug / problem instead of expected behavior?

Once the operation completes and another is run after the context update, that operation sees the update, right?

F-Node-Karlsruhe commented 1 year ago

When adding a property to a context in order to sign a verifiable credential with that context after having signed one before (context was loaded), the new value of the context is not found, even though it is present in the db and served from the respective context url. The signature library only uses the cached version of the context and naturally throws an error as the new context field is unknown. As far as i understand the perOpCache is kept in memory as long as the application runs (a bit confused what per op means). As an effect the only way to get rid of the perOpCache is to restart the service

Edit: this will only be the case in the beginning, when contexts are created, but we still want allow the users to manage their contexts on their own in a restricted way

dlongley commented 1 year ago

As far as i understand the perOpCache is kept in memory as long as the application runs (a bit confused what per op means). As an effect the only way to get rid of the perOpCache is to restart the service.

By default, each time a JSON-LD operation is executed, a new ContextResolver instance is created that has a new perOpCache. After the operation completes, that instance is discarded. So it should be the case that the perOpCache isn't being reused across multiple operations -- if it is, that does sound like a bug that should be addressed.

Could it be some other cache (external to jsonld.js) that is getting in the way? IMO, what you're describing should work with jsonld.js without any changes.

F-Node-Karlsruhe commented 1 year ago

That sounds reasonable. Otherwise perOpCache would have been just the same as the sharedCache

I'm using pretty much the example straight out of the box. It's unclear where exactly the ContextResolver gets invoked or if there are other instances/caches

Signing with digitalbazaar suite

// @ts-ignore
import { issue } from '@digitalbazaar/vc';
// @ts-ignore
import { Ed25519VerificationKey2020 } from '@digitalbazaar/ed25519-verification-key-2020';
// @ts-ignore
import { Ed25519Signature2020 } from '@digitalbazaar/ed25519-signature-2020';

async signVC(credential: any) {

        const keyPair = await Ed25519VerificationKey2020.from(this.secrets);

        let suite = new Ed25519Signature2020({
            verificationMethod: `${this.did}#${this.didDocument.verificationMethod[0].id}`,
            key: keyPair
        }
        );

        const signedVC = await issue({ credential, suite, documentLoader });

        return signedVC;
    }

Document Loader

const documentLoader: Promise<any> = jsonldSignatures.extendContextLoader(async (url: string) => {

    if (url.startsWith(BASE_CONTEXT_URL)) {

        const context: any = Contexts[url];

        if (context !== undefined) {
            return {
                contextUrl: null,
                documentUrl: url,
                document: context,
                tag: 'static'
            };
        }

        // Lookup custom client context in db -> this gets called every time, but still it obviously uses the cached version instead!
        const customContext: any = await MongoDbService.getContext(url.split(BASE_CONTEXT_URL)[1]);

        if (customContext !== undefined) {
            return {
                contextUrl: null,
                documentUrl: url,
                document: buildCredentialContextDocument(customContext),
                tag: undefined // funnily it caches anything that is not undefined
            };
        }
    }

    console.warn(`Fetched @context from ${url}. Use with care!`)

    const document = await fetch(url);

    return {
        contextUrl: null,
        documentUrl: url,
        document: await document.json(),
        tag: 'static'
    };
});

Error

JsonLdError [jsonld.ValidationError]: Safe mode validation error.
details: {
    event: {
      type: [ 'JsonLdEvent' ],
      code: 'invalid property',
      level: 'warning',
      message: 'Dropping property that did not expand into an absolute IRI or keyword.',
      details: { property: 'newProp', expandedProperty: 'newProp' }
    }
  }

Restarting the service resolves the issue

F-Node-Karlsruhe commented 1 year ago

Good news. it appears not be a caching issue even though it actually has to be (?). I tracked the cache's content in the ContextResolver and both behave just as expected. Maybe there is another location where the context might be cached? Nevertheless, the issue remains and i will keep searching the root cause, why it cannot expand the new property in the context.

F-Node-Karlsruhe commented 1 year ago

I found the issue! :) https://github.com/digitalbazaar/jsonld.js/blob/04cdf49b2ed02b4b787a3bdff63dd60118cf2c6c/lib/ContextResolver.js line 76

// context is an object, get/create `ResolvedContext` for it
      const key = JSON.stringify(ctx);
      let resolved = this._get(key);
      if(!resolved) {
        // create a new static `ResolvedContext` and cache it
        resolved = new ResolvedContext({document: ctx});
        this._cacheResolvedContext({key, resolved, tag: 'static'});
      }

When not in cache (_get returns undefined) it creates a new document and caches it with tag: 'static' for some reason. replacing 'static' with undefined solves my issue. But that ain't a fix. is the code faulty or is there a reason it caches static in this case?

Edit: that also explains why i did not find anything when investigating the cache. i only looked into the _get method which comes before the suspicious line

dlongley commented 1 year ago

So it sounds like we might be back to where we started -- where my guess as to the problem was:

Perhaps it is the case that there is some combination of inline and remote contexts that are creating the problem you're seeing, where an inline context refers to a remote context that doesn't use the static tag -- and the cache code isn't smart enough to separate those to allow for changes to that inner context. But that's just a guess.

Can you confirm that the context that is causing trouble references other contexts in it by URL?

If so, ideally a simple test case can be constructed with an inline context that references another context by URL and a document loader that will load that context but change it the second time it is fetched. Then if two operations are run using the same inline context, the test should ensure that the output is different (in the appropriate way).

A simple fix would be to only treat inline contexts as static per operation (i.e., set tag to undefined), but, of course, that would disable shared cache reuse for inline contexts. A better fix would only mark such contexts static when they only had static dependencies (or no dependencies at all). I'm not sure how much benefit we're getting from treating inline contexts as always static right now -- but it could be that it is significant.

cc: @davidlehn

F-Node-Karlsruhe commented 1 year ago

I can confirm that. It both is composed of an array where the first context is referenced by URL and it uses @import within the context, where the imported context is referenced by URL as well.

If so, ideally a simple test case can be constructed with an inline context that references another context by URL and a document loader that will load that context but change it the second time it is fetched. Then if two operations are run using the same inline context, the test should ensure that the output is different (in the appropriate way).

This precisely describes my current implementation in this case. I will check the fixes tomorrow in order to confirm the finding :)

Edit: i just realized that these fixes are rather on the library side. Any short term advice for fixes/improvements on my side?

F-Node-Karlsruhe commented 1 year ago

I fixed it as you proposed by not giving an url and using the stringified version of the context in cache. Thank you :)

F-Node-Karlsruhe commented 1 year ago

Bad news. Setting documentUrl: url to null or undefined did not fix the issue. I thought it was fixed because the line tag: 'static' was still set to undefined in my local jsonld library, my initial "fix"

I found the issue! :) https://github.com/digitalbazaar/jsonld.js/blob/04cdf49b2ed02b4b787a3bdff63dd60118cf2c6c/lib/ContextResolver.js line 76

// context is an object, get/create `ResolvedContext` for it
      const key = JSON.stringify(ctx);
      let resolved = this._get(key);
      if(!resolved) {
        // create a new static `ResolvedContext` and cache it
        resolved = new ResolvedContext({document: ctx});
        this._cacheResolvedContext({key, resolved, tag: 'static'});
      }
F-Node-Karlsruhe commented 1 year ago

More specific issue https://github.com/digitalbazaar/jsonld.js/issues/537