decentralized-identity / ethr-did-resolver

DID resolver for Ethereum Addresses with support for key management
Apache License 2.0

[proposal] fetch ERC1056 events till a specific block #150

Closed: nichonien closed this 2 years ago

nichonien commented 2 years ago

Is your feature request related to a problem? Please describe.
No

Describe the solution you'd like
Reading / caching of ERC1056 events should be possible from the latest block back to some older block. For a cache-based use case, the solution would already store the events from the ERC1056 contract. Whenever a user requests a DID-Doc, the resolver should only sync / fetch events from the last synced block to the latest block.

This would give the changeLog a property along the lines of stopBlockNumber. While executing wrapDidDocument, the changeLog could then pull only the events between the latest block and the specified stopBlockNumber. If no stopBlockNumber is specified, it pulls all the events from the blockchain.
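
For illustration, a rough sketch of this idea (the function name, parameters, and filter shape are hypothetical, assuming an ethers v5 provider; this is not the library's actual code):

```typescript
import { ethers } from 'ethers'

// Hypothetical: fetch only the registry events between stopBlockNumber
// and the latest block; the caller merges them with already-cached events.
async function changeLogSince(
  provider: ethers.providers.Provider,
  registry: string,
  identity: string,
  stopBlockNumber?: number // last block that is already cached
): Promise<ethers.providers.Log[]> {
  return provider.getLogs({
    address: registry,
    // ERC1056 events index the identity as their first indexed argument
    topics: [null, ethers.utils.hexZeroPad(identity, 32)],
    fromBlock: stopBlockNumber !== undefined ? stopBlockNumber + 1 : 0,
    toBlock: 'latest',
  })
}
```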

Describe alternatives you've considered
N/A

Additional context
N/A

mirceanis commented 2 years ago

What would be the benefit of this mechanism?

Going from latest down to some stop block means that the resolver might miss events from the earlier versions of that DID doc.

Events are gathered in reverse order, from latest to first, but then they MUST all be processed, starting with the first.

It is possible to process only a subset of events, from the first up to a certain block, to see earlier versions of the DID document, but it is never ok to skip the first events.

I am guessing that I'm misunderstanding your proposal. Can you share more thoughts on this?

nichonien commented 2 years ago

This requirement is purely for a cache-based solution. The cache already holds the earliest events, so whenever a DID-Doc needs to be created, the resolver only has to provide the events that are not yet synced to the cache.

So in the background it is still the case that we process all the events from first to latest, but at the same time we want to reduce the blockchain interaction for a more efficient response.

For example: suppose did is some identifier, and the solution / product caches ERC1056 events (a minimal sketch follows the list):

  1. The cache has stored events for did from the first block up to block15.
  2. The user requests the DID-Doc; the resolver reads events from block16 to the latest block.
  3. The solution processes all the events (cached plus new) and wraps them into the DID-Doc.
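
A minimal sketch of this flow, under the same assumptions as above (hypothetical names, ethers v5):

```typescript
import { ethers } from 'ethers'

// Illustrative cache record for one identifier (names are hypothetical).
interface CachedEvents {
  lastSyncedBlock: number      // 15 after step 1
  logs: ethers.providers.Log[] // events from the first block up to lastSyncedBlock
}

// Steps 2 and 3: fetch only the missing range, then process everything.
async function syncEvents(
  provider: ethers.providers.Provider,
  registry: string,
  cached: CachedEvents
): Promise<ethers.providers.Log[]> {
  const latest = await provider.getBlockNumber()
  if (latest > cached.lastSyncedBlock) {
    const newLogs = await provider.getLogs({
      address: registry,
      fromBlock: cached.lastSyncedBlock + 1, // block16 in the example
      toBlock: latest,
    })
    cached.logs.push(...newLogs)
    cached.lastSyncedBlock = latest
  }
  // The full, ordered history (cached + new) is what gets wrapped
  // into the DID-Doc.
  return cached.logs
}
```
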
mirceanis commented 2 years ago

This makes a lot of sense. I guess I missed the word cache in your initial proposal 😅

So the resolver would keep a cache of events already fetched for a particular DID or address, and when these events are needed again, they are returned from the cache instead of the blockchain, while the latest events for that DID are still fetched from the blockchain.

This sounds very reasonable, and since events are immutable, there is no need to clear the cache, other than memory pressure.

Would you like to contribute an implementation of this optimization?

nichonien commented 2 years ago

@mirceanis Yeah, sure.

What do you think in terms of the approach for the implementation? Should there be a dedicated function, or could the changes be accommodated in the current implementation itself?

Making changes to the currently implemented functions could be a breaking change, and people using them might need to change their client code as well. Or maybe we introduce optional parameters, which should keep working with old code?

mirceanis commented 2 years ago

I imagine that this should be transparent to the user and not require any high-level API changes. It should be the resolver that builds the cache internally; otherwise it is harder to trust, don't you think?

There can be internal changes to the structure of the methods, as long as the spec is respected and the outer API is kept intact.

The way I'm picturing it is that, as the resolver is used, it holds a bunch of events in memory that correspond to particular DIDs or addresses. If it is easier to implement, then maybe instead of events it holds the results of getLogs(), so that whenever a call to getLogs is made for data that is already cached, the cached result is returned instead of a round trip to the blockchain.

This line would change to something like cachedGetLogs(chainId, registry, address, fromBlock, toBlock), and this new method would either fetch results from the cache if they exist, or call provider.getLogs() and cache the result if successful. The cache key must be based on all of those inputs to prevent leakage between networks, and some care must be taken when fromBlock and toBlock are not numbers.

This way there is no need to fumble with block numbers that either are or aren't cached, and no other resolver code needs to change.
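
A minimal sketch of such a wrapper (cachedGetLogs is hypothetical; a plain in-memory Map stands in for whatever store is used, and a provider argument is added to keep the sketch self-contained):

```typescript
import { ethers } from 'ethers'

// One cache per process; a key built from all inputs prevents leakage
// between networks, registries, and identities.
const logsCache = new Map<string, ethers.providers.Log[]>()

async function cachedGetLogs(
  provider: ethers.providers.Provider, // added here for a self-contained sketch
  chainId: string,
  registry: string,
  address: string,
  fromBlock: number | string,
  toBlock: number | string
): Promise<ethers.providers.Log[]> {
  const key = `${chainId}/${registry}/${address}/${fromBlock}/${toBlock}`
  const hit = logsCache.get(key)
  if (hit) return hit

  const logs = await provider.getLogs({
    address: registry,
    topics: [null, ethers.utils.hexZeroPad(address, 32)],
    fromBlock,
    toBlock,
  })

  // Only ranges with concrete numeric bounds are immutable; tags like
  // 'latest' are a moving target and must not be cached.
  if (typeof fromBlock === 'number' && typeof toBlock === 'number') {
    logsCache.set(key, logs)
  }
  return logs
}
```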

What do you think?

nichonien commented 2 years ago

I agree with you, the resolver should build the cache internally in order to make it more transparent and trustful.

But the way I see it, an organisation could use its own cache, given that most of these organisations already have caching built into their solution architecture. This also doesn't burden the resolver with configuration and functionality specific to cache creation and maintenance. In terms of trust, I believe anything that doesn't come directly from the blockchain is already out of scope; it is up to the organisations how they build trust with their end users, whether through a public notice or through terms and conditions.

Though I think it would be wise to survey the opinions of people working in the SSI space if we plan to keep caching internal to the resolver.

Do you think keeping the cache external could be a possible option, or should it not be considered at all?

mirceanis commented 2 years ago

Perhaps it could be, but can you describe what you mean by external?

Is it different from my proposal to eventually be able to bootstrap the cache during resolver initialization with entries collected in a previous resolver session? Edit: I just realized that I deleted this from my previous reply to emphasize internal cache use.

I'm trying to understand "external" here because it seems counter to the purpose of did:ethr to have organizations governing a cache. The whole point of DIDs is to not require trust between organizations and their users. I fear that I'm misunderstanding your proposal again.

nichonien commented 2 years ago

I completely agree with "the whole point of DIDs is to not require trust between organizations and their users", and I have quoted it earlier as well: "I agree with you, the resolver should build the cache internally in order to make it more transparent and trustful."

The two main reasons for having an "external" cache are:

  1. Caching could be a requirement for certain use-cases or businesses, so I see it as an option.
  2. If the did:ethr resolver has internal caching, organisations would be bound to manage different caches for different purposes.

You understood the proposal right; we are just brainstorming for an optimal approach.

mirceanis commented 2 years ago

I don't disagree with caching. For this specific use-case it makes a lot of sense, since the logs that would be cached are immutable anyway.

I'm just trying to understand the mechanism you have in mind for "external" caching.

And to clarify, what I had in mind for the internal cache was a map of logs kept in memory. There is nothing to manage, other than maybe setting a maximum capacity. I imagine that this map could also be exported and stored for future instantiations of the resolver, but I still don't get where organizations would need to step in to "manage" the cache.
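
For illustration, a bounded, exportable in-memory cache along those lines could be as small as this (a hypothetical sketch, not part of the library):

```typescript
// Minimal FIFO-bounded cache of getLogs results, serializable so it can
// bootstrap a future resolver instance. Names are illustrative.
class LogsCache {
  constructor(
    private maxEntries = 1000,
    private entries = new Map<string, unknown[]>()
  ) {}

  get(key: string): unknown[] | undefined {
    return this.entries.get(key)
  }

  set(key: string, logs: unknown[]): void {
    if (this.entries.size >= this.maxEntries) {
      // Evict the oldest entry; Map preserves insertion order.
      const oldest = this.entries.keys().next().value
      if (oldest !== undefined) this.entries.delete(oldest)
    }
    this.entries.set(key, logs)
  }

  // Export/import so the cache can survive across resolver sessions.
  export(): string {
    return JSON.stringify([...this.entries])
  }

  static import(json: string, maxEntries = 1000): LogsCache {
    return new LogsCache(maxEntries, new Map(JSON.parse(json)))
  }
}
```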

nichonien commented 2 years ago

Let's say an organisation is using Redis for caching, and a server syncs the events from the contract to the cache. Now, when a request to resolve a DID comes in, the resolver receives the request with an "additional" parameter. This parameter is the event logs in some format, plus maybe some metadata, like the last synced block.

The resolver then takes the passed events object and resolves the DID.

So the resolve API could be something like resolve(did, parsed, _unused, options, events?: some format), and changeLog would also accommodate this change as changeLog(identity, networkId, blockTag, stopBlock?: optional).

The cached events parameter is optional, so legacy code isn't impacted, and anyone who wants to use the cache functionality can change their API call and add the required params.
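
A hedged sketch of that proposed shape (the CachedEventsInput type is hypothetical, and, as discussed below, this API change was not adopted; the types come from the did-resolver package):

```typescript
import type {
  DIDResolutionOptions,
  DIDResolutionResult,
  ParsedDID,
  Resolvable,
} from 'did-resolver'

// Hypothetical shape for the externally cached events parameter.
interface CachedEventsInput {
  logs: unknown[]         // pre-fetched ERC1056 events in some agreed format
  lastSyncedBlock: number // metadata from the external (e.g. Redis) cache
}

async function resolve(
  did: string,
  parsed: ParsedDID,
  _unused: Resolvable,
  options: DIDResolutionOptions,
  events?: CachedEventsInput // optional, so legacy callers are unaffected
): Promise<DIDResolutionResult> {
  // A real implementation would fetch only logs after events?.lastSyncedBlock,
  // merge them with events?.logs, and build the document from the full history.
  return {
    didResolutionMetadata: { error: 'notImplemented' },
    didDocument: null,
    didDocumentMetadata: {},
  }
}
```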

mirceanis commented 2 years ago

Thanks for the example, I have a better idea about what you're expecting now.

In your example, it seems that an organisation is doing the DID resolution for the user, since they are the ones providing the extra events from their Redis infra. This runs counter to the idea of DIDs, but let's entertain the notion a bit.

If that is the case, wouldn't it be easier to create an Ethereum JSON-RPC provider that can cache calls to eth_getLogs using Redis? This way the resolver API does not have to change to accommodate this special case. The advantage is that other calls can also be cached, or optimized for the ERC1056 contract instance deployed or used by this organization.
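
A minimal sketch of such a caching provider, assuming ethers v5; the in-memory Map stands in for Redis, since the key/value lookup would be the same:

```typescript
import { ethers } from 'ethers'

// Hypothetical provider that memoizes eth_getLogs responses.
// A Redis client could replace the Map without changing the key/value shape.
class CachingJsonRpcProvider extends ethers.providers.JsonRpcProvider {
  private cache = new Map<string, any>()

  async send(method: string, params: Array<any>): Promise<any> {
    if (method !== 'eth_getLogs') {
      return super.send(method, params)
    }
    // Only pinned hex block ranges are immutable; 'latest' etc. are moving targets.
    const { fromBlock, toBlock } = params[0] ?? {}
    const isPinned = (tag: unknown) =>
      typeof tag === 'string' && /^0x[0-9a-fA-F]+$/.test(tag)
    const cacheable = isPinned(fromBlock) && isPinned(toBlock)

    const key = JSON.stringify(params)
    if (cacheable && this.cache.has(key)) {
      return this.cache.get(key)
    }
    const result = await super.send(method, params)
    if (cacheable) {
      this.cache.set(key, result)
    }
    return result
  }
}
```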

There is another reason why the resolver API should not be changed like this: this resolver is not designed to be used independently, but rather to be paired with the did-resolver library, along with other resolvers. The API changes that you propose would then need to be bubbled up to that library.

What do you think?

nichonien commented 2 years ago

Hmm, JSON-RPC provider looks like a better approach.

So the resolver would be initialised with this JSON-RPC provider instance, and it would then consume the cached events during the resolve API call? I'm just confirming that I understood this approach correctly.

mirceanis commented 2 years ago

Yes, that's the idea. The resolver would not care whether the provider is caching events or not.
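
Tying it together, usage could look roughly like this (assuming the CachingJsonRpcProvider sketched above, and that getResolver's network configuration accepts a preconfigured provider):

```typescript
import { Resolver } from 'did-resolver'
import { getResolver } from 'ethr-did-resolver'

// Assumes the CachingJsonRpcProvider sketched earlier in this thread.
const provider = new CachingJsonRpcProvider('https://mainnet.infura.io/v3/<project-id>')

const didResolver = new Resolver(
  // the networks entry takes a preconfigured provider instead of an rpcUrl
  getResolver({ networks: [{ name: 'mainnet', provider }] })
)

// The resolver is oblivious to the caching happening inside the provider.
const result = await didResolver.resolve('did:ethr:0xb9c5714089478a327f09197987f16f9e5d936e8a')
console.log(result.didDocument)
```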

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.