Implementation ("porting") parity between different implementations

veikkoeeva commented 3 years ago

Do we need or want to maintain an implementation parity between different language versions?

This comes from a discussion at https://github.com/decentralized-identity/did-common-dotnet/issues/2#issuecomment-782704336. In this particular case do we need a regex borrowed from a Java implementation if it were possible to use Uri?

Maybe one issue to consider is security implications of regexes, maybe there are cases on maintainability and developer familiriaty.

I do think we should provide simple functions to check validity of varioud DID constructors, such as DIDs, and they may make use of regexes where warranted (like checking allowed range of characters as per spec if it's the best way). The functions could perhaps return ValidationResult objects and so where making sense, a function like CheckAll could be provided that combines other checks. This likely warrants an issue for discussion on its own. This idea may preclude idea on inheritance hierarchies, https://github.com/decentralized-identity/did-common-dotnet/pull/4 moved away from those, though.

juanfranblanco commented 3 years ago

I don't believe straight ports are required, but ideally overall structure of implementations so documentation can be reused.

Parity between implementations will be ideal in some scenarios might be key in this earlier stages as it will validate test cases and will be easier to correct areas as the spec grows.

.Net is in a losing end here (as usual) as there is lots to catch up to have a complete implementation. If things are architected correctly later on an easy replacement can happen when is necessary.

Regarding the Uri. The implementation of Regex is not from the Java implementation as they use abnf format. System.Uri is a complete different set and may bring other issues in the future. Hijacking a class that is not meant to do this specifc work it might a waste of time as the spec may end up changing. In the future that can be discussed.

Inheritance might be necessary as we might have seen on Format or it will required for the ExtendedAttributes, as usual pick your best pattern as needed.

veikkoeeva commented 3 years ago

Well, as I wrote there

DID syntax specification tells it conforms to RFC3986 (see at https://tools.ietf.org/html/rfc3986). So one idea could be to use .NET Uri to parse the examples like from https://www.w3.org/TR/did-core/#example-7
var uri = new Uri("did:example:123?service=agent&relativeRef=/credentials#degree");
var degreeFragment = uri.Fragment; //Should contain "#degree".

Is this not the case for Uri? I do not know why Java implementation use Regex derived from ABNF and not something similar to Uri. If the reason is that Uri is restricted further in DID specification, one option then is to layer those restrictions on top of Uri. This not a hill I need to die on, but I feel we need to quantify the case in conrete terms and it appears this leaks to the discussion to port more directly or not. That is to say

System.Uri is a complete different set and may bring other issues in the future.

does not look "a complete different set" to me and it looks to me regexes bring potential security and performance issues on that maybe need recording if nothing else.

This is a more general point to .NET and not to Uri specifically: I'd rather read the specification and implement it and test it can (de)serialize documents provided by others rather than to read other implementation with idea of porting them. It seems more work to me to first read Java and try to re-recreate it so .NET implementation looks like Java or is constructured otherwise very similarily. I think ".NET like Java" could deter contributors (it would me) as contributing would mean first reading the Java implementation and then writing code in way that makes the same assumptions. Of course also it also always require the Java implementation be there or assume how it reasonable could be if it happens .NET one moves first to implement something (so as not to diverge later then).

The merged spike is already fairly different and I don't see alternatives so I am not sure if I understand correctly the wider case of moving closer to Java. The spike considers inheritance as an extension point already. It's possible and planned to inherit from DidDocument and other classes (like KeyFormat, like is done) if named attributes is needed and wanted. One another extension point is JsonExtensionData that catch attributes that are not in DID Core specification. It may very well be that one receives DID documents that extend Core and doesn't want to throw at deserialization point (or serialize fairly straightforwardly).

Maybe I open an issue for discussion about this extension later today.

juanfranblanco commented 3 years ago

Ok testing the Uri vs custom it does not work exactly at all. you can validate this in the unit test. Actually I was pretty excited after looking at in the deserialisation of json in the Service, so i get what you are coming from.

Yes truly is not leaning to have one alike implementation as it will never be the same, but there is also an element of leverage and not reinventing the wheel. Most important when you look at others implementations you can a provide invaluable feedback when making your own. So it is a combination of both reading the spec, looking at others implementation and provide feedback on the way to others, so they can improve when they can or you can leverage other peoples brains.

We are just parsing Json at the moment and some Uris.. this should be not a complex topic. I trust that you can come up with a much better solution using Uri, maybe extending it etc.


    [Fact]
        public void ShouldParseDidUrlFull()
        {
            var didUrl = DidUrlParser.Parse("did:example:test:21tDAKCERh95uGgKbJNHYp;service=agent;foo:bar=high/some/path?foo=bar#key1");

            Assert.Equal("agent", didUrl.Params["service"]);
            Assert.Equal("high", didUrl.Params["foo:bar"]);
            Assert.Equal("example", didUrl.Method);
            Assert.Equal("test:21tDAKCERh95uGgKbJNHYp", didUrl.Id);
            Assert.Equal("/some/path", didUrl.Path);
            Assert.Equal("key1", didUrl.Fragment);
            Assert.Equal("foo=bar", didUrl.Query);
            Assert.Equal("did:example:test:21tDAKCERh95uGgKbJNHYp;service=agent;foo:bar=high/some/path?foo=bar#key1", didUrl.Url);

            var uri = new Uri(
                "did:example:test:21tDAKCERh95uGgKbJNHYp;service=agent;foo:bar=high/some/path?foo=bar#key1"); 
        }

veikkoeeva commented 3 years ago

OK, good example. The merged spike does not have yet code for extracting the values.

What currently happens that strings that should conform to RFC3986 are serialized to Uri. If that succeeds, the serializer does not throw, then it's probably fairly sure it's a correctly formed URI (as far as I know, Uri is hardened). I don't have deep feelings about this, but it addresses at least this particular security concern while being a familar way of working with URIs in .NET (and a typed way if it's supposed to be an URI).

I don't know if there is a benefit to change RFC3986 URIs to strings in the model, if not, in your example it would be var didUrl = DidUrlParser.Parse(new Uri("did:example:test:21tDAKCERh95uGgKbJNHYp;service=agent;foo:bar=high/some/path?foo=bar#key1").OriginalString); from on top of my head. Here new Uri(...) is just a a device to show code, since it comes from serialized data, it's currently in Uri type.

So we need code in any case.

I don't know about internals, but maybe end-user facing could be something like

var service = GetService(new Uri("did:example:test:21tDAKCERh95uGgKbJNHYp;service=agent;foo:bar=high/some/path?foo=bar#key1")) where this internally could use something like you show earlier. Here again I used this new Uri to just mimick conceptually the already serialized Uri in a type.

Maybe this is better discussd in a separate issue or a PR.

juanfranblanco commented 3 years ago

Yes looking at the current spec, it does not enforce many areas to keep the URI compatibility, but those extensions will be the ones that users want. Although thinking about it now I also get your concern about security too, so back to the original comment if Uri is a better solution due to security although some code needs to be added to parse specific areas, that will be a good feedback to the other implementations. Mainly can the id / method or whatever be hijacked.. or force to be parse incorrectly after signing to route in another way.. random thought.

veikkoeeva commented 3 years ago

@juanfranblanco I try to be clearer, I think we need some code to extract values from Uri even if it is used as a storage, so to speak. We know that the URI should be well formed, no shenigans, if Uri can parse it. Then it should be safer to use regexes on it as a whole or to fractions of it, however we come up with concrete implementation. One easy way come up with implementation I think is to have nicely named function, link to the specification in its documentation (e.g. GetService) and internally use that code you show to extract service part from the URI. One way of seeing this. There may be others, but maybe those other people chime in years from now and see this discussion and have a clue why things are how they are.

decentralized-identity / did-common-dotnet

Implementation ("porting") parity between different implementations #8