Use DispatchProxy to support RDF multiple inheritance properly

berezovskyi commented 1 month ago

Both Lyo and OSLC4NET (incorrectly) assume that every (OSLC) RDF resource has a "primary" RDF type, which has an associated shape. This allows the (un)marshaller to associate a POJO/POCO to this shape and (un)marshal the RDF resource onto a given class. All other rdf:type values are collected in an array.

Of course, this is wrong and is an opportunistic reduction to fit a graph-based RDF model peg where properties do not belong to classes to the OO-world hole where properties belong to classes and dynamic multiple inheritance is not a thing. This impedance mismatch needs to be addressed to be able to work properly with the larger world of Linked Data applications outside OSLC.

One way to deal with this is not to use POCO/POJOs at all. Unfortunately, it only works well for languages that are not statically typed (good RDF libs for Ruby, JS, Elixir, Python; the story with dynamic in C# needs to be evaluated, see ExpandoObject in the std lib). Approaches where data is decoupled from logic work best (Prolog, Clojure). In statically typed languages, the boilerplate amount required is not insignificant and, most of all, using the RDF data without an abstraction often means losing the type safety. For C#/Java, we can try the following abstraction:

Every unmarshalled resource implements IExtendedResource.
RDF types map to tag interfaces (i.e., no members).
OSLC/SHACL shapes that target types map to interfaces with props mandatory for those shapes. The worst nightmare of multiple inheritance - method/property overlap - is irrelevant for us because the semantics of accessing a property found on both interfaces are the same - we simply look it up on the same RDF resource.
The interfaces above are to be generated using C# source generators (link, link).
Unmarshalling creates a dynamic proxy for each resource, making it implement any number of the above interfaces on the fly. See https://devblogs.microsoft.com/dotnet/migrating-realproxy-usage-to-dispatchproxy/ for C# (rather, Castle is more suited for this: https://github.com/castleproject/Core/blob/17fd99c199e5cbaf589cfbd79a571ba4d53c043b/src/Castle.Core/DynamicProxy/IProxyGenerator.cs#L589) and https://www.baeldung.com/java-dynamic-proxies for Java.

Additionally, the https://stackoverflow.com/questions/58453972/how-to-use-net-reflection-to-check-for-nullable-reference-type API in .NET 6+ allows to eliminate the https://github.com/OSLC/oslc4net/blob/main/OSLC4Net_SDK/OSLC4Net.Core/Attribute/OslcOccurs.cs attr from most declarations (i.e. string prop would be ExactlyOne, string? ZeroOrOne, IReadOnlyCollection<string>? would be ZeroOrMany but IReadOnlyCollection<string> can be be OneOrMany or ZeroOrMany - probably good to default to OneOrMany but allow ZeroOrMany via an attribute).

? https://en.wikipedia.org/wiki/Dominator_(graph_theory)

jamsden commented 1 month ago

I think this is a great idea and moves from a generative to interpretive approach. Marshaling and unmarshaling do take time, but are often at the endpoints of GET and PUT. An interpretive approach moves these data transformations to incremental and per use. This can have performance implications that should be considered.

berezovskyi commented 1 month ago

Jim, thank you. This is exactly the plan - to study (1) how far can we move along the generative-interpretive continuum in a statically typed language (C# in our case) and (2) where does the greatest developer benefit lie.

Having advanced code generators at our disposal (in .NET, we can use the Roslyn compiler bits to generate sources on the fly during development plus you can declare a class to be partial) allows us to add Turtle files to the repo but access C# classes and interfaces with zero dynamic overhead at runtime. Same performance as if someone carefully hand-rolled that code.

On the other hand, having access to reflection, dynamic proxies, and even dynamic dispatch (Java has it too, e.g. for use from Groovy - although Groovy is not the fastest from the bunch) allows to go to the far ends of the interpretive galaxy depending on how much performance are we willing to trade. In general, the OSLC code I have seen so far was quite inefficient, using synchronous network calls, not keeping connections alive or setting thread pools correctly. When I wrote an async client in Kotlin for the RefImpl with quite modest 250 rps (though the server code was not migrated to async), an idea to give such modest power to Lyo users was met with some worry on what load would mean for the OSLC servers (providers). Thus, I am inclined to believe that in the OSLC space (and the broader space of enterprise integration) we have quite a bit of performance to trade off if the baseline is set very high (100k+ rps for servers like Kestrel; Eclipse Vert.x for Java/JVM easily starts off at 10k+) to trade off from there onwards.

OSLC / oslc4net

Use DispatchProxy to support RDF multiple inheritance properly #195