OverloadTree caching and open world types

jitsedesmet commented 3 years ago

These are two problems in one issue because they need to be designed together. There has already been a lot of discussion about the caching part of this issue in #102. To make sparqlee truly open world, I think we still need to let users provide their own datatypes that have the known datatypes as super types. That way, values of those types can still be passed to functions expecting XSD datatypes, thanks to subtype substitution.
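A rough sketch of what user-provided datatypes with a known super type could look like. It assumes a hypothetical registry that maps each datatype IRI to its direct super type. The XSD entries are only a small illustrative subset, and "http://example.org/ageInYears" stands in for a user-defined datatype. None of these names are sparqlee's actual API.

```typescript
// Hypothetical registry: datatype IRI -> its direct super type IRI.
// The XSD entries are an illustrative subset of the XSD type hierarchy.
const XSD = "http://www.w3.org/2001/XMLSchema#";

const superTypeOf = new Map<string, string>([
  [`${XSD}nonNegativeInteger`, `${XSD}integer`],
  [`${XSD}integer`, `${XSD}decimal`],
  // A user-provided datatype, registered with a known XSD type as its super type.
  ["http://example.org/ageInYears", `${XSD}nonNegativeInteger`],
]);

// True if `type` is `expected` or reaches it by following super types, i.e. a value
// of `type` may be passed wherever `expected` is accepted (subtype substitution).
function isSubtypeOf(type: string, expected: string): boolean {
  const visited = new Set<string>(); // guards against accidental cycles in user input
  let current: string | undefined = type;
  while (current !== undefined && !visited.has(current)) {
    if (current === expected) return true;
    visited.add(current);
    current = superTypeOf.get(current);
  }
  return false;
}
```

With this registry, a function overload expecting xsd:decimal would accept the user type, since isSubtypeOf("http://example.org/ageInYears", `${XSD}decimal`) walks ageInYears, nonNegativeInteger, integer, decimal and returns true, whereas isSubtypeOf(`${XSD}decimal`, `${XSD}integer`) returns false.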

jitsedesmet commented 3 years ago

I'm wondering whether there could also be scenarios where an extension function is actually an alias for a named function; in other words, an unknown named function is actually equal to a known named function. Could this happen? Should we support this behavior as well to be fully open world?
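If aliases were supported, one possible shape is a user-supplied alias table that is resolved before the normal named-function lookup. The sketch below assumes hypothetical maps knownNamedFunctions and aliases; the xsd:string cast entry and "http://example.org/toString" are illustrative only and do not reflect how sparqlee stores its functions.

```typescript
// Hypothetical implementation shape for a named function.
type Impl = (args: string[]) => string;

// Hypothetical built-in named functions, keyed by IRI (xsd:string cast as an example).
const knownNamedFunctions = new Map<string, Impl>([
  ["http://www.w3.org/2001/XMLSchema#string", (args) => String(args[0])],
]);

// Hypothetical user-supplied alias table: unknown function IRI -> another function IRI.
const aliases = new Map<string, string>([
  ["http://example.org/toString", "http://www.w3.org/2001/XMLSchema#string"],
]);

// Follows aliases until a known named function is reached; undefined if none is found
// or if the aliases form a cycle.
function resolveNamedFunction(iri: string): Impl | undefined {
  const seen = new Set<string>();
  let current = iri;
  while (!knownNamedFunctions.has(current)) {
    const next = aliases.get(current);
    if (next === undefined || seen.has(current)) return undefined;
    seen.add(current);
    current = next;
  }
  return knownNamedFunctions.get(current);
}
```

Following chains of aliases (rather than a single hop) keeps the table simple for users, at the cost of needing the cycle check.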

jitsedesmet commented 3 years ago

Some ideas: I would always make the cache a field of the evaluator.
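A minimal sketch of an overload cache held as a field of the evaluator. The Impl and OverloadSearch types, the resolveOverload method, and the key format are all hypothetical; the search callback simply stands in for the (expensive) OverloadTree lookup.

```typescript
// Hypothetical shapes: an overload implementation and the overload-tree search.
type Impl = (args: unknown[]) => unknown;
type OverloadSearch = (functionName: string, argTypes: string[]) => Impl | undefined;

class Evaluator {
  // The cache lives on the evaluator instance: it is created with the evaluator and
  // discarded with it, so separate evaluators never share entries.
  private readonly overloadCache = new Map<string, Impl | undefined>();
  private readonly search: OverloadSearch;

  constructor(search: OverloadSearch) {
    this.search = search;
  }

  resolveOverload(functionName: string, argTypes: string[]): Impl | undefined {
    // JSON encoding gives an unambiguous key even if IRIs contain separators like ",".
    const key = JSON.stringify([functionName, argTypes]);
    if (this.overloadCache.has(key)) {
      return this.overloadCache.get(key);
    }
    const impl = this.search(functionName, argTypes);
    this.overloadCache.set(key, impl); // also caches "no matching overload" (undefined)
    return impl;
  }
}
```

Keeping the cache per evaluator also means that whatever open-world type information an evaluator receives only affects that evaluator's cache.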

Handling the open world is a different matter, and there are several valid approaches:

1. A callback can be provided to the evaluator. When an unknown type is found, we ask the callback about this type and cache the answer for future use. The callback would have type string => Record<string, number>.
2. Same as 1, but we ask the callback about the type every time. We would expect the callback's answer to be the same each time, since anything else would break our overload cache.
3. Same as 2, but we don't expect the answer to be the same; instead, the callback can signal that something has changed. Any change would flush our cache.
4. We use a map: the user provides a map of types when constructing the evaluator. Expected type: Record<string, string[]>.
5. Thanks to @rubensworks: a callback can be provided to the evaluator. When an unknown type is found, we ask the callback about this type and cache the answer for future use. The callback would have type string => string. We would call the callback recursively until we receive a type we know (either from the cache or built into sparqlee).
... many other options are probably available; I will edit this post and keep the list up to date.
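Option 5 could look roughly like the sketch below: a string => string callback is asked for the super type of each unknown type, answers are cached, and the walk stops at the first type sparqlee already knows. TypeResolver, chainToKnownType and invalidate are hypothetical names. The invalidate method hints at how option 3 might be layered on top; in a real design it would also need to flush the overload cache.

```typescript
// Hypothetical callback shape from option 5: unknown type IRI -> its super type IRI.
type SuperTypeCallback = (type: string) => string;

class TypeResolver {
  private readonly knownTypes: Set<string>;
  private readonly callback: SuperTypeCallback;
  // Answers already obtained from the callback, so each type is asked about at most once.
  private readonly superTypeCache = new Map<string, string>();

  constructor(knownTypes: Iterable<string>, callback: SuperTypeCallback) {
    this.knownTypes = new Set(knownTypes);
    this.callback = callback;
  }

  // Returns the chain from `type` up to and including the first type sparqlee knows.
  chainToKnownType(type: string): string[] {
    const chain: string[] = [];
    let current = type;
    while (!this.knownTypes.has(current)) {
      if (chain.includes(current)) {
        throw new Error(`Cyclic super types at ${current}`);
      }
      chain.push(current);
      let parent = this.superTypeCache.get(current);
      if (parent === undefined) {
        parent = this.callback(current); // only consulted on a cache miss
        this.superTypeCache.set(current, parent);
      }
      current = parent;
    }
    chain.push(current);
    return chain;
  }

  // Option 3 variant: the user signals a change and all cached answers are dropped.
  invalidate(): void {
    this.superTypeCache.clear();
  }
}
```

For example, with xsd:integer known and a callback mapping ex:b to ex:a and everything else to xsd:integer, chainToKnownType("ex:b") yields ["ex:b", "ex:a", "xsd:integer"], and a second call does not consult the callback again.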

Another point to discuss for options 1-4 is the number of super types a type can have. Currently every type has only one super type, but sparqlee uses a map instead of an array to handle super types. However, I would add a clear warning that multiple (direct) super types should be used with caution, since they can create unexpected behavior. I don't think sparqlee should check for this, since that would require a lot of resources.
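To illustrate the kind of unexpected behavior multiple direct super types can cause, here is a small sketch. It assumes (hypothetically) that a type's super types are stored as a map from super type to distance, similar in spirit to the Record<string, number> of option 1, and that overload selection picks the closest super type. A type with two direct super types can then match two overloads at the same distance, so the choice becomes ambiguous. The type names and both functions are illustrative only.

```typescript
// Hypothetical hierarchy: type -> direct super types. "ex:intOrString" has two of them.
const directSuperTypes = new Map<string, string[]>([
  ["ex:intOrString", ["xsd:integer", "xsd:string"]],
  ["xsd:integer", ["xsd:decimal"]],
]);

// Breadth-first walk: maps every (transitive) super type to its distance from `type` (0 = itself).
function superTypeDistances(type: string): Map<string, number> {
  const distances = new Map<string, number>([[type, 0]]);
  const queue: string[] = [type];
  while (queue.length > 0) {
    const current = queue.shift()!;
    const currentDistance = distances.get(current)!;
    for (const parent of directSuperTypes.get(current) ?? []) {
      if (!distances.has(parent)) {
        distances.set(parent, currentDistance + 1);
        queue.push(parent);
      }
    }
  }
  return distances;
}

// Picks the overload parameter type(s) closest to `type`; more than one result is a tie.
function closestOverloads(type: string, overloadTypes: string[]): string[] {
  const distances = superTypeDistances(type);
  let best = Infinity;
  let winners: string[] = [];
  for (const candidate of overloadTypes) {
    const distance = distances.get(candidate);
    if (distance === undefined) continue; // not a super type: overload not applicable
    if (distance < best) {
      best = distance;
      winners = [candidate];
    } else if (distance === best) {
      winners.push(candidate);
    }
  }
  return winners;
}
```

With overloads for xsd:integer and xsd:string, closestOverloads("ex:intOrString", ...) returns both, since each is one step away; detecting such ties up front would mean checking every pair of overloads against every user type, which is the cost mentioned above.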

It may or may not be possible to make the callback async.

comunica/sparqlee #109