Open pbonte opened 2 years ago
Also cc'ing @jeswr so he's aware of this.
@pbonte Thanks for submitting this one!
Quick question—the intro says:
Users can have links to large knowledge bases within their pod. However, when performing querying under a certain entailment regime, i.e. with reasoning enabled, the whole large knowledge base might need to be inspected.
However, from the way it is phrased, it is not clear to me whether inspecting the whole knowledge base is a given or an issue to be fixed, and if so, whether the amount of data being transferred (or the total time?) is a quality attribute. Could you adjust the description to clarify this? Thanks!
@pbonte Also, can we apply this (currently generic) challenge to a specific use case, such that the demo becomes a very specific and concrete target to reach? We could either adjust the current challenge text, or create a new issue that is basically an applied version of this bigger challenge, and link back to here?
I'm planning on creating a small (perhaps dummy) demo app to show the principle of my honours work as part of a mid-term presentation I have to give next week (Tuesday 1/3/22) - so happy to try and align this with a relevant use case here if I can.
A tentative plan (off the top of my head) was to do some kind of inference to materialize facts about diseases individuals may be predisposed to, with reasoning used to:
Federate with a larger medical database that defines the relationships in genetic conditions (e.g. the database specifies that if your grandfather is bald then you are bald) to establish whether you are predisposed to certain conditions based on whether your relatives have them.
So in this case you are interested in using just one fact of many from a (potentially large) genetic database, being:
ex:Baldness ex:predisposedIfAppearingIn ex:grandFather
And federating this with shared information about your grandFather to establish whether or not you are likely to be bald.
Of course this could be then similarly applied to many other conditions.
PS. I'm far from a medical domain expert and I'm absolutely butchering the data modelling here - this is just designed to be a proof of concept :).
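To make the idea above concrete, here is a minimal, self-contained sketch of the intended inference. The vocabulary (`ex:predisposedIfAppearingIn`, `ex:hasCondition`, `ex:predisposedTo`) is made up for illustration, just like the data modelling above, and the triples are plain tuples rather than a real RDF store:

```typescript
// A triple is [subject, predicate, object].
type Triple = [string, string, string];

// One fact of many from the (potentially large) remote genetic database:
const remoteFacts: Triple[] = [
  ["ex:Baldness", "ex:predisposedIfAppearingIn", "ex:grandFather"],
];

// Facts shared from the user's pod about their relatives:
const podFacts: Triple[] = [
  ["ex:me", "ex:grandFather", "ex:bob"],
  ["ex:bob", "ex:hasCondition", "ex:Baldness"],
];

// Rule: if condition C is predisposed via relation R, ?x is related to ?y
// through R, and ?y has C, then ?x is predisposed to C.
function inferPredispositions(facts: Triple[]): Triple[] {
  const inferred: Triple[] = [];
  for (const [cond, p, rel] of facts) {
    if (p !== "ex:predisposedIfAppearingIn") continue;
    for (const [x, r, y] of facts) {
      if (r !== rel) continue;
      if (facts.some(([s, q, o]) => s === y && q === "ex:hasCondition" && o === cond)) {
        inferred.push([x, "ex:predisposedTo", cond]);
      }
    }
  }
  return inferred;
}

console.log(inferPredispositions([...remoteFacts, ...podFacts]));
// → [["ex:me", "ex:predisposedTo", "ex:Baldness"]]
```

The point of the challenge is that only the one `ex:predisposedIfAppearingIn` fact is needed from the remote database, even though a naive reasoner would pull in the whole thing.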
Something else that could be exploited when reasoning against such databases is that they usually already have some form of reasoning applied to them. @pbonte do you know if there is any research so far into making use of this fact so as to reduce the amount of reasoning that needs to be done on the application side?
In the context of the work that I am doing with Comunica for my honours, I was thinking of having a context annotation for each data source indicating which types of reasoning have already been applied to it, with the long-term thought being that data sources should provide metadata about any 'pre-reasoning' that has been done on them.
edit: Some vaguely related works
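The per-source context annotation could look something like the sketch below. The shape and field names (`appliedRegimes`) are hypothetical, not an existing Comunica context option; the sketch only shows the decision the engine could make once such metadata exists:

```typescript
// Hypothetical per-source annotation advertising which entailment
// regimes have already been materialised at the source.
interface AnnotatedSource {
  url: string;
  appliedRegimes: string[]; // e.g. ["rdfs"], ["owl2rl"]
}

// If the source already materialised the regime the client wants,
// no client-side reasoning is needed for that source.
function needsClientReasoning(source: AnnotatedSource, regime: string): boolean {
  return !source.appliedRegimes.includes(regime);
}

const sources: AnnotatedSource[] = [
  { url: "https://example.org/pod/data", appliedRegimes: [] },
  { url: "https://example.org/big-kb", appliedRegimes: ["rdfs"] },
];

for (const s of sources) {
  console.log(s.url, needsClientReasoning(s, "rdfs"));
}
```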
This would indeed be very interesting and would allow one to save a lot of computational effort. However, I think it's risky to make the assumption that (decentralized) knowledge bases would maintain up-to-date metadata about their materialisation status. I can imagine that not all knowledge bases would incrementally update their materialisation when new facts are added, thus requiring the metadata to also maintain a list of unprocessed facts. Furthermore, this would only make sense when the same logic is used. Let's say the remote knowledge base uses RDFS, while the client uses some more expressive rule-based language (N3/SWRL/OWL2 RL); then it's not completely clear how we can reuse the RDFS materialisation as intermediate results. Perhaps future research can show us.
If, however, knowledge bases do maintain this metadata and the logics can somehow be aligned, then yes, this would be very interesting! Off the top of my head, I think it would mean the following:
I think it's risky to make the assumption that (decentralized) knowledge bases would maintain up-to-date metadata about their materialisation status
Yes, but also see it the other way: we are in the position to make recommendations and specs.
So your IF can be enforced if that's a recommendation we can argue 😃
Furthermore, this would only make sense when the same logic is used. Let's say the remote knowledge base uses RDFS, while the client uses some more expressive rule-based language (N3/SWRL/OWL2 RL); then it's not completely clear how we can reuse the RDFS materialisation as intermediate results. Perhaps future research can show us.
One step may be to create a document that defines the relationships between various rule languages, and the relationships between rule sets that can be defined within those languages. The former is more useful from a technical perspective for achieving things like https://github.com/comunica/comunica-feature-reasoning/issues/22, whilst the latter can be used to determine whether the remote source has materialised all of the implicit facts that the rule set used for the federated reasoning would have produced.
In the case you mention, where only a subset of the implicit data has been produced by the remote source, we could perhaps extend this idea to re-use the intermediate results by identifying the diff of rules that haven't been applied to the remote source, and only apply those to the remote source in step 1 of the naive algorithm mentioned in https://github.com/comunica/comunica-feature-reasoning/issues/23.
Of course this still raises the question of how to handle implicit data within the remote source that would not have been produced by the rule set you are using; i.e. do we need to remove anything from our results?
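The rule-diff idea could be sketched as below. It assumes rules can be aligned by identifier across the client and the remote source, which is exactly the open question about differing logics discussed above; the rule labels are hypothetical, not a standard vocabulary:

```typescript
// Given the rule set used for federated reasoning and the rules a remote
// source reports as already materialised, only the missing rules need to
// be applied against that source.
function ruleDiff(clientRules: string[], remoteApplied: string[]): string[] {
  const applied = new Set(remoteApplied);
  return clientRules.filter((r) => !applied.has(r));
}

const clientRules = ["rdfs:subClassOf", "rdfs:subPropertyOf", "owl:sameAs"];
const remoteApplied = ["rdfs:subClassOf", "rdfs:subPropertyOf"];

console.log(ruleDiff(clientRules, remoteApplied)); // → ["owl:sameAs"]
```

Note that this only handles the "remote source did less" direction; it says nothing about implicit data the remote source materialised under rules the client does not use.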
@pbonte Why do the acceptance criteria mention the need for a UI to show that there is a link with, for example, DBpedia? Isn't it enough to show that a resource in the pod refers to DBpedia, for example by inspecting that resource?
@pheyvaer Because the goal of the challenges is to have a number of demonstrators. It's just a way to show the content, nothing fancy.
Makes sense! Maybe we could put that as a separate challenge: have a UI that shows data with links to external data sources? It could then also be reused by others for their demos.
Good idea! Maybe even something more generic: a visualisation of the pod content that can be reused across demos (both internal and external data)? I think it might also be useful to have some tooling to show the partitioning of the data across pods, so we can show that each user keeps ownership of their own data.
Sounds good! Do you want to put it in a separate issue/challenge?
Not sure if it counts as a challenge (as these are described as: "A concrete technical problem applied to a specific use case")?
Yeah, it counts as a challenge 😉
@pbonte Can you create the extra challenges as mentioned earlier?
Pitch
Users can have links to large knowledge bases within their pod. However, when querying under a certain entailment regime, i.e. with reasoning enabled, the whole large knowledge base might need to be inspected.
For example:
```
:me foaf:knows <https://dbpedia.org/page/Tim_Berners-Lee> .

:SolidEnthusiast(?x) <- dbo:wikiPageWikiLink(?x, ?y) and SolidProject(?y)
SolidProject(?x) <- dbo:Software(?x) and rdfs:label(?x, 'solid')
```
To derive that :me is a :SolidEnthusiast, DBpedia needs to be checked and the above rules need to be executed.
Desired solution
To enable querying under entailment when data links to a large knowledge base, such as DBpedia, the query engine and reasoning module would need to be capable of:
Acceptance criteria
A demo that displays a solution to this problem would require the following to be accepted:
Pointers
Insights on the differences between forward and backward chaining: https://github.com/pbonte/ReasoningAssignment/blob/master/reasoning_assignment.pdf
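As a toy illustration of the forward/backward distinction from the pointer above, here is a sketch over a single transitive rule, ancestor(x, z) <- parent(x, y) and ancestor(y, z). The representation (string pairs) is purely for demonstration:

```typescript
type Fact = [string, string]; // [child, parent]

const parents: Fact[] = [["a", "b"], ["b", "c"], ["c", "d"]];

// Forward chaining: materialise all ancestor pairs up front,
// regardless of what will later be queried.
function forwardAncestors(parents: Fact[]): Set<string> {
  const anc = new Set(parents.map(([x, y]) => `${x}->${y}`));
  let changed = true;
  while (changed) {
    changed = false;
    for (const [x, y] of parents) {
      for (const pair of [...anc]) {
        const [s, t] = pair.split("->");
        if (s === y && !anc.has(`${x}->${t}`)) {
          anc.add(`${x}->${t}`);
          changed = true;
        }
      }
    }
  }
  return anc;
}

// Backward chaining: explore only the facts the query needs.
function isAncestor(x: string, z: string, parents: Fact[]): boolean {
  return parents.some(([c, p]) => c === x && (p === z || isAncestor(p, z, parents)));
}

console.log(forwardAncestors(parents).has("a->d")); // true
console.log(isAncestor("a", "d", parents)); // true
```

The trade-off mirrors the challenge above: forward chaining against a large remote knowledge base materialises far more than a single query needs, while backward chaining only touches what the goal requires.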