SolidLabResearch / Challenges

Enhancing Query Results with Schema Alignment in an Aggregator #112

Open maartyman opened 1 year ago

maartyman commented 1 year ago

Pitch

This challenge is about using aggregators to create a view over your Solid pod data. It proposes a solution to the issue mentioned in the "What's in a POD" paper. We suggest using an agent that lets other parties query your pod through a SPARQL endpoint, with each query first rewritten according to a mapping.

We focus on a personal health data sharing scenario, inspired by the We Are platform in Flanders [https://we-are-health.be]. Citizens are asked to fill in a health questionnaire known as GGDM. As this pertains to personal information, the answers are stored in their pod using a GGDM vocabulary designed for this purpose. Now assume a regional research survey (RRS) that asks people for access to their GGDM data in order to study diabetes. Alice is willing to participate, but only wants to share selected information. Moreover, for her diabetes status, she refers to her health record, which was written directly to her pod at the hospital. This record uses the FHIR vocabulary [7], however. Thus, Alice instructs her Web agent to invoke two schema mappings defining her view for RRS: (1) directly retrieve only selected GGDM answers; and (2) transform her diabetes status from FHIR to GGDM. Now RRS, contacting Alice’s Web agent, may come with a query to retrieve all available GGDM answers, on condition that her diabetes status is positive. POD-QUERY will automatically rewrite this query correctly, checking the diabetes status in FHIR and returning only the answers (e.g., eating habits and exercising) that Alice instructed it to share. As another example, RRS may ask how many GGDM answers Alice makes available. In general, arbitrary client queries can be posed, but they will be rewritten to return only what Alice wants to make available to this party.

This challenge is a collaboration with UHasselt: they have built a query rewriter for the schema alignment, and we supply the aggregator that creates and maintains the view.

Desired solution

The solution should be an aggregator that receives queries and then uses the query rewriter provided by UHasselt to rewrite them based on predetermined mappings. It is important to note that automatic view creation and rule discovery or selection are NOT required for this challenge.
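
As a rough illustration of that request flow, here is a minimal Python sketch. Everything named in it is an assumption made for illustration: `Aggregator`, `Mapping`, `rewrite` and `execute` are hypothetical, and since this issue does not specify the interface of the UHasselt rewriter, it is modelled as a plain function supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Mapping:
    """Hypothetical stand-in for one predetermined schema mapping."""
    name: str
    rule: str  # opaque to the aggregator; only the rewriter interprets it


# Assumed shapes, not the real rewriter API:
# the rewriter takes a SPARQL string plus mappings and returns a SPARQL string,
# the executor takes a SPARQL string and returns result rows.
Rewriter = Callable[[str, List[Mapping]], str]
Executor = Callable[[str], List[Dict[str, str]]]


@dataclass
class Aggregator:
    rewrite: Rewriter
    execute: Executor
    # The view is defined per requesting party (e.g. one set of mappings for RRS).
    mappings_by_client: Dict[str, List[Mapping]]

    def handle(self, client_id: str, query: str) -> List[Dict[str, str]]:
        """Rewrite the client's query with its mappings, then run it."""
        mappings = self.mappings_by_client.get(client_id)
        if mappings is None:
            # No view was configured for this party, so nothing is exposed.
            raise PermissionError(f"no view defined for client {client_id!r}")
        rewritten = self.rewrite(query, mappings)
        return self.execute(rewritten)
```

The client query is never executed directly: only the rewritten query reaches the pod data, so what a party can see is determined entirely by the mappings configured for it.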

Acceptance criteria

The desired solution should include a user interface (UI) that allows users to choose between the following queries:

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix ggdm: <https://vito.be/schema/ggdm#>
prefix sur:  <https://w3id.org/survey-ontology#>
prefix prov: <http://www.w3.org/ns/prov#>

# On condition that the diabetes status is positive (answer yes to question2),
# retrieve available GGDM answers on age, eating habits, and exercising.
SELECT ?age ?fruits ?exercise
WHERE {
  ?completedQ2 sur:answeredIn ?_s ;
               sur:completesQuestion ggdm:question2 ;
               sur:hasAnswer ggdm:yes .
  ?_s prov:wasAssociatedWith ?person .
  OPTIONAL {
    ?completedQ9_1 sur:completesQuestion ggdm:question9-1 ;
                   sur:hasAnswer ?fruits .
  }
  OPTIONAL {
    ?completedQ10 sur:completesQuestion ggdm:question10 ;
                  sur:hasAnswer ?exercise .
  }
  OPTIONAL {
    ?person foaf:age ?age .
  }
}

prefix sur:  <https://w3id.org/survey-ontology#>

# How many GGDM questions are available?
SELECT ( COUNT(DISTINCT ?completedQuestion) AS ?count )
WHERE {
  ?completedQuestion sur:answeredIn ?session .
}

The first query focuses on the schema alignment aspect: the hospital record, expressed in the FHIR ontology, supplies results for a query written against GGDM. The second query shows the privacy aspect: not all questionnaire answers are returned.
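
To make the expected behaviour of the two queries more concrete, here is a toy Python sketch. It uses neither the real rewriter nor real RDF: answers are plain dictionary entries keyed by the question identifiers from the queries above, and the function names, the `shared` set and the `fhir_diabetes` flag are illustrative assumptions. It also assumes that the diabetes status derived from FHIR counts as an available GGDM answer, and it leaves out `?age` for brevity.

```python
from typing import Dict, List, Optional, Set


def build_view(
    ggdm_answers: Dict[str, str],
    shared: Set[str],
    fhir_diabetes: bool,
) -> Dict[str, str]:
    """Return the GGDM-shaped answers a client may see (toy model)."""
    # Mapping (1): expose only the GGDM answers the pod owner selected.
    view = {q: a for q, a in ggdm_answers.items() if q in shared}
    # Mapping (2): express the FHIR diabetes status as a GGDM answer to question2.
    view["question2"] = "yes" if fhir_diabetes else "no"
    return view


def count_available(view: Dict[str, str]) -> int:
    """Toy counterpart of the second query: count answers visible in the view."""
    return len(view)


def answers_if_diabetic(view: Dict[str, str]) -> List[Dict[str, Optional[str]]]:
    """Toy counterpart of the first query.

    Returns no rows unless question2 is answered 'yes'. The two other
    answers behave like OPTIONAL patterns: missing ones come back as None.
    """
    if view.get("question2") != "yes":
        return []
    return [{
        "fruits": view.get("question9-1"),
        "exercise": view.get("question10"),
    }]
```

With a pod holding four GGDM answers of which two are shared, `count_available` sees the two shared answers plus the FHIR-derived diabetes status, and never the two unshared ones.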

pheyvaer commented 1 year ago

Two things about the acceptance criteria

I don't understand why you need a query rewriter for schema alignment when the alignment happens in the aggregator.

maartyman commented 10 months ago

Made some changes and added the different queries!

pheyvaer commented 10 months ago

@maartyman Can you add concrete steps for the acceptance criteria? You can find an example at https://github.com/SolidLabResearch/Challenges/issues/120