Open saumier opened 1 week ago
@dev-aravind Please advance this and try to get it to work without waiting for the other issue to migrate to GitHub Artifacts (https://github.com/culturecreates/artsdata-orion/issues/70), because I am blocked on being able to use GitHub artifact download URLs.
@saumier The development part is complete, but the initial step of calling the fetch-data workflow is failing because artsdata-score cannot commit the JSON-LD that artsdata-orion generated because of a permission issue. This is blocking us from adding the score to the JSON-LD.
workflow: https://github.com/culturecreates/artsdata-score/actions/workflows/generate_report.yml
sample inputs:
page_url: "https://agoradanse.com/evenement/"
entity_identifier: "div.x-container.max a"
file_name: "agoradanse-events.jsonld"
is_paginated: "false"
headless: "false"
There is a structured data score algorithm created using SHACL and SPARQL (in this repo culturecreates/artsdata-score). Here is the full back story: https://github.com/culturecreates/artsdata-data-model/discussions/120. This issue is to build a solution that will batch score events. This is a prototype.
We should start with Approach 1. I have included a second alternative approach using Google sheets but that is not needed yet.
Prerequisites:
Approach 1 - Workflow creates a report
Step 1: Create an interactive workflow in this repo.
Step 2: The workflow calls Orion's new Action to fetch-data (not including push to artsdata) with the page_url and entity_identifier to extract the webpage urls, and options like headless and is_paginated --> creates a GitHub artifact (instead of a commit).
Step 3: The workflow calls new Ruby code in this repo:
graph = RDF::Graph.load(artifact)
SHACL.open(shacl.ttl).execute(graph)
and the existing construct SPARQL
SPARQL.execute(score_sparql, graph, update: true)
to insert the score, then
report = SPARQL.execute(report_sparql, graph)
to create a report with url, event URI, score, and breakdown for each webpage.

Alternate Approach - Google sheet App script [NOT NEEDED YET]
Build an App script to call the structured data score API in Artsdata from a Google sheet. This Google sheet would have a column of webpage urls for event details. So if a website had 50 events there would be 50 webpage urls, one per event.
The idea is to load the score (a number from 0 to 60) into the cell next to the url. For example, if a person enters "=structuredDataScore(A1)" into the cell, it should call the App script to get the url from cell A1, then call the score API and display the score (or a fail message).
The webpage url is passed to the score API as a parameter &uri=. The API returns a graph in JSON-LD where each event has a property "score". The other parameters &post_sparql= and &shacl= should be constant and are used to pass in the file for validating the data and the sparql to run afterwards, which adds the score to the graph.

Here is an example API call:
https://kg.artsdata.ca/en/dereference/external.jsonld?post_sparql=https%3A%2F%2Fraw.githubusercontent.com%2Fculturecreates%2Fartsdata-score%2Fmain%2Fsparql%2Fscore_algorithm.sparql&shacl=https%3A%2F%2Fraw.githubusercontent.com%2Fculturecreates%2Fartsdata-score%2Fmain%2Fshacl%2Fshacl_for_scoring.ttl&uri=https%3A%2F%2Fimperialtheatre.ca%2Fevent%2Fretro-film-national-lampoons-vacation-1983%2F
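As a rough sketch of how such a call could be assembled (Ruby, standard library only): the endpoint and the constant post_sparql and shacl values are copied from the example call above, while score_api_url is a hypothetical helper name used only for illustration and does not exist in any of the repos.

```ruby
require "uri"

# Endpoint and constant parameters, copied from the example API call.
SCORE_API   = "https://kg.artsdata.ca/en/dereference/external.jsonld"
POST_SPARQL = "https://raw.githubusercontent.com/culturecreates/artsdata-score/main/sparql/score_algorithm.sparql"
SHACL_FILE  = "https://raw.githubusercontent.com/culturecreates/artsdata-score/main/shacl/shacl_for_scoring.ttl"

# Hypothetical helper: builds the score API URL for one event webpage.
# Only the uri parameter changes from call to call.
def score_api_url(webpage_url)
  query = URI.encode_www_form([
    ["post_sparql", POST_SPARQL],
    ["shacl", SHACL_FILE],
    ["uri", webpage_url]
  ])
  "#{SCORE_API}?#{query}"
end

puts score_api_url("https://imperialtheatre.ca/event/retro-film-national-lampoons-vacation-1983/")
```

URI.encode_www_form percent-encodes each value, so the webpage url can be passed in as-is from the spreadsheet cell or the report input.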
The score API can also be used from the Artsdata nebula user interface by clicking "Compute score" after dereferencing any URL. The only difference in the call is that the path ending with the method external is without the format .jsonld. So instead of /dereference/external.jsonld? it is /dereference/external? to display a human readable webpage.

In the resulting graph the score property is http://example.org/score. So using an RDF library the score can be extracted using [nil, RDF::URI("http://example.org/score"), nil] and the resulting object.value can be displayed as a comma separated list in the cell. If there is more than one (because there is more than one event entity in the webpage), then all solutions should be displayed as a list in the spreadsheet cell.

Here is an example output in JSON-LD trimmed to include only relevant data:
[
  ...
  {
    "@id": "http://top.blank.node/82565b60-3f58-40c0-ab0e-f6a61ed0a5c6",
    "@type": [ "http://schema.org/TheaterEvent" ],
    ...
    "http://example.org/score": [
      { "@type": "http://www.w3.org/2001/XMLSchema#integer", "@value": "55" }
    ]
  }
  ...
]
NOTE: If there is no RDF library then I can enhance the API to return a simple JSON that will be easier to parse, so be sure to let me know if you need this.
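If no RDF library is available, the score can probably still be pulled out with a plain JSON parser. The sketch below assumes the API keeps returning expanded JSON-LD shaped like the trimmed sample (an array of nodes, each score wrapped in an object with "@value"); extract_scores is a hypothetical helper name, and the sample input is the trimmed example with the elided parts left out.

```ruby
require "json"

SCORE_PROPERTY = "http://example.org/score"

# Hypothetical helper: returns every score found in the response as a string.
# Assumes expanded JSON-LD: an array of node objects keyed by full property URIs.
def extract_scores(jsonld_text)
  nodes = JSON.parse(jsonld_text)
  nodes = [nodes] unless nodes.is_a?(Array)

  scores = nodes.flat_map do |node|
    next [] unless node.is_a?(Hash)

    values = node.fetch(SCORE_PROPERTY, [])
    values = [values] unless values.is_a?(Array)
    # Each value is normally {"@type" => ..., "@value" => "55"}.
    values.map { |v| v.is_a?(Hash) ? v["@value"] : v }
  end

  scores.compact.map(&:to_s)
end

sample = <<~JSON
  [
    {
      "@id": "http://top.blank.node/82565b60-3f58-40c0-ab0e-f6a61ed0a5c6",
      "@type": ["http://schema.org/TheaterEvent"],
      "http://example.org/score": [
        { "@type": "http://www.w3.org/2001/XMLSchema#integer", "@value": "55" }
      ]
    }
  ]
JSON

# One event on the page gives one score; several events give a comma separated list.
puts extract_scores(sample).join(", ")
```

Nodes without the score property are skipped, so a page with no scored events yields an empty list, which the caller can turn into a fail message.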