culturecreates / artsdata-score

Structured data score based on Artsdata data needs

Batch JSON-LD Scoring from spreadsheet #1

Open saumier opened 1 week ago

saumier commented 1 week ago

There is a structured data score algorithm created using SHACL and SPARQL (in this repo culturecreates/artsdata-score). Here is the full back story https://github.com/culturecreates/artsdata-data-model/discussions/120.

This issue is to build a solution that will batch score events. This is a prototype.

We should start with Approach 1. I have included a second alternative approach using Google sheets but that is not needed yet.

Prerequisites:

Approach 1 - Workflow creates a report

Step 1: Create an interactive workflow in this repo.

Step 2: The workflow calls Orion's new Action to fetch-data (not including push to artsdata) with the page_url and entity_identifier to extract the webpage urls and options like headless and is_paginated --> creates a Github artifact (instead of a commit).

Step 3: The workflow calls a new ruby code in this repo:

Alternate Approach - Google sheet App script [NOT NEEDED YET]

Build an App script to call the structured data score API in Artsdata from a Google sheet. This Google sheet would have a column of webpage urls for event details. So if a website had 50 events there would be 50 webpage urls, one per event.

The idea is to load the score (a number from 0 to 60) into the cell next to the url. For example, if a person enters "=structuredDataScore(A1)" into the cell, it should call the App script to get the url from cell A1, then call the score API and display the score (or a fail message).

The webpage url is passed to the score API in the &uri= parameter. The API returns a graph in JSON-LD where each event has a property "score". The other parameters, &shacl= and &post_sparql=, should be constant: &shacl= passes in the SHACL file used to validate the data, and &post_sparql= passes in the SPARQL that runs afterwards and adds the score to the graph.

Here is an example API call:

https://kg.artsdata.ca/en/dereference/external.jsonld?post_sparql=https%3A%2F%2Fraw.githubusercontent.com%2Fculturecreates%2Fartsdata-score%2Fmain%2Fsparql%2Fscore_algorithm.sparql&shacl=https%3A%2F%2Fraw.githubusercontent.com%2Fculturecreates%2Fartsdata-score%2Fmain%2Fshacl%2Fshacl_for_scoring.ttl&uri=https%3A%2F%2Fimperialtheatre.ca%2Fevent%2Fretro-film-national-lampoons-vacation-1983%2F
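For a batch run it may help to build this call programmatically instead of hand-encoding it. Here is a minimal Ruby sketch, assuming the endpoint and the two constant parameters are exactly those in the example call above; `score_api_url` and the constant names are hypothetical and not part of this repo.

```ruby
require "uri"

# Endpoint and constant parameters copied from the example call in this issue.
SCORE_ENDPOINT = "https://kg.artsdata.ca/en/dereference/external.jsonld"
POST_SPARQL = "https://raw.githubusercontent.com/culturecreates/artsdata-score/main/sparql/score_algorithm.sparql"
SHACL = "https://raw.githubusercontent.com/culturecreates/artsdata-score/main/shacl/shacl_for_scoring.ttl"

# Hypothetical helper: returns the score API URL for one event webpage.
# URI.encode_www_form percent-encodes each value and keeps the key order.
def score_api_url(page_url)
  query = URI.encode_www_form(
    post_sparql: POST_SPARQL,
    shacl: SHACL,
    uri: page_url
  )
  "#{SCORE_ENDPOINT}?#{query}"
end

puts score_api_url("https://imperialtheatre.ca/event/retro-film-national-lampoons-vacation-1983/")
```

The sketch only builds the URL so that it runs offline; the result could then be fetched with something like `Net::HTTP.get(URI(url))`.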

The score API can also be used from the Artsdata nebula user interface by clicking "Compute score" after dereferencing any URL. The only difference in the call is that the path ends with the method external without the .jsonld format. So instead of /dereference/external.jsonld? it is /dereference/external? to display a human readable webpage.

In the resulting graph the score property is http://example.org/score

So using an RDF library the score can be extracted using the pattern [nil, RDF::URI("http://example.org/score"), nil] and the resulting object.value can be displayed in the cell. If there is more than one score (because there is more than one event entity in the webpage), then all solutions should be displayed as a comma separated list in the spreadsheet cell.

Here is an example output in JSON-LD trimmed to include only relevant data:

[
  ...
  {
    "@id": "http://top.blank.node/82565b60-3f58-40c0-ab0e-f6a61ed0a5c6",
    "@type": [ "http://schema.org/TheaterEvent" ],
    ...
    "http://example.org/score": [
      { "@type": "http://www.w3.org/2001/XMLSchema#integer", "@value": "55" }
    ]
  }
  ...
]
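If no RDF library is available, the scores could also be read with a plain JSON parser. The following is a minimal Ruby sketch, assuming the response is an array of nodes in the expanded JSON-LD shape of the example output. `extract_scores`, `cell_text`, the "No score found" message, and the sample document (a small stand-in built from the example node, not real API output) are all hypothetical.

```ruby
require "json"

SCORE_PROPERTY = "http://example.org/score"

# Stand-in document shaped like the trimmed example output; not real API output.
SAMPLE_JSONLD = <<~JSON
  [
    {
      "@id": "http://top.blank.node/82565b60-3f58-40c0-ab0e-f6a61ed0a5c6",
      "@type": ["http://schema.org/TheaterEvent"],
      "http://example.org/score": [
        { "@type": "http://www.w3.org/2001/XMLSchema#integer", "@value": "55" }
      ]
    }
  ]
JSON

# Hypothetical helper: collects every score value as an Integer,
# one per event node that carries the score property.
def extract_scores(jsonld_text)
  nodes = JSON.parse(jsonld_text)
  nodes = [nodes] unless nodes.is_a?(Array)
  nodes.flat_map do |node|
    next [] unless node.is_a?(Hash)
    literals = node.fetch(SCORE_PROPERTY, [])
    literals = [literals] unless literals.is_a?(Array)
    literals.map do |literal|
      Integer(literal.is_a?(Hash) ? literal["@value"] : literal)
    end
  end
end

# Hypothetical helper: text for the spreadsheet cell, a comma separated
# list of scores or a fail message (wording is an assumption).
def cell_text(scores)
  scores.empty? ? "No score found" : scores.join(", ")
end

puts cell_text(extract_scores(SAMPLE_JSONLD))
```

Nodes without the score property are skipped, so a page with several event entities yields one value per scored event.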

NOTE: If there is no RDF library then I can enhance the API to return a simple JSON that will be easier to parse, so be sure to let me know if you need this.

saumier commented 2 days ago

@dev-aravind Please advance this and try to get it to work without waiting for the other issue to migrate to Github Artifacts (https://github.com/culturecreates/artsdata-orion/issues/70), because I am blocked on being able to use Github artifact download urls.

dev-aravind commented 2 hours ago

@saumier The development part is complete, but the initial step of calling the fetch-data workflow is failing: due to a permission issue, artsdata-score cannot commit the JSON-LD that artsdata-orion generated. This is blocking us from adding the score to the JSON-LD.

workflow: https://github.com/culturecreates/artsdata-score/actions/workflows/generate_report.yml

sample inputs:

page_url: "https://agoradanse.com/evenement/"
entity_identifier: "div.x-container.max a"
file_name: "agoradanse-events.jsonld"
is_paginated: "false"
headless: "false"