Closed saumier closed 10 months ago
@sahalali Notes from our design discussion on exporting to Artsdata.
To generate JSON-LD, let's loop over the published events to produce JSON like the CMS Open API returns, then add a single @context to turn it into JSON-LD. Some transformations may be needed, but ideally there are none. We should probably not nest place, organization, person, or taxonomy inside events. Instead the JSON can be flat and include all classes; otherwise the data will repeat. For example, events that share a place should not each carry the same nested place data.
Actor 1 should be a workflow on GitHub in a repo called “artsdata-planet-footlight”, and calls to the Artsdata Databus will use the credentials stored in that repo.
This general approach keeps the CMS responsibilities focused on generating CSV and RDF. The CMS does not need to know how to call the Artsdata Databus with credentials, nor how to schedule tasks. None of the extra controls we had in Capacitor to manage this flow are needed in the CMS backend. GitHub Actions can be used to monitor task success/failure and manage the archive of data dump versions.
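A minimal sketch of the flat layout described above, assuming each entity in the plain JSON already carries an "@id" (the IDs, field names, and sample events here are made up for illustration and are not taken from the CMS):

```python
import json

def flatten_events(events, context):
    """Return one flat JSON-LD document: one node per entity, linked by @id."""
    nodes = {}  # keyed by @id, so a place shared by many events is stored once

    def to_ref(value):
        # Nested entities (dicts with an @id) are pulled out and replaced by a reference.
        if isinstance(value, dict) and "@id" in value:
            visit(value)
            return {"@id": value["@id"]}
        return value

    def visit(node):
        flat = {}
        for key, value in node.items():
            if isinstance(value, list):
                flat[key] = [to_ref(item) for item in value]
            else:
                flat[key] = to_ref(value)
        nodes.setdefault(node["@id"], {}).update(flat)

    for event in events:
        visit(event)
    return {"@context": context, "@graph": list(nodes.values())}

# Hypothetical sample: two events at the same place.
place = {"@id": "http://example.org/place/1", "@type": "Place", "name": "Main Hall"}
events = [
    {"@id": "http://example.org/event/1", "@type": "Event", "name": "Show A", "location": place},
    {"@id": "http://example.org/event/2", "@type": "Event", "name": "Show B", "location": place},
]
doc = flatten_events(events, {"@vocab": "http://schema.org/"})
print(json.dumps(doc, indent=2))
```

The place appears once in "@graph" and both events point to it by "@id", which is the de-duplication the flat approach is after.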
Let me know if you have any other concerns at your convenience.
@sahalali I created the GitHub repo https://github.com/culturecreates/artsdata-planet-footlight
I added your dump file in a directory called "dump". I added a basic Ruby program to frame the data and save the result to the "output" directory. Finally, I passed the data through SHACL from Artsdata and saved a text report, ending in .txt, in the "output" directory.
The first thing I noticed is that the Event location is missing.
I would like you to walk me through your code so I can make comments directly in the code.
There are also many terms that are 'not' schema.org terms, such as our taxonomies and custom additional types, that we should be prefixing with something like http://kg.footlight.io/ instead of http://schema.org.
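One way the prefixing could look, as a rough sketch: schema.org stays the default vocabulary and each custom term is mapped to the http://kg.footlight.io/ namespace. The term name "eventDiscipline" is a placeholder, not a real CMS field, and expand_term is a simplified lookup for illustration, not the full JSON-LD expansion algorithm.

```python
SCHEMA = "http://schema.org/"
FOOTLIGHT = "http://kg.footlight.io/"

def build_context(custom_terms):
    """Build a @context where schema.org is the default and custom terms get their own prefix."""
    context = {"@vocab": SCHEMA, "footlight": FOOTLIGHT}
    for term in custom_terms:
        context[term] = "footlight:" + term
    return context

def expand_term(term, context):
    """Simplified term lookup (illustration only): mapped terms use their prefix, others use @vocab."""
    mapping = context.get(term)
    if mapping is None:
        return context["@vocab"] + term
    prefix, _, local = mapping.partition(":")
    return context[prefix] + local

# "eventDiscipline" is a hypothetical custom taxonomy term.
context = build_context(["eventDiscipline"])
print(expand_term("name", context))
print(expand_term("eventDiscipline", context))
```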
Take a look at the output directory and try to improve the results.
If you can send me a dump of the plain json before applying the @context, then we can iterate more quickly on improving the @context.
@saumier The zipped folder contains the plain JSON file with 25 events, the current JSON-LD file, and the file containing the context and frame.
@saumier Can you please look into it and help me improve the JSON-LD?
@sahalali I created a basic Ruby practice repo that starts with a minimal JSON-LD Context and JSON-LD Frame, converts the 25 events, and then validates them with a minimal SHACL.
Take a look and we can start adding more properties gradually one by one.
The next property we should add is "url". Take a stab and I can comment on this specific case before going any further.
@sahalali I am trying to export RDF from Footlight CMS but I get a 504 Undocumented | Error: Gateway Time-out.
curl 'https://api.cms.footlight.io/entities/export?file-format=ttl&entity=Event' \
  -H 'calendar-id: 6308ef4a7f771f00431d939a'
I propose starting simple with the basic properties I am currently uploading manually to Artsdata which are:
Can you get the export to work with those properties?
@sahalali I also noticed that the system was frozen during the download.
@sahalali - The properties need to be fixed as follows:
- schema:additionalType --> must point to a URI
- schema:name --> OK
- schema:location --> must point to a URI that has a type "Place" or "VirtualLocation"
- schema:address --> must point to a URI that has type "PostalAddress"
- schema:sameAs --> must point to a URI
- schema:startDate --> partially OK but schema:startDateTime should not exist
- schema:endDate --> partially OK but schema:endDateTime should not exist
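The rules above can be checked quickly before running SHACL. This is only a rough sketch, assuming a flat "@graph" list where links are written as {"@id": ...} and every link target is present in the same graph; the sample nodes and IDs are invented, and this is not the Artsdata SHACL itself.

```python
URI_PROPS = ("additionalType", "sameAs")
TYPED_LINKS = {
    "location": ("Place", "VirtualLocation"),
    "address": ("PostalAddress",),
}
FORBIDDEN = ("startDateTime", "endDateTime")

def as_list(value):
    return value if isinstance(value, list) else [value]

def is_ref(value):
    return isinstance(value, dict) and isinstance(value.get("@id"), str)

def check_graph(graph):
    """Return a list of human-readable problems found in a flat graph."""
    by_id = {n["@id"]: n for n in graph if "@id" in n}
    problems = []
    for node in graph:
        nid = node.get("@id", "(no @id)")
        for prop in URI_PROPS:
            for value in as_list(node.get(prop, [])):
                if not is_ref(value):
                    problems.append(f"{nid}: {prop} must point to a URI")
        for prop, allowed in TYPED_LINKS.items():
            for value in as_list(node.get(prop, [])):
                # Assumption: targets live in the same graph; external URIs would be flagged.
                target = by_id.get(value["@id"]) if is_ref(value) else None
                types = set(as_list(target.get("@type", []))) if target else set()
                if not types & set(allowed):
                    problems.append(f"{nid}: {prop} must point to a {' or '.join(allowed)}")
        for prop in FORBIDDEN:
            if prop in node:
                problems.append(f"{nid}: {prop} should not exist")
    return problems

# Hypothetical sample graph with two deliberate mistakes on the event.
graph = [
    {"@id": "ex:event1", "@type": "Event", "name": "Show A",
     "additionalType": "ConcertEvent",            # plain string, not a URI reference
     "location": {"@id": "ex:place1"},
     "startDate": "2023-05-01", "startDateTime": "2023-05-01T19:00:00"},
    {"@id": "ex:place1", "@type": "Place", "address": {"@id": "ex:addr1"}},
    {"@id": "ex:addr1", "@type": "PostalAddress"},
]
problems = check_graph(graph)
for line in problems:
    print(line)
```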
@sahalali I realize that this is quite hard because we are trying to reverse-engineer the @context, and the JSON has diverged quite a bit from a schema.org style of JSON-LD. Another approach, maybe more developer friendly, is to use OntoRefine to map the JSON to RDF. Let me know what you think. The GitHub workflow would do a GET from the Open API, convert the result to RDF, and send it to Artsdata. I can help you with the OntoRefine mapping. I think this will be faster as well.
@saumier I like the idea of using OntoRefine. Can you help me with the Artsdata API that can be used to send data to Artsdata? I will create and prepare a mapping file.
@sahalali Please check the workflows that @dev-aravind has created for Scenes Francophones. You can use the same variables: https://github.com/culturecreates/artsdata-planet-scenesfrancophones/issues/7
@saumier Can you please add the secret "PUBLISHER_URI_GREGORY" as an organization-level secret?
@sahalali I am closing this issue from the Footlight CMS project. It is a duplicate of https://github.com/culturecreates/artsdata-orion/issues/3 and https://github.com/culturecreates/artsdata-planet-footlight/issues/10
Export all events including linked places, people, orgs to Artsdata in RDF