culturecreates / artsdata-orion

Collection of data sources loaded into Artsdata by Culture Creates

Port Yard Bird Huginn to Ruby #15

Closed saumier closed 9 months ago

saumier commented 10 months ago

Yard Bird Suite has events listed on this webpage: https://yardbirdsuite.com/events/

Please name this artifact: yardbirdsuite-com

Link to Huginn: https://huginn-staging.herokuapp.com/scenarios/59/diagram

@dev-aravind Please use the existing repo artsdata-orion to add your code and workflows. This repo will hold many websites, which we can keep together because no one else is collaborating on them. I think we will hit 100 websites before the end of the year, so please structure the code to avoid repetition (DRY). If an organization wants it in the future, we can split a website off into its own repo, but until then let's keep them all in this repo, unless a repo for that website already exists.

dev-aravind commented 10 months ago

@saumier The workflow and Ruby file are up for this webpage, but I accidentally set the artifact name to yardbirdsuite-events the first time. So there is duplicated data in Nebula under the artifact names yardbirdsuite-com and yardbirdsuite-events. Can you please delete the duplicated entities? Apologies for the inconvenience.

dev-aravind commented 10 months ago

@saumier I wrote a SPARQL query, based on the one you used for scenesfrancophones, to fix the blank nodes. Because some places also had blank node issues, I made some modifications to it. However, the blank nodes are not being replaced by UUIDs yet. Can you help me work on this one?

SPARQL file

saumier commented 10 months ago

@dev-aravind I took a look at the data, and I think we don't need to replace the blank nodes of the places because they are all nested inside the events. I only need a way to access the top-level entities. So please use the same SPARQL as you did for IPAA.

I will remove entities with blank nodes from the list pages of Nebula. I added them temporarily so you could see that they were there. https://github.com/culturecreates/nebula/commit/4830b6afb5edcd2607654782f4ce12a707de786b
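To illustrate the idea of minting URIs only for top-level entities, here is a minimal Ruby sketch. It is not the SPARQL used for IPAA or the code in this repo: the triple representation (plain arrays, blank nodes as strings starting with `_:`) and the method names `blank_node?` and `mint_top_level_uris` are assumptions made for the example.

```ruby
require 'securerandom'

# Assumed convention for this sketch: a blank node is a string starting with "_:".
def blank_node?(term)
  term.is_a?(String) && term.start_with?('_:')
end

# Replace blank-node subjects that never appear as an object (top-level
# entities) with urn:uuid: URIs. Nested blank nodes are left untouched.
def mint_top_level_uris(triples)
  nested = triples.map { |_, _, o| o }.select { |o| blank_node?(o) }.uniq
  top_level = triples.map(&:first).select { |s| blank_node?(s) }.uniq - nested
  mapping = top_level.to_h { |node| [node, "urn:uuid:#{SecureRandom.uuid}"] }
  triples.map { |s, p, o| [mapping.fetch(s, s), p, o] }
end

triples = [
  ['_:event1', 'rdf:type', 'schema:MusicEvent'],
  ['_:event1', 'schema:location', '_:place1'],
  ['_:place1', 'rdf:type', 'schema:Place']
]

result = mint_top_level_uris(triples)
result.each { |triple| puts triple.join(' ') }
```

In this sketch the event gets a single `urn:uuid:` subject shared by both of its triples, while the place keeps its blank node because it is only reachable through the event.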

Also, it is not a good idea to filter on URIs like schema:Event unless you are using RDF inferencing. This is because an event may have the type schema:MusicEvent, which is a sub-class of schema:Event. When inferencing is turned on, filtering with schema:Event includes all sub-classes, such as schema:MusicEvent and schema:DanceEvent. We are currently not using inferencing when we run SPARQL on local graphs, so the filter will not pick up the sub-classes. There is a way to use inferencing when we work on local graphs, but we don't need it yet. In the schema.org vocabulary loaded into Artsdata, you can see that schema:MusicEvent is a subClassOf schema:Event here
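The difference between an exact type filter and a sub-class-aware one can be sketched in plain Ruby. This is only an illustration of the reasoning, not how Artsdata or GraphDB implement inferencing; the sub-class table, the `urn:example:` entities, and the method name `subclass_or_same?` are made up for the example.

```ruby
# True when type equals target, or reaches it by following subClassOf links.
# subclass_of maps a class to its direct parent class.
def subclass_or_same?(type, target, subclass_of)
  seen = []
  while type && !seen.include?(type)
    return true if type == target
    seen << type
    type = subclass_of[type]
  end
  false
end

# Hypothetical fragment of the schema.org class hierarchy.
subclass_of = {
  'schema:MusicEvent' => 'schema:Event',
  'schema:DanceEvent' => 'schema:Event'
}

# Hypothetical entities and their declared types.
types = {
  'urn:example:a' => 'schema:Event',
  'urn:example:b' => 'schema:MusicEvent',
  'urn:example:c' => 'schema:Place'
}

# Without inferencing: only the exact type matches.
exact = types.select { |_, t| t == 'schema:Event' }.keys

# With the sub-class hierarchy taken into account.
inferred = types.select { |_, t| subclass_or_same?(t, 'schema:Event', subclass_of) }.keys

puts "exact:    #{exact.inspect}"
puts "inferred: #{inferred.inspect}"
```

The exact filter misses the entity typed schema:MusicEvent, which is the case described above for SPARQL on local graphs without inferencing.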

dev-aravind commented 10 months ago

@saumier The blank node replacement is up in this PR along with unit tests. Please review it and let me know if you need any changes.

saumier commented 10 months ago

@dev-aravind Please check my requested changes in https://github.com/culturecreates/artsdata-orion/pull/25.

dev-aravind commented 9 months ago

@saumier Other than the eventAttendanceMode and eventStatus values, the data looks fine. Let me know what you think.

saumier commented 9 months ago

@dev-aravind Looking good. I have one question: why is the description missing from this event even though it is present in the JSON-LD of the web page? It shows as " " in GraphDB. https://artsdata-nebula-d1ec887e2637.herokuapp.com/entity?uri=urn%3Auuid%3A55828532-6126-418a-9cd7-1e9e26895590

dev-aravind commented 9 months ago

@saumier The description for this event is a non-breaking space.


saumier commented 9 months ago

@dev-aravind In the web page you scraped I see a description "http://schema.org/description":[{"@value":"Tae Kim, is a pianist, arranger, composer, and educator based in Alberta."}]

So my question remains: Why is the description missing? Here is the link to the web page https://yardbirdsuite.com/shows/tuesday-jam-hosted-by-tae-kim/

We need to figure out at which point the description goes missing. Is it before sending to the Artsdata Databus or after?
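One way to narrow this down is to run the same check on the event at each stage and see where the description stops being visible text. The Ruby sketch below assumes the event is available as expanded JSON-LD, as in the snippet above; the method names `descriptions` and `visually_empty?` and the two sample payloads are illustrative, not part of the workflow.

```ruby
require 'json'

# Pull the schema:description values out of one expanded JSON-LD node.
def descriptions(node)
  Array(node['http://schema.org/description']).map { |v| v['@value'].to_s }
end

# True when the text has no visible characters. [[:space:]] is used because
# String#strip leaves U+00A0 (non-breaking space) in place.
def visually_empty?(text)
  text.gsub(/[[:space:]]/, '').empty?
end

# Sample payloads: the value seen on the web page, and a non-breaking space.
scraped = JSON.parse('{"http://schema.org/description":[{"@value":"Tae Kim, is a pianist, arranger, composer, and educator based in Alberta."}]}')
stored  = JSON.parse('{"http://schema.org/description":[{"@value":"\u00a0"}]}')

{ 'scraped' => scraped, 'stored' => stored }.each do |stage, node|
  status = descriptions(node).any? { |d| !visually_empty?(d) } ? 'present' : 'missing or blank'
  puts "#{stage}: description #{status}"
end
```

Running the check on the payload just before it is sent to the Artsdata Databus, and again on what is stored afterwards, would show on which side the description is lost.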

dev-aravind commented 9 months ago

@saumier Assigning this to you, as the description was retained when I re-ran the workflow.

saumier commented 9 months ago

Looks good.