culturecreates / artsdata-orion

Collection of data sources loaded into Artsdata by Culture Creates

Port Capitol Theatre Huginn to Ruby #17

Closed saumier closed 2 months ago

saumier commented 10 months ago

Capitol Theatre is located in the city of Nelson, in the province of British Columbia.

https://capitoltheatre.ca - try to find webpages with structured data for events

Watch out for place with same name but in another Canadian city.

Old Huginn: https://huginn-staging.herokuapp.com/scenarios/7

Please name this artifact: capitoltheatre-ca

dev-aravind commented 10 months ago

@saumier I wrote a workflow for this website and the data is up in Artsdata. However, there is a SHACL violation for the events because the Performer is set as "Organizer" for some reason. Also, there is a lot of duplicated data in the events. Let me know what you think.

saumier commented 10 months ago

@dev-aravind There are a couple of things to fix:

  1. I don't see the "prov:wasDerivedFrom" triples that we must add when generating blank nodes.
  2. In your Ruby code that builds the list of webpages, there are duplicate URLs. Please remove the duplicates; otherwise, each time the page is loaded into the graph it will create a duplicate event. In Ruby you can use my_list.uniq! or something similar.
  3. The SHACL with performer is working correctly. The SHACL violation means that the performer data cannot be used. There is nothing for us to do but ignore the data in Artsdata, so when I create an official event with an Artsdata URI it will not include the organizer. This is the way we are using SHACL in Artsdata. It is working as expected and provides feedback to the Capitol Theatre webmaster, should they be interested in fixing their JSON-LD.
  4. The startDate is not working properly. Maybe there is a bug in the SPARQL when we fix the startDate. For example, https://capitoltheatre.ca/event/highbar-gang/ should have 10 events on different startDates. But in Artsdata there are many more and they all seem to have the same date. Please compare with validator.schema.org.
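For point 2, a minimal sketch of removing duplicates from the page list, assuming the URLs are collected into a plain Ruby array. The helper name `unique_event_urls` and the normalisation steps (trimming whitespace, dropping `#fragment` parts) are illustrative assumptions and are not taken from the existing workflow:

```ruby
require "uri"

# Hypothetical helper: normalise each URL, then drop repeats.
# Trimming whitespace and removing the fragment are assumptions about
# what counts as "the same page" for this crawl.
def unique_event_urls(urls)
  cleaned = urls.map do |url|
    uri = URI.parse(url.strip)
    uri.fragment = nil # "#tickets" and similar anchors point at the same page
    uri.to_s
  end
  cleaned.uniq
end

urls = [
  "https://capitoltheatre.ca/event/highbar-gang/",
  "https://capitoltheatre.ca/event/highbar-gang/ ",
  "https://capitoltheatre.ca/event/highbar-gang/#tickets",
  "https://capitoltheatre.ca/event/snowed-in-comedy-tour-2024/"
]
puts unique_event_urls(urls)
```

Note that `uniq!` returns `nil` when nothing was removed, so the non-destructive `uniq` is the safer choice when the result is used directly.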

dev-aravind commented 9 months ago

@saumier I've removed the duplicate URLs, but each event page has the JSON-LD details of other events in the "Buy tickets" section.

Please refer to this page: https://capitoltheatre.ca/event/snowed-in-comedy-tour-2024/

saumier commented 9 months ago

@dev-aravind I have taken a look. I see the problem with the "Buy tickets" section. I am still thinking about a generic way to handle this common mistake made on websites.
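One possible heuristic for the "Buy tickets" problem, offered only as a sketch and not as an approach agreed in this thread: keep the JSON-LD events whose `url` matches the page being crawled and discard the rest. This assumes each event node carries a `url` property pointing at its own page, which has not been verified for this site. The helper names and the sample data (including the dates) are made up for illustration:

```ruby
require "json"

# Hypothetical check: treat two URLs as the same page if they differ
# only by a trailing slash.
def same_page?(candidate, page_url)
  return false unless candidate.is_a?(String) && page_url.is_a?(String)

  candidate.chomp("/") == page_url.chomp("/")
end

# Hypothetical helper: parse a JSON-LD string (a single node or an array
# of nodes) and keep only Event nodes that describe the crawled page.
# Limitations: "@graph" wrappers, array-valued "@type" and Event subtypes
# are not handled in this sketch.
def events_for_page(jsonld_text, page_url)
  data = JSON.parse(jsonld_text)
  nodes = data.is_a?(Array) ? data : [data]
  nodes.select do |node|
    node.is_a?(Hash) &&
      node["@type"] == "Event" &&
      same_page?(node["url"], page_url)
  end
end

# Made-up sample: two dates for the crawled page plus one unrelated event
# of the kind that appears in a "Buy tickets" section.
sample = <<~JSON
  [
    {"@type": "Event", "url": "https://capitoltheatre.ca/event/highbar-gang/", "startDate": "2024-01-01"},
    {"@type": "Event", "url": "https://capitoltheatre.ca/event/highbar-gang/", "startDate": "2024-01-02"},
    {"@type": "Event", "url": "https://capitoltheatre.ca/event/snowed-in-comedy-tour-2024/", "startDate": "2024-01-03"}
  ]
JSON

kept = events_for_page(sample, "https://capitoltheatre.ca/event/highbar-gang/")
puts kept.map { |event| event["startDate"] }
```

Counting the distinct `startDate` values that remain gives a quick number to compare against what validator.schema.org reports for the same page.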

There are also more event pages that we should be getting, such as this one: https://capitoltheatre.ca/event/ballet-jorgen-anne-of-green-gables-capitol-season/

dev-aravind commented 9 months ago

@saumier I updated the workflow to fetch from another URL which covers more events. Please check it out and let me know if you have any questions.