culturecreates / artsdata-orion

Collection of data sources loaded into Artsdata by Culture Creates

Remove type schema:SiteNavigationElement and others #69

Closed: saumier closed this issue 1 month ago

saumier commented 2 months ago

When crawling JSON-LD from websites with Orion, there are many objects that get included and stored in Artsdata. Many of these objects are not needed.

For all sites crawled with Orion, please remove objects of type schema:SiteNavigationElement.

Other similar requests may follow after this task is completed.

Test case: This url from imperialtheatre.ca has many SiteNavigationElement objects that should disappear. https://kg.artsdata.ca/en/entity?uri=https%3A%2F%2Fimperialtheatre.ca%2Fevent%2Fretro-film-national-lampoons-vacation-1983%2F
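The intended effect of this request can be sketched in plain Python. This is only an illustration of the logic, not Orion's actual SPARQL: the graph is modelled as a set of (subject, predicate, object) string tuples, and the `ex:` identifiers and the `remove_nodes_of_type` helper are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NAV = "http://schema.org/SiteNavigationElement"


def remove_nodes_of_type(triples, unwanted_type):
    """Return a copy of `triples` without any node typed `unwanted_type`.

    Assumption: IRIs and literals are both plain strings in this toy model.
    """
    # Every subject that carries the unwanted rdf:type.
    doomed = {s for (s, p, o) in triples if p == RDF_TYPE and o == unwanted_type}
    # Drop the node's own triples (type triple included) and any references to it.
    return {(s, p, o) for (s, p, o) in triples if s not in doomed and o not in doomed}


# Hypothetical sample data, loosely shaped like a crawled event page.
graph = {
    ("ex:event1", RDF_TYPE, "http://schema.org/Event"),
    ("ex:event1", "http://schema.org/name", "Retro Film"),
    ("ex:nav1", RDF_TYPE, NAV),
    ("ex:nav1", "http://schema.org/name", "Menu"),
    ("ex:page", "http://schema.org/hasPart", "ex:nav1"),
}
cleaned = remove_nodes_of_type(graph, NAV)
```

In this sketch only the two `ex:event1` triples remain; the navigation node and the reference pointing to it are gone.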

dev-aravind commented 1 month ago

@saumier I added a SPARQL query to remove the type you mentioned here, but it is not working properly yet: it leaves the type triple behind. Can you look into this?

dev-aravind commented 1 month ago

Task for dev: Add a unit test to check the SPARQL.
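A check of this kind might look like the pytest-style sketch below. It does not run the real SPARQL: `remove_nodes_of_type` is a hypothetical pure-Python stand-in for the update, and the `ex:` identifiers are made up. The point is the regression assertion that no type triple is left behind.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NAV = "http://schema.org/SiteNavigationElement"


def remove_nodes_of_type(triples, unwanted_type):
    # Hypothetical stand-in for the SPARQL update under test.
    doomed = {s for (s, p, o) in triples if p == RDF_TYPE and o == unwanted_type}
    return {t for t in triples if t[0] not in doomed and t[2] not in doomed}


def test_no_type_triple_left_behind():
    before = {
        ("ex:nav1", RDF_TYPE, NAV),
        ("ex:nav1", "http://schema.org/url", "https://example.org/menu"),
        ("ex:event1", RDF_TYPE, "http://schema.org/Event"),
    }
    after = remove_nodes_of_type(before, NAV)

    # Regression check: the rdf:type triple itself must be removed.
    assert not any(p == RDF_TYPE and o == NAV for (_, p, o) in after)
    # The unrelated event must survive untouched.
    assert ("ex:event1", RDF_TYPE, "http://schema.org/Event") in after
    # Nothing else from the navigation node may remain.
    assert len(after) == 1
```

With a real triple store, the same three assertions could be expressed against the graph after the update has run.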

dev-aravind commented 1 month ago

@saumier The SiteNavigationElement nodes are now removed from the ImperialTheatre data.

saumier commented 1 month ago

@dev-aravind Please also remove the following from all Orion sites:

type http://schema.org/WPHeader
type http://schema.org/WPFooter
type http://schema.org/BreadcrumbList

All entities with the predicate http://www.w3.org/1999/xhtml/vocab#role

Test page: This entity (an article) should have fewer "derived statements", but currently has more than 20: https://kg.artsdata.ca/en/entity?uri=https%3A%2F%2Fimperialtheatre.ca%2Fevent%2Fretro-film-national-lampoons-vacation-1983%2F
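The extended request can be sketched as one rule set in plain Python. This is an illustration under assumptions, not the actual Orion SPARQL: the graph is a set of string tuples, `clean_graph` and the `ex:` identifiers are hypothetical, and "entities with the predicate" is read here as "subjects that have at least one triple with that predicate".

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ROLE = "http://www.w3.org/1999/xhtml/vocab#role"

# Types named in this issue so far.
UNWANTED_TYPES = {
    "http://schema.org/SiteNavigationElement",
    "http://schema.org/WPHeader",
    "http://schema.org/WPFooter",
    "http://schema.org/BreadcrumbList",
}
UNWANTED_PREDICATES = {ROLE}


def clean_graph(triples):
    """Drop every node that has an unwanted type or an unwanted predicate."""
    doomed = {
        s
        for (s, p, o) in triples
        if (p == RDF_TYPE and o in UNWANTED_TYPES) or p in UNWANTED_PREDICATES
    }
    # Remove the nodes' own triples and any triples that point at them.
    return {t for t in triples if t[0] not in doomed and t[2] not in doomed}


# Hypothetical sample data.
graph = {
    ("ex:article", RDF_TYPE, "http://schema.org/Article"),
    ("ex:article", "http://schema.org/hasPart", "ex:header"),
    ("ex:header", RDF_TYPE, "http://schema.org/WPHeader"),
    ("ex:crumbs", RDF_TYPE, "http://schema.org/BreadcrumbList"),
    ("ex:div", ROLE, "http://www.w3.org/1999/xhtml/vocab#navigation"),
    ("ex:div", "http://schema.org/name", "Sidebar"),
}
cleaned = clean_graph(graph)
```

Keeping the types and predicates in two sets means a further request of the same kind only needs a new entry, not a new rule.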

dev-aravind commented 1 month ago

@saumier I updated the SPARQL and the unit tests to remove the types and the predicate listed above. Please review and let me know if you need any further changes.