Open saumier opened 1 year ago
@dev-aravind Only a couple of Huginn scenarios left to migrate ;-)
@saumier will add the Huginn crawling details here.
@dev-aravind Here is the agent from Huginn. Instead of a CSS class it uses "xpath": "//article[@class=\"show\"]//a"
to get the list of @href for the events.
{
  "expected_update_period_in_days": "100",
  "url": [
    "https://grandtheatre.qc.ca/programmation/"
  ],
  "type": "html",
  "mode": "all",
  "extract": {
    "url": {
      "xpath": "//article[@class=\"show\"]//a",
      "value": "concat(\"https://grandtheatre.qc.ca\",@href)"
    }
  },
  "template": {
    "graph_name": "{{graph_name}}"
  }
}
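For anyone porting this outside Huginn, here is a rough sketch of what the agent's `extract` step does: collect every `<a>` nested under `<article class="show">` and prefix its `href` with the site root. It uses only the Python standard library rather than a real XPath engine, so it is an approximation of the XPath above, not the Huginn implementation. The names `ShowLinkParser`, `extract_event_urls` and the sample HTML (including the `/spectacles/...` paths) are made up for illustration, and anchors without an `href` are skipped, which is a small deviation from the `concat` expression.

```python
from html.parser import HTMLParser

# Site root taken from the "value" expression in the Huginn agent.
BASE_URL = "https://grandtheatre.qc.ca"


class ShowLinkParser(HTMLParser):
    """Collects hrefs of <a> tags nested inside <article class="show">."""

    def __init__(self):
        super().__init__()
        # One boolean per open <article>: True if its class is exactly "show",
        # mirroring the exact-match predicate @class="show".
        self._article_stack = []
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "article":
            self._article_stack.append(attributes.get("class") == "show")
        elif tag == "a" and any(self._article_stack):
            href = attributes.get("href")
            if href:  # assumption: anchors without href are ignored
                self.urls.append(BASE_URL + href)

    def handle_endtag(self, tag):
        if tag == "article" and self._article_stack:
            self._article_stack.pop()


def extract_event_urls(html):
    """Return absolute event URLs found in the listing page HTML."""
    parser = ShowLinkParser()
    parser.feed(html)
    parser.close()
    return parser.urls


# Hypothetical listing markup, not copied from the real site.
SAMPLE_HTML = """
<main>
  <article class="show"><a href="/spectacles/exemple-a/">A</a></article>
  <article class="news"><a href="/nouvelles/exemple/">News</a></article>
  <article class="show"><div><a href="/spectacles/exemple-b/">B</a></div></article>
</main>
"""

if __name__ == "__main__":
    for url in extract_event_urls(SAMPLE_HTML):
        print(url)
```

Only the two links under `class="show"` articles are returned; the link under the `news` article is ignored, which is the filtering the XPath predicate performs.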
GrandTheatreQuebec already has a Planet. This issue is to remove the crawling still happening on Huginn. The workflow in Huginn has an extra step when crawling each page: it scrapes the HTML of each event page for keywords. The keywords are missing from the JSON-LD, so the workflow adds them to the JSON-LD and then maps them to the GrandTheatreQuebec event type SKOS.
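To make the extra step concrete, here is a minimal sketch of the "add keywords to the JSON-LD" part only. It assumes the keywords have already been scraped into a list of strings and that they are stored under the schema.org `keywords` property; the thread does not say which property or HTML selector the Huginn workflow actually uses, so both are assumptions, and `add_keywords` is a hypothetical helper name. The SKOS mapping is not covered here.

```python
def add_keywords(event, keywords):
    """Return a copy of a JSON-LD event dict with scraped keywords merged in.

    Assumption: keywords live under the schema.org "keywords" property.
    """
    enriched = dict(event)  # shallow copy so the input dict is not modified

    existing = enriched.get("keywords", [])
    if isinstance(existing, str):
        # schema.org allows a comma-separated string; normalise it to a list.
        existing = [k.strip() for k in existing.split(",") if k.strip()]

    merged = list(existing)
    for keyword in keywords:
        keyword = keyword.strip()
        if keyword and keyword not in merged:  # drop blanks and duplicates
            merged.append(keyword)

    enriched["keywords"] = merged
    return enriched


if __name__ == "__main__":
    # Hypothetical event and keywords, for illustration only.
    event = {"@type": "Event", "name": "Exemple"}
    print(add_keywords(event, ["Danse", " Jeunesse ", "Danse"]))
```

Returning a copy keeps the original JSON-LD available for comparison when each step is loaded for review.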
If needed, I can give you access to Huginn.
So I propose working in steps (each step can be loaded into Artsdata for review)