Spec.qc.ca not loading events

culturecreates / artsdata-orion

Collection of data sources loaded into Artsdata by Culture Creates

Spec.qc.ca not loading events #82

Open saumier opened 3 weeks ago

saumier commented 3 weeks ago

When running the workflow for spec.qc.ca, the system exits with the error: "Max retries reached. Unable to fetch the content for page ."

dev-aravind commented 3 weeks ago

Notes: The crawl works fine on a local machine, but fails when it runs on a GitHub runner.

Task for @dev-aravind - Add the user-agent header to every step of the crawling process: fetching entity URLs and fetching entity details (in both headless and headful mode).
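To illustrate the task above, here is a minimal Python sketch of attaching the user-agent to a plain HTTP request. It is not the crawler's actual code: `USER_AGENT`, `build_request`, and `fetch` are hypothetical names, and only the agent string itself comes from this thread. A headless or headful browser step would need the same string set through that browser's own configuration, which is not shown here.

```python
import urllib.request

# Assumption: the full agent string quoted elsewhere in this thread.
USER_AGENT = "artsdata-crawler/3.3.0"


def build_request(url: str) -> urllib.request.Request:
    """Return a request object that carries the crawler's User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Fetch a page, sending the User-Agent header with the request."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Routing every fetch through one helper like `build_request` is one way to make sure no step silently falls back to the HTTP library's default agent.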

@saumier will try to contact the Spec.qc.ca developer team and ask them to allow our user-agent to crawl their website.

saumier commented 3 weeks ago

@troughc I sent you an email for Isabelle, asking her tech team to allow the Artsdata crawler User Agent "artsdata-crawler/3.3.0".

Additional note: The Artsdata crawler agent is "artsdata-crawler/3.3.0". However, the tech teams have been told to match only on "artsdata-crawler", because the version number (currently 3.3.0) changes with each update.
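For a site operator, the version-independent match described above could look roughly like the following Python sketch. This is an assumption about how an allow-list check might be written, not what the Spec.qc.ca team actually runs. `CRAWLER_PRODUCT` and `is_artsdata_crawler` are hypothetical names.

```python
# Assumption: the product name to allow, without any version suffix.
CRAWLER_PRODUCT = "artsdata-crawler"


def is_artsdata_crawler(user_agent: str) -> bool:
    """Return True if any product token in the header names the crawler.

    The part after "/" (the version) is ignored, so the check keeps
    working when the crawler's version number changes.
    """
    for token in user_agent.split():
        product = token.split("/", 1)[0]
        if product.lower() == CRAWLER_PRODUCT:
            return True
    return False
```

For example, `is_artsdata_crawler("artsdata-crawler/3.3.0")` and `is_artsdata_crawler("artsdata-crawler/4.0.0")` both return `True`, while a typical browser agent string returns `False`.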

troughc commented 3 weeks ago

The email was sent.

dev-aravind commented 2 weeks ago

@saumier The user-agent is now added to every step.

troughc commented 2 weeks ago

@dev-aravind The tech teams have been told to match only on "artsdata-crawler", because the version number (currently 3.3.0) changes with each update.

saumier commented 2 weeks ago

@fjjulien Please let me know if you hear anything from Isabelle at Spec about our crawler being allowed in. Once it is, I will run another crawl of their event JSON-LD.