ec2u / data

EC2U Knowledge Hub
https://data.ec2u.eu
Apache License 2.0

Missing descriptions #30

Closed hmaskat17 closed 2 years ago

hmaskat17 commented 2 years ago

Steps to reproduce

  1. Check description of event at EC2U https://data.ec2u.eu/events/38113caceb55aba8396b854b1752fc01
  2. Check description of event at source https://www.vivipavia.it/site/home/eventi/articolo37265.html

What did you expect to happen? The event description is included in the API and in the EC2U interface.

What actually happened? The description is missing from the event.

Another example: Programul Hella Tech Camp

@ec2u/mmt

knoan commented 2 years ago

Two distinct issues here.

https://data.ec2u.eu/events/38113caceb55aba8396b854b1752fc01
https://www.vivipavia.it/site/home/eventi/articolo37265.html

This source is crawled using its native HTML microdata annotations, which unfortunately don't currently mark up textual descriptions.

Fixing this would require developing a custom scraper, which, as already discussed elsewhere, is beyond our current resources.

Will document for future developments.
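
To illustrate the limitation, here is a minimal sketch of microdata-only extraction. The markup in `SAMPLE` is a hypothetical stand-in, not the actual page source, and `ItemPropParser` / `extract_event` are illustrative names rather than the crawler's real code. The sketch only handles flat, non-nested items. The point is that text lacking an `itemprop` attribute is invisible to this kind of crawler.

```python
from html.parser import HTMLParser


class ItemPropParser(HTMLParser):
    """Collect itemprop name -> value pairs from a flat microdata item."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._current = None  # itemprop whose text content is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return  # unannotated elements are ignored entirely
        # Some elements carry the value in an attribute rather than in text.
        if "content" in attrs:
            self.props[name] = attrs["content"]
        elif "datetime" in attrs:
            self.props[name] = attrs["datetime"]
        else:
            self._current = name
            self.props[name] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.props[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: assumes annotated elements have no nested tags.
        self._current = None


def extract_event(html):
    parser = ItemPropParser()
    parser.feed(html)
    parser.close()
    return {name: value.strip() for name, value in parser.props.items()}


# Hypothetical markup: name and date are annotated, the description is not.
SAMPLE = """
<div itemscope itemtype="https://schema.org/Event">
  <h1 itemprop="name">Concerto in piazza</h1>
  <time itemprop="startDate" datetime="2022-07-15T21:00">15 luglio</time>
  <p>Una serata di musica dal vivo.</p>
</div>
"""

event = extract_event(SAMPLE)
print(event)
print("description" in event)
```

The descriptive paragraph is present in the page but carries no `itemprop="description"`, so it never reaches the extracted record. Recovering it would need source-specific rules, which is what a custom scraper amounts to.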

https://data.ec2u.eu/events/8d2b5905497572522cf9d160cac330cf https://360.uaic.ro/blog/2022/07/07/programul-hella-tech-camp/

This source is crawled from a standard WordPress RSS feed. The adapter was improved to:

Newly retrieved events will include textual descriptions.
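
For reference, a rough sketch of how a description can be read from a standard WordPress RSS item. This is not the actual adapter code: the function names and the sample feed are hypothetical, and the choice to prefer `content:encoded` over `description` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace of the RSS content module, which WordPress feeds use
# for the full post body (content:encoded).
CONTENT_NS = {"content": "http://purl.org/rss/1.0/modules/content/"}


class _TextOnly(HTMLParser):
    """Keep only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(fragment):
    parser = _TextOnly()
    parser.feed(fragment)
    parser.close()
    # Collapse runs of whitespace left behind by removed tags.
    return " ".join("".join(parser.chunks).split())


def item_descriptions(feed_xml):
    """Map each item link to a plain-text description."""
    root = ET.fromstring(feed_xml)
    result = {}
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        # Assumption: prefer the full body, fall back to the excerpt.
        body = item.findtext("content:encoded", default="", namespaces=CONTENT_NS)
        if not body.strip():
            body = item.findtext("description", default="")
        result[link] = strip_html(body)
    return result


# Hypothetical feed fragment in the usual WordPress shape.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Sample event</title>
      <link>https://example.org/sample-event/</link>
      <description><![CDATA[Short excerpt.]]></description>
      <content:encoded><![CDATA[<p>Full <strong>post</strong> body.</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""

print(item_descriptions(SAMPLE_FEED))
```

Only items fetched after such a change would carry a description. Events that were already stored keep their old record unless they are crawled again.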