tnguyenh closed this issue 9 months ago.
Some issues with merging:
Not exactly the same name:
Not the same time:
How can I help here?
I did a big pass over the generated events to identify problems with the places. In my opinion, there is a discussion to have in connection with this issue https://github.com/agrignard/WhatsNext/issues/18 and especially this one, to try to eliminate as many events as possible whose places have almost the same name: https://github.com/agrignard/WhatsNext/issues/17
done
Well done!! I'm curious to know which event you keep among the duplicates. Nice regex work, anyway!
Checking day by day, I noticed a duplicate at Marché Gare for this event on 14/02/2024: https://marchegare.fr/agenda/lankum
Yep. One problem arises when the event names differ. I can find three references to the same event:
The big issue with scraping is dealing with errors and inconsistencies from other sites.
So far, the merge process compares the event name strings and looks for similarities.
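A minimal sketch of name-based matching, assuming Python and the standard library's `difflib.SequenceMatcher` (the thread does not say how the merge process actually scores similarity). The helpers `normalize`, `name_similarity` and `same_event`, as well as the 0.8 threshold, are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace so that cosmetic
    differences (case, 'é' vs 'e', extra spaces) do not lower the score."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] computed on the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def same_event(a: str, b: str, threshold: float = 0.8) -> bool:
    # Hypothetical threshold: it would need tuning on real scraped data.
    return name_similarity(a, b) >= threshold


print(same_event("Lankum", "LANKUM"))
print(same_event("Concert de l'hostel Dieu", "Fugacités"))
```

Normalizing before comparing makes "Lankum" and "LANKUM" identical, but, as noted above, no string metric will pair two listings of the same event published under unrelated names.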
"Concert de l'hostel Dieu" vs "Fugacités" at La Rayonne, 14/2/2024: the same event appears twice.
We need to define a process for removing duplicates.
For places whose dedicated site scrapex scrapes (e.g. Transbordeur), we can assume that events are fully and correctly listed there. Unless other sites provide better information, I propose skipping events for those places from other sources: e.g. events from Petit Bulletin for Transbordeur would be skipped.
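A minimal sketch of this rule, assuming Python (the thread shows no code). The `Event` record, the `DEDICATED_SOURCES` mapping and the source labels are hypothetical, and the sketch ignores the "unless other sites provide better information" exception.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical minimal event record; the real scrapex output may differ.
    name: str
    place: str
    date: str    # e.g. "2024-02-14"
    source: str  # site the event was scraped from


# Hypothetical mapping: place -> label of its own (dedicated) site.
DEDICATED_SOURCES = {"Transbordeur": "Transbordeur"}


def keep_dedicated_only(events: list[Event]) -> list[Event]:
    """For places with a scraped dedicated site, drop events that come from
    any other source; events for all other places pass through unchanged."""
    kept = []
    for ev in events:
        dedicated = DEDICATED_SOURCES.get(ev.place)
        if dedicated is None or ev.source == dedicated:
            kept.append(ev)
    return kept
```

With this filter, a Transbordeur event scraped from Petit Bulletin is dropped, while a La Rayonne event from Petit Bulletin is kept because La Rayonne has no entry in the mapping.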
For places whose dedicated site is not scraped, we should define a priority order between sources and a way to process their events.
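One possible shape for such a priority rule, sketched in Python under stated assumptions: the `Event` record and the `SOURCE_PRIORITY` ordering are hypothetical (no order has been agreed on in this thread), and events are grouped by place and date so that listings of the same event under different names still collide.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Event:
    # Hypothetical minimal event record.
    name: str
    place: str
    date: str
    source: str


# Hypothetical priority order, most trusted source first.
SOURCE_PRIORITY = ["Petit Bulletin", "OtherAgenda"]


def rank(source: str) -> int:
    """Lower is better; sources missing from the list go last."""
    try:
        return SOURCE_PRIORITY.index(source)
    except ValueError:
        return len(SOURCE_PRIORITY)


def dedupe_by_priority(events: list[Event]) -> list[Event]:
    """For each (place, date) slot, keep only the events coming from the
    best-ranked source that lists anything for that slot."""
    def slot(ev: Event) -> tuple[str, str]:
        return (ev.place, ev.date)

    result = []
    for _, group in groupby(sorted(events, key=slot), key=slot):
        group = list(group)
        best = min(rank(ev.source) for ev in group)
        result.extend(ev for ev in group if rank(ev.source) == best)
    return result
```

The trade-off: an event listed only by a lower-priority source is dropped whenever a higher-priority source lists something else at the same place on the same day, so in practice this would probably need to be combined with a name-similarity check.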