Closed: linogaliana closed this issue 4 months ago.
The results page has indeed changed: the organizers added a new row with performance point counts. I hadn't seen this on previous pages, possibly because this race is international. As a result, the current scraping script isn't robust enough to handle the change.
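One way to tolerate an extra row like this is to collect every table row and keep only those with the expected number of cells, setting the others aside for inspection. The sketch below uses only the standard-library `html.parser`. It assumes the results sit in a plain `<table>` of `<tr>`/`<td>` rows and that the performance-points line is a single spanning cell. Both the HTML and the names `ResultsTableParser` and `parse_results` are hypothetical and not taken from `utils/scraping.py`.

```python
from html.parser import HTMLParser


class ResultsTableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being read
        self._cell = None # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join text fragments and normalize internal whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_results(html, expected_columns):
    """Split rows into those with the expected width and the rest."""
    parser = ResultsTableParser()
    parser.feed(html)
    kept, skipped = [], []
    for row in parser.rows:
        (kept if len(row) == expected_columns else skipped).append(row)
    return kept, skipped


# Hypothetical page with an extra performance-points row.
html = """
<table>
  <tr><th>Rank</th><th>Name</th><th>Time</th></tr>
  <tr><td>1</td><td>A. Runner</td><td>28:01</td></tr>
  <tr><td colspan="3">Performance points: 1150</td></tr>
  <tr><td>2</td><td>B. Runner</td><td>28:15</td></tr>
</table>
"""
kept, skipped = parse_results(html, expected_columns=3)
```

Filtering on row width is a blunt heuristic. It keeps the pipeline running when an unexpected row appears, and returning the skipped rows makes it easy to check what was dropped.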
I am working on a fix.
If you have a page that you know works well, that is fine with me. I know web scraping can be unstable, and it's hard to prevent problems in such a pipeline.
What I want is to see your Quarto output, which should work fine if you suggest a page that does not trigger this bug.
The results page you provided should now work properly. This update also covers standard events such as 5K, 10K, half marathon, and marathon. However, the scraper may still not be robust enough for more complex meetings that combine several competitions, race formats, and event types such as throws and hurdles.
Thanks for the quick fix. I was able to reproduce your app. You can close the issue.
I was trying to run your `main.ipynb` file using Quarto and encountered the following error, probably because one of the pages has changed since you ran your pipeline. It comes from this cell:
Looking at `utils/scraping.py`, I found that the error originates here:
I traced it to the first row of your dataset, where the HTML parsing failed:
Could you fix this problem?
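When a single malformed row breaks the whole run, as in the case above, it can help to parse rows one at a time and record which ones fail instead of stopping at the first exception. This is a minimal sketch, assuming rows have already been extracted as lists of cell strings. `parse_all` and `parse_row` are hypothetical helpers, and the three-column layout (rank, name, `mm:ss` time) is an assumption made for illustration.

```python
def parse_row(cells):
    """Hypothetical row parser: expects [rank, name, 'mm:ss']."""
    rank, name, time = cells  # raises ValueError if the row is not 3 cells wide
    minutes, seconds = time.split(":")
    return {"rank": int(rank), "name": name, "seconds": int(minutes) * 60 + int(seconds)}


def parse_all(rows, parse_row):
    """Apply parse_row to each row; collect failures instead of crashing."""
    parsed, failures = [], []
    for index, row in enumerate(rows):
        try:
            parsed.append(parse_row(row))
        except (ValueError, IndexError) as exc:
            # Keep the row index and raw content so the bad row can be inspected.
            failures.append((index, row, repr(exc)))
    return parsed, failures


rows = [
    ["1", "A. Runner", "28:01"],
    ["Performance points: 1150"],  # unexpected extra row
    ["2", "B. Runner", "28:15"],
]
parsed, failures = parse_all(rows, parse_row)
```

Reporting the index alongside the raw row points directly at the offending line of the dataset, which is the kind of diagnosis described in the issue.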