Error in processing timeseries data in your pipeline

Kirscher / ResultAthle

Scraping, visualization, and analysis of athletics results from the FFA's bases.athle website

MIT License


Error in processing timeseries data in your pipeline #40

Closed: linogaliana closed this issue 4 months ago

linogaliana commented 4 months ago

I was trying to run your main.ipynb file using Quarto and encountered the following error, probably because one of the pages has changed since you last ran your pipeline. It comes from this cell:

header, data = scraping.get_results("https://bases.athle.fr/asp.net/liste.aspx?frmbase=resultats&frmmode=1&frmespace=0&frmcompetition=282742", 13)

Looking at utils/scraping.py, I traced the error to these lines:

data["hours"] = data["Chrono"].dt.hour
data["minutes"] = data["Chrono"].dt.minute
data["seconds"] = data["Chrono"].dt.second

I identified the cause as the first row of your dataset, where the HTML parsing went wrong.
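A plausible mechanism, not confirmed from the repository: if even one cell of "Chrono" is not a valid time, the column may not end up with a datetime dtype, and the .dt accessor then fails. A minimal sketch of a defensive version, assuming the times are "HH:MM:SS" strings (the sample values below are made up), is to coerce unparseable cells to NaT and drop them before extracting hours, minutes, and seconds:

```python
import pandas as pd

# Hypothetical scraped table: the first row is a parsing leftover
# (e.g. a points line) rather than a race time.
data = pd.DataFrame({"Chrono": ["Pts", "00:29:41", "00:31:05"]})

# Assumption: valid times are HH:MM:SS strings. Anything else becomes NaT
# instead of breaking the conversion.
data["Chrono"] = pd.to_datetime(data["Chrono"], format="%H:%M:%S", errors="coerce")

# Drop rows whose time could not be parsed, so .dt works on the remaining rows.
data = data.dropna(subset=["Chrono"]).reset_index(drop=True)

data["hours"] = data["Chrono"].dt.hour
data["minutes"] = data["Chrono"].dt.minute
data["seconds"] = data["Chrono"].dt.second
print(data[["hours", "minutes", "seconds"]])
```

Silently dropping rows can hide real results, so it may be worth logging how many rows were discarded.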

Could you fix this problem?

Kirscher commented 4 months ago

The results page has indeed changed: the organizers added an extra line showing performance points. I hadn't seen this on previous pages, possibly because this race is international. As a result, the current scraping script isn't robust enough to handle the new layout.
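One way to make the scraper tolerate extra lines such as a points row is to keep only rows whose time cell actually looks like a time before any conversion happens. The sketch below assumes a pandas table with a "Chrono" column and times written as MM:SS or H:MM:SS; the pattern, the function name keep_result_rows, and the sample data are hypothetical, and the real bases.athle notation may need a different pattern:

```python
import pandas as pd

# Assumption: a valid chrono is MM:SS or H:MM:SS. Adjust to the real site format.
CHRONO_PATTERN = r"(?:\d{1,2}:)?\d{1,2}:\d{2}"


def keep_result_rows(table: pd.DataFrame, column: str = "Chrono") -> pd.DataFrame:
    """Drop rows (e.g. an extra performance-points line) whose time cell is not a chrono."""
    mask = table[column].astype(str).str.strip().str.fullmatch(CHRONO_PATTERN)
    return table[mask].reset_index(drop=True)


raw = pd.DataFrame(
    {"Athlete": ["", "A", "B"], "Chrono": ["1250 pts", "29:41", "1:02:10"]}
)
clean = keep_result_rows(raw)
print(clean)
```

Filtering on the shape of the value, rather than on a fixed row position, is what lets the script survive a new line being inserted anywhere in the table.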

I am working on a fix.

linogaliana commented 4 months ago

If you have a page that you know works well, that is fine with me. I know web scraping can be unstable, and it's hard to prevent problems in such a pipeline.

What I want is to see your Quarto output, which should render fine if you point me to a page that does not trigger this bug.

Kirscher commented 4 months ago

The provided results page should now work properly. This update also covers standard events such as 5K, 10K, half marathons, and marathons. However, meetings that combine multiple competitions, race formats, and event types such as throws and hurdles may still cause robustness issues.
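Part of what makes mixed meetings hard is that the same string can mean different things: "13.45" is a time in seconds for a hurdles race but a distance in metres for a throw. A minimal sketch, assuming the event kind ("time" or "mark") is known for each table and that performances use the formats noted in the comments (parse_performance and both patterns are hypothetical, not taken from the repository), is to choose the parser from the event kind instead of guessing from the value:

```python
import re
from typing import Optional

# Hypothetical formats: times as H:MM:SS, MM:SS, or plain seconds ("13.45"),
# field-event marks as metres ("62.35"). The real bases.athle notation may differ.
TIME_RE = re.compile(r"(?:(?:(\d{1,2}):)?(\d{1,2}):)?(\d{1,2}(?:\.\d+)?)")
MARK_RE = re.compile(r"\d{1,3}(?:\.\d+)?")


def parse_performance(text: str, kind: str) -> Optional[float]:
    """Return seconds for kind="time", metres for kind="mark", None if unparseable."""
    value = text.strip()
    if kind == "time":
        match = TIME_RE.fullmatch(value)
        if match is None:
            return None
        hours, minutes, seconds = match.groups()
        return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds)
    if kind == "mark":
        return float(value) if MARK_RE.fullmatch(value) else None
    raise ValueError(f"unknown event kind: {kind!r}")


print(parse_performance("1:02:10", "time"))  # a half-marathon-style time
print(parse_performance("13.45", "time"))    # a hurdles time in seconds
print(parse_performance("62.35", "mark"))    # a throw in metres
```

Returning None for unrecognized cells lets the caller skip non-result lines, like the points row, rather than crash.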

linogaliana commented 4 months ago

Thanks for the quick fix; I was able to reproduce your app. You can close the issue.