dslc-io / tidytuesdayR

Extract weekly TidyTuesday Data/Readme
https://dslc-io.github.io/tidytuesdayR/

Deal with old failing weeks #90

Closed jonthegeek closed 1 month ago

jonthegeek commented 1 year ago

This fails:

```r
tidytuesdayR::tt_load(2018, 7)
#> --- Compiling #TidyTuesday Information for 2018-05-14 ----
#> --- There is 1 file available ---
#> --- Starting Download ---
#> 
#>  Downloading file 1 of 1: `week7_starwars.csv`
#> Error in nchar(x, "width"): invalid multibyte string, element 1
```

Created on 2023-05-19 with reprex v2.0.2

That week doesn't have a README for whatever reason, which I think is the source of the issue.

We should harden tidytuesdayR to deal with issues over in the main repo. If it can still get the data, do at least that much, and probably inform the user about any other specific strangeness.
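A minimal sketch of that idea in base R, assuming the load can be split into an essential step (the data) and an optional one (the README). The names `try_optional()` and `load_week()` are hypothetical and are not part of tidytuesdayR:

```r
# Hypothetical helper: run a non-essential step and downgrade any error
# to a warning, returning NULL in place of the missing result.
try_optional <- function(step, what) {
  tryCatch(
    step(),
    error = function(e) {
      warning(
        "Could not get ", what, ": ", conditionMessage(e),
        call. = FALSE
      )
      NULL
    }
  )
}

# Hypothetical loader: `read_data` and `read_readme` are zero-argument
# functions standing in for the real download steps.
load_week <- function(read_data, read_readme) {
  data <- read_data()                             # essential: errors propagate
  readme <- try_optional(read_readme, "README")   # optional: warn and continue
  list(data = data, readme = readme)
}

# Stubbed example: the data loads, the README step fails.
result <- suppressWarnings(
  load_week(
    read_data = function() data.frame(x = 1:3),
    read_readme = function() stop("README not found")
  )
)
str(result)
```

With this split, a missing README only produces a warning, while a failure to get the data still stops with an error.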

jonthegeek commented 1 year ago

Oh, huh, it isn't the lack of README; it's the data itself. Fixing!

Still worth dealing with!

thebioengineer commented 1 year ago

I think it should be able to handle not having a README, since in the beginning there were weeks with only data (e.g., Week 1 of 2018).

Are you suggesting hardening in general for every step? Since the purpose of tt_load is to get the data, I think I would want an error to happen if it can't download the data itself, but maybe with a more informative message?

thebioengineer commented 1 year ago

The issue you are seeing comes from the data apparently getting corrupted on the GitHub side. I originally had some tests that read this week, and they started failing. I submitted a ticket to GitHub, since it should work and the CSV generally appeared to be fine.

jonthegeek commented 1 year ago

When I download the CSV, it still fails... because that dataset is a mess. I think this was a different style of week, where we gave them very, very messy data to play with. I really don't think it's a GH error.

I think making it fail with an informative error would mostly be enough to close this ticket.
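One way this could look, as a sketch in base R rather than the package's actual implementation: wrap the parsing step in `tryCatch()` and re-throw with the file name and the original message. `read_week_file()` and its `reader` argument are hypothetical names introduced here for illustration:

```r
# Hypothetical wrapper: parse a downloaded file and, on failure, raise an
# error that names the file and keeps the original message.
read_week_file <- function(path, reader = utils::read.csv) {
  tryCatch(
    reader(path),
    error = function(e) {
      stop(
        "Downloaded `", basename(path), "` but could not parse it.\n",
        "Original error: ", conditionMessage(e), "\n",
        "The raw file may need manual cleaning before it can be read.",
        call. = FALSE
      )
    }
  )
}

# Stubbed example: a reader that fails the way the reprex above does.
failing_reader <- function(path) {
  stop("invalid multibyte string, element 1")
}

msg <- tryCatch(
  read_week_file("week7_starwars.csv", reader = failing_reader),
  error = function(e) conditionMessage(e)
)
cat(msg, "\n")
```

The load still fails, as suggested earlier in the thread, but the message now tells the user which file was the problem and that the download itself succeeded.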