NCEAS / learning-hub-organization

This repo uses GitHub projects to manage Learning Hub tasks that Learning Hub Team Leads work on

inundation R package maintenance (2021 cohort) #91

Closed by camilavargasp 3 months ago

camilavargasp commented 3 months ago

Jeanette -- the Delta crowd is having trouble with an R package you helped write. From an email thread they sent to Matt:

During the 2021 NCEAS-DSP SWG, I published an R package with Jeanette Clark (https://github.com/goertler/inundation). I was contacted by researchers trying to use it, and it seems the part that Jeanette wrote is not updating. I'm not sure what they mean by "updating", but it is likely an R package maintenance issue.

camilavargasp commented 3 months ago

Hey all - I (Jeanette) ended up reaching out to Pascale directly to figure out what she meant, and went ahead and put in a fix. The package downloads data from a couple of sources that have inconsistent file naming. The two most recent years of data have a different filename structure, so the function wasn't picking them up. That's what Pascale meant by "not updating": the new datasets weren't being retrieved. It was a very quick fix, so I just did it.
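For anyone reading later, here is a rough sketch of the kind of filename matching described above. It is not the code in the package: the file names (other than `dayflowcalculations2019.csv`, which comes up later in this thread), the patterns, and the helper `match_data_files()` are all made up for illustration.

```r
# Hypothetical file listing; the real sources and naming schemes may differ.
files <- c(
  "dayflow-results-1997-2020.csv",
  "dayflowcalculations2019.csv",
  "dayflowCalculations2021.csv",
  "readme.txt"
)

# One regular expression per known naming scheme (assumed patterns).
patterns <- c(
  "^dayflow-results-[0-9]{4}-[0-9]{4}\\.csv$",
  "^dayflowcalculations[0-9]{4}\\.csv$"
)

# Hypothetical helper: keep every file that matches any known scheme,
# ignoring case so small capitalization changes do not drop a file.
match_data_files <- function(files, patterns) {
  combined <- paste(patterns, collapse = "|")
  files[grepl(combined, files, ignore.case = TRUE)]
}

matched <- match_data_files(files, patterns)
print(matched)
```

Keeping one pattern per naming scheme means a future rename by the data provider only needs one new entry in `patterns`.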

camilavargasp commented 3 months ago

There are still a few cleanup tasks that need doing on the package despite the fix. I can create some issues in the repo - can either of you work on fixing them? Should be pretty fast, I (Jeanette) think: (Link to issues in package repo)

angelchen7 commented 3 months ago

Small update before the weekend

First I forked the repo because I don't have push access.

Then I found out which specific dataset is causing the parsing problem. It's dayflowcalculations2019.csv, which can be downloaded here (this url is from urls[[8]]).

> dat <- lapply(urls[[8]], readr::read_csv, col_types=col_types, show_col_types = T, progress = T)
Rows: 379 Columns: 29
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (29): Year, Mo, Date, SAC, YOLO, CSMR, MOKE, MISC, SJR, EAST, TOT, CCC, SWP, CVP, NBAQ, EXPORTS, GCD, PREC, MISDV, CD, XGEO, WEST, RIO, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning message:
One or more parsing issues, call `problems()` on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat) 

On closer inspection:

> dayflowcalculations2019 <- read_csv("dayflowcalculations2019.csv")
Rows: 379 Columns: 29
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (26): Year, Mo, Date, SAC, YOLO, CSMR, MOKE, MISC, SJR, EAST, TOT, CCC, SWP, CVP, NBAQ, EXPORTS, GCD, PREC, MISDV, CD, XGEO, WEST, RIO, ...
dbl  (2): EFFEC, EFFDIV
num  (1): X2

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning message:
One or more parsing issues, call `problems()` on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat) 
> problems()
# A tibble: 379 × 5
     row   col expected   actual     file                                               
   <int> <int> <chr>      <chr>      <chr>                                              
 1     2    30 29 columns 30 columns /home/anchen/inundation/dayflowcalculations2019.csv
 2     3    30 29 columns 30 columns /home/anchen/inundation/dayflowcalculations2019.csv
 3     4    30 29 columns 30 columns /home/anchen/inundation/dayflowcalculations2019.csv
 4     5    30 29 columns 30 columns /home/anchen/inundation/dayflowcalculations2019.csv
 5     6    30 29 columns 30 columns /home/anchen/inundation/dayflowcalculations2019.csv
 6     7    30 29 columns 30 columns /home/anchen/inundation/dayflowcalculations2019.csv
 7     8    30 29 columns 30 columns /home/anchen/inundation/dayflowcalculations2019.csv
 8     9    30 29 columns 30 columns /home/anchen/inundation/dayflowcalculations2019.csv
 9    10    30 29 columns 30 columns /home/anchen/inundation/dayflowcalculations2019.csv
10    11    30 29 columns 30 columns /home/anchen/inundation/dayflowcalculations2019.csv
# ℹ 369 more rows
# ℹ Use `print(n = ...)` to see more rows

Need to investigate further next week 🔍

Edit: Wait, it seems like there are actually 2 different tables in this csv file! I'll let Jeanette know.
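One way to spot this kind of thing without opening the file by eye is to count the fields on each line. The sketch below uses a small made-up file, not the real `dayflowcalculations2019.csv`, and the stacked layout with a blank separator line is only an assumption about how two tables might share one CSV.

```r
# Toy stand-in for a CSV that holds two tables of different widths.
csv_lines <- c(
  "Year,Mo,SAC",
  "2019,10,100",
  "2019,11,120",
  "",
  "Station,Value",
  "A,1"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

# Number of comma-separated fields on each non-blank line.
n_fields <- count.fields(path, sep = ",")

# Run-length encode the counts: each run is a block of lines with the
# same width, so more than one run hints at more than one table.
blocks <- rle(n_fields)
print(data.frame(n_lines = blocks$lengths, n_fields = blocks$values))
```

A single run means every line has the same width; two or more runs are a cue to look at where the width changes.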

angelchen7 commented 3 months ago

Opened PR

Jeanette let me know that she did not want that second table to make it into the integrated dataset so I had to do some special parsing on the file.
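As a rough illustration of what "special parsing" can look like (this is not the code from the PR), the sketch below keeps only the first table from a file that contains two. The toy data and the assumption that a blank line separates the tables are mine; the real file may be laid out differently.

```r
# Toy stand-in for a CSV with a second, unwanted table after a blank line.
csv_lines <- c(
  "Year,Mo,SAC",
  "2019,10,100",
  "2019,11,120",
  "",
  "Station,Value",
  "A,1"
)

# Assumption: the first blank line marks the end of the first table.
first_blank <- which(trimws(csv_lines) == "")[1]
last_line <- if (is.na(first_blank)) length(csv_lines) else first_blank - 1

# Parse only the lines of the first table, keeping every column as
# character so types can be decided later, after integration.
first_table <- read.csv(
  text = paste(csv_lines[seq_len(last_line)], collapse = "\n"),
  colClasses = "character"
)
print(first_table)
```

If there is no blank line, the whole input is parsed, so files with a single table go through unchanged.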

Opened a pull request here so that Jeanette can give me feedback.

The other issue (upgrade setup R step) has also been taken care of in that PR.

angelchen7 commented 3 months ago

Merged PR

Jeanette approved of my changes so the PR is now closed and merged back into main 🎉 yay!! I think we're done with maintaining this package for now.