Open ldecicco-USGS opened 4 years ago
fread works great for WQP. Not as obviously great for RDB (those are the only 2 table readers we need). Here's the branch I'm working on:
https://github.com/ldecicco-USGS/dataRetrieval/tree/fread
The reason I'm thinking about this is that CRAN is potentially going to limit the total number of dependencies a package can have (that's the rumor at least...). readr brings in a bunch of dependencies. data.table stands alone (and is mostly more efficient).
Right now though, fread doesn't have a comment.char argument, which makes reading the RDB format very difficult (since you need to both ignore the # lines and skip 2 lines).
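To make the difficulty concrete, here is a minimal base R sketch of the two steps a reader has to combine. It assumes the RDB layout is a block of "#" comment lines, then a header row, then a row of type codes, then the data. The helper name split_rdb and the sample lines are made up for illustration and are not part of dataRetrieval or real NWIS output.

```r
# Hypothetical helper: split raw RDB lines into header, type row, and data rows.
split_rdb <- function(lines) {
  # Step 1: drop the "#" comment lines.
  body <- lines[!startsWith(lines, "#")]
  # Step 2: the first remaining line is the header, the second holds type codes.
  list(
    header = strsplit(body[1], "\t", fixed = TRUE)[[1]],
    types  = strsplit(body[2], "\t", fixed = TRUE)[[1]],
    data   = body[-(1:2)]
  )
}

# Made-up sample in the assumed RDB layout (not real NWIS output).
rdb_lines <- c(
  "# comment line one",
  "# comment line two",
  "agency_cd\tsite_no\tdrain_area_va",
  "5s\t15s\t8s",
  "USGS\t05114000\t1234.5"
)

parts <- split_rdb(rdb_lines)

# Read the data rows as character so leading zeros in site_no survive.
df <- read.delim(text = parts$data, header = FALSE, sep = "\t",
                 col.names = parts$header, colClasses = "character")
```

The pre-filtered lines could presumably be handed to a faster reader instead of read.delim; base R is used here only so the sketch has no dependencies.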
So, I'll follow this issue: https://github.com/Rdatatable/data.table/issues/856 because I think if that were implemented, we could make the switch.
In the meantime, we'll cross our fingers that the total dependency limit doesn't affect this (still a bit early to tell).
It turns out data.table ignores lines that start with "#" by default! So maybe it's doable in the near term. In my fiddling, though, it seems that if you rely on the default (ignore "#"), you can't also use "skip" with just a number (in our case, 2): skip counts from the beginning of the file (not ideal in our case, but not a deal breaker). However, it's fast enough that we could read the whole thing in as characters (which is how it seems to come in with our RDB formatting) and use the top row to convert types:
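library(dataRetrieval)
library(data.table)

siteINFO <- readNWISsite('05114000')
url <- attr(siteINFO, "url")
# Everything comes in as character; the first row after the header holds the type codes
colnames_data <- fread(url,
                       header = TRUE, data.table = FALSE,
                       keepLeadingZeros = TRUE)
types <- unlist(colnames_data[1, ])
types <- gsub("\\d", "", types) # strip the digits, leaving only the type letter
types <- gsub("d", "Date", types)
types <- gsub("s", "character", types)
types[grepl("_va", names(types))] <- "numeric"
ret_df <- colnames_data[-1, ]
# lapply returns one vector per column, whatever the number of rows or columns
num_cols <- types == "numeric"
ret_df[num_cols] <- lapply(ret_df[num_cols], as.numeric)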
This code chunk actually gets us pretty far... we "just" (HA!) need to figure out dates. A lot of the date logic will still be baked into the existing code, though, so it might not be too much more work.
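As a rough sketch of what the date step could look like, the same per-column pattern extends to columns flagged as "Date". This assumes the dates arrive as ISO "YYYY-MM-DD" strings, which may not hold for every RDB service, and it ignores times and time zones entirely. The data frame and column names below are made up for illustration.

```r
# Made-up example: columns already read as character, plus one type per column.
ret_df <- data.frame(
  site_no   = "05114000",
  sample_dt = "2020-06-01",
  result_va = "1.5",
  stringsAsFactors = FALSE
)
types <- c(site_no = "character", sample_dt = "Date", result_va = "numeric")

# Convert the columns flagged as Date (assumption: ISO YYYY-MM-DD strings).
is_date <- types == "Date"
ret_df[is_date] <- lapply(ret_df[is_date], as.Date, format = "%Y-%m-%d")

# Convert the numeric columns the same way.
is_num <- types == "numeric"
ret_df[is_num] <- lapply(ret_df[is_num], as.numeric)
```

Values that do not match the assumed format come back as NA from as.Date, so partial dates and date-times would still need the package's existing logic.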
Not so much that it's backlogged, but that it's a continuing task that should be considered. I don't think it's worth the effort to convert to data.table unless we do a major overhaul or readr starts causing problems.
httr2 on the other hand...
We chose readr initially because we thought data.table was a bigger learning curve. Simply converting to fread would be pretty easy and would still save some time and remove some (a lot of?) dependencies. While this is explored, we could re-re-think the lubridate dependency.