Open dwr-psandhu opened 1 month ago
I glanced through it. It seems straightforward enough and should scale.
What if I ask you to think of the raw download conceptually the same way?
For example, if I download CIMIS data and drop it in the dropbox, or do the same with any of the existing NOAA or USGS sources, it should work the same way.
What do you think?
@esatel responded to the above
Well, you could certainly drop one of those files and name "read_ts" as the reader. This would be very helpful, since read_ts is aware of a lot of quirks. We could abstract out the "reformat" stuff as separate functions ... would take some effort, but it is a good move.
One big difference is that for most of the big agencies and programs it is "worth it" to work through all the quirks because you get 1000 time series for your effort. Maybe that would be true for CIMIS, I'm not sure. You've requested instructions on how to add another data vendor and I think that would be a good thing to look at – now that I know what things might be tweaked, I've been meaning to refactor a few things to expose what is needed to make it extensible.
The initial drop box idea was focused on the case where it is not worth it ... maybe Dave Huston gave us a couple extra series or something like that. So I wasn't planning to provide code to extract metadata, and I wasn't thinking about lots of similar data coming in.
There are some cases in between though. For instance the very old USGS data from the Aquarius system still, I believe, does not come across in NWIS. It could be that we can offer EITHER a function to reformat or some explicit metadata and it would handle both the "institutional" and "one off" cases.
The easiest way to approach this is probably to orthogonalize the two so we don't have to change the raw approach right away.
Dropbox
Proposal 1: Modeling Data subdirectories are a Drop Box
In this proposal, you can put things anywhere in Modeling_Data as long as:
• you can point to a reader that reads the data, applies provider flags the way you want, and transforms it into a dataframe,
• filenames sort lexicographically,
• you make a small entry in recipes/data_recipes.yaml describing how to read it, plus a few pieces of metadata.
A checker will be provided. Nightly, the data will be swept into /formatted, and thereafter they are safe, although whether the raw is safe or not is kind of up to users.
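To make the conditions above concrete, here is a minimal sketch of what a checker might verify for one entry in data_recipes.yaml. Everything in it is an assumption for illustration: the key names (`pattern`, `reader`, `station_id`, `param`, `unit`), the convention that the reader is given as a dotted import path, and the helper names are all hypothetical and not taken from the actual recipe schema.

```python
import fnmatch
import importlib
import os

# Hypothetical required fields; the real data_recipes.yaml schema may differ.
REQUIRED_KEYS = ("pattern", "reader", "station_id", "param", "unit")


def missing_keys(entry):
    """Return the required keys that are absent or empty in a recipe entry."""
    return [k for k in REQUIRED_KEYS if not entry.get(k)]


def resolve_reader(dotted_name):
    """Import a reader named as 'package.module.function' (assumed convention)."""
    module_name, _, func_name = dotted_name.rpartition(".")
    if not module_name:
        raise ValueError(f"reader must be a dotted path, got {dotted_name!r}")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


def files_in_read_order(directory, pattern):
    """List files matching the pattern, sorted lexicographically by name."""
    names = [n for n in os.listdir(directory) if fnmatch.fnmatch(n, pattern)]
    return [os.path.join(directory, n) for n in sorted(names)]
```

A nightly sweep could then call `missing_keys` on each recipe entry, resolve the named reader once, and apply it to the files in the order returned by `files_in_read_order`. Sorting by name is the reason the filenames need to sort lexicographically.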
Use cases:
The proposal is that these can be put in /dropbox/data but also anywhere on Modeling_Data.
The crux is data_recipes.yaml, the purpose of which is to do the following: