Closed galengorski closed 2 years ago
btw, the data I'm referencing is data that I downloaded manually from S3
That data is getting dropped because the timestep in the data file is every 15 minutes, but it looks like the conductivity data is only measured hourly, so the script drops the data, thinking that only 25% of the data is available for the day. I need to see if there is a way to query what the measurement timestep should be, instead of just inferring it from the file's timestamps.
If you want a quick fix, I can set the threshold so that only 1% of a day's measurements are required instead of 50%. Would that be helpful? I could apply that right now and update the data.
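For reference, the daily-coverage check being discussed could look roughly like this. This is a pandas sketch, not the pipeline's actual code: the function name, column name, and the idea of inferring the timestep from the median spacing of non-missing observations (rather than from the file's 15-minute grid) are all my assumptions.

```python
import pandas as pd

def filter_days_by_coverage(df, value_col, min_frac=0.5):
    """Drop days with fewer than min_frac of the expected measurements.

    Instead of assuming one measurement per file timestamp (which penalizes
    hourly data stored on a 15-minute grid), infer the true timestep from
    the median spacing of non-missing observations.
    """
    obs = df[value_col].dropna()
    step = obs.index.to_series().diff().median()   # e.g. 1 hour, not 15 min
    expected_per_day = pd.Timedelta("1D") / step   # e.g. 24, not 96
    daily_counts = obs.resample("D").count()
    keep = daily_counts[daily_counts / expected_per_day >= min_frac].index
    return df[df.index.normalize().isin(keep)]
```

With the timestep inferred this way, a day of hourly conductivity data on a 15-minute grid counts as 100% coverage rather than 25%, so it survives the 50% threshold without lowering it to 1%.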
Ah ok, that makes sense! No worries, I can do that myself locally to have some data to work with.
It looks like some data get dropped during the munge step of the pipeline. Looking specifically at USGS site 01463500 (Trenton) in 1_fetch/out, I see specific conductance data with parameter flag "A", which indicates that it's approved. However, in 1_munge/out/usgs_nwis_01463500.csv, there is no specific conductance data. I wonder if this is an issue with the flags? I haven't systematically checked the other sites yet, but it may be an issue elsewhere too.
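One quick way to check whether approved observations really vanish between the two steps would be something like the sketch below. The column names (qual_cd, dateTime, spec_cond, date) are guesses about the file layout, not the pipeline's actual schema:

```python
import pandas as pd

def approved_days_lost(fetch_df, munge_df,
                       flag_col="qual_cd", time_col="dateTime",
                       value_col="spec_cond", date_col="date"):
    """Count days that have approved ('A'-flagged) observations in the
    fetch output but no corresponding value in the munge output.
    Column names are hypothetical and may need adjusting per file."""
    flags = fetch_df[flag_col].astype(str)
    approved = fetch_df[flags.str.startswith("A")]
    fetch_days = set(pd.to_datetime(approved[time_col]).dt.normalize())
    munged = munge_df[munge_df[value_col].notna()]
    munge_days = set(pd.to_datetime(munged[date_col]).dt.normalize())
    return len(fetch_days - munge_days)
```

Running something like this per site would show whether the "A"-flagged data disappear everywhere in the munge step or only at certain sites.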