USGS-R / drb-estuary-salinity-ml

Creative Commons Zero v1.0 Universal

USGS data getting dropped between fetch and munge steps #27

Closed galengorski closed 2 years ago

galengorski commented 2 years ago

It looks like some data get dropped during the munge step of the pipeline. Looking specifically at USGS site 01463500 (Trenton) in 1_fetch/out, I see specific conductance data with the parameter flag "A", which indicates that it's approved. However, in 1_munge/out/usgs_nwis_01463500.csv, there is no specific conductance data. I wonder if this is an issue with the flags? I haven't systematically checked the other sites yet, so it may be an issue elsewhere too.

galengorski commented 2 years ago

btw, the data I'm referencing is data that I downloaded manually from S3

amsnyder commented 2 years ago

That data is getting dropped because the timestep in the data file is 15 minutes, but it looks like the conductivity data is only measured hourly. So the script drops the data, thinking that only 25% of the measurements are available for the day, which is below the 50% it requires. I need to see if there is a way to query what the timestep of the measurements should be instead of just using the timestamps, I guess.
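To make the arithmetic concrete, here is a minimal sketch of the coverage problem described above. It is not the pipeline's actual code: `infer_step` and `daily_coverage` are hypothetical helper names, the timestamps are synthetic, and taking the most common gap between readings is only one possible way to estimate a variable's own timestep.

```python
from collections import Counter
from datetime import datetime, timedelta


def infer_step(timestamps):
    """Return the most common gap between consecutive sorted timestamps.

    Hypothetical helper: estimates the measurement interval from the
    readings themselves rather than from the file's overall time grid.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return Counter(gaps).most_common(1)[0][0]


def daily_coverage(timestamps, step):
    """Fraction of one day's expected readings that are present for a given step."""
    expected_per_day = timedelta(days=1) / step  # e.g. 96.0 for a 15-minute step
    return len(timestamps) / expected_per_day


# One day of hourly readings (synthetic, standing in for specific conductance).
day = datetime(2020, 1, 1)
hourly = [day + timedelta(hours=h) for h in range(24)]

# Judged against the file's 15-minute grid, the hourly series looks sparse.
coverage_fixed = daily_coverage(hourly, timedelta(minutes=15))

# Judged against its own inferred timestep, the same series is complete.
coverage_inferred = daily_coverage(hourly, infer_step(hourly))

print(coverage_fixed, coverage_inferred)
```

Against the 15-minute grid the hourly series covers 24 of 96 slots, while against its own hourly step it covers 24 of 24, so a per-variable step would keep the day without changing the threshold.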

amsnyder commented 2 years ago

If you want a quick fix, I can set the threshold so that 1% of measurements for a day are required instead of 50% of measurements. Would that be helpful? I could apply that right now and update the data.
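As a rough illustration of what lowering the threshold does, the sketch below applies a minimum-fraction check to a day of hourly readings on a 15-minute grid. The function name `keep_day` and its parameters are assumptions for illustration, not names from the repository.

```python
def keep_day(n_present, n_expected, min_fraction=0.5):
    """Keep a day only if enough of its expected readings are present.

    Hypothetical helper; the real pipeline's check may be structured differently.
    """
    return n_present / n_expected >= min_fraction


# 24 hourly readings against 96 expected 15-minute slots (25% coverage).
kept_at_50_percent = keep_day(24, 96, min_fraction=0.5)
kept_at_1_percent = keep_day(24, 96, min_fraction=0.01)

# A single reading in a 96-slot day is about 1.04% coverage.
single_reading_kept = keep_day(1, 96, min_fraction=0.01)

print(kept_at_50_percent, kept_at_1_percent, single_reading_kept)
```

One trade-off with a 1% threshold on a 15-minute grid is that a day with a single reading also passes, so any daily summary for that day would rest on one measurement.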

galengorski commented 2 years ago

Ah ok, that makes sense! No worries, I can do that myself locally to have some data to work with.