Closed wilsaj closed 8 years ago
Importing this will be fairly straightforward and I would suppose that the schema will be pretty stable.
We should try to define acceptable latency ahead of time, at least in a ballpark, because the way the system should be set up depends on this parameter. Ensuring timely updates across many gauges may pose a little challenge and is limited by our polling frequency. There may also be some challenge to drive notifications reliably and with relatively low latency.
For similar reasons we should have an idea of what we want to do if the service begins to flake out from our perspective. If there are polling jobs already in a queue, is it okay to just let them fail, and maybe not generate new polling jobs for this service for a while?
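To make the "stop generating new polling jobs for a while" idea concrete, here is a minimal sketch of a per-source pause switch. Everything in it is an assumption for illustration: the class name `SourceBreaker`, the failure threshold, and the cooldown length are hypothetical, not something we have agreed on. Jobs already in the queue are simply allowed to fail; the breaker only answers whether new jobs should be created.

```python
import time


class SourceBreaker:
    """Pause job creation for one data source after repeated failures.

    Hypothetical helper: the threshold and cooldown values are placeholders.
    """

    def __init__(self, max_failures=3, cooldown_s=600.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock          # injectable so the logic can be tested
        self.failures = 0           # consecutive failures seen so far
        self.paused_until = 0.0     # clock value before which we stay paused

    def record_success(self):
        # Any success resets the consecutive-failure count.
        self.failures = 0

    def record_failure(self):
        # After max_failures consecutive failures, pause for cooldown_s.
        self.failures += 1
        if self.failures >= self.max_failures:
            self.paused_until = self.clock() + self.cooldown_s
            self.failures = 0

    def may_enqueue(self):
        # True when new polling jobs may be generated for this source.
        return self.clock() >= self.paused_until
```

The job generator would call `may_enqueue()` before creating jobs for a source, and each worker would call `record_success()` or `record_failure()` when its job finishes.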
Integrating with the USGS NWIS gauges data will also be a little more work, but that is for another issue.
For ballpark purposes, a 2-7 minute latency is probably the range we need to stay in. 2 is ambitious.
But turning the latency question around:
In general, it would be nice to have a way of rate-limiting our polling frequency consistently per data source since that really determines what our impact on the source services is going to be. For example: having a soft guarantee that we'll only hit NWS with 1 request per second. The NWS service is stronger than most, but some services do fall over with bursty traffic. Having a back-off mechanism would also be nice to have.
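A rough sketch of what a per-source limiter with back-off could look like, assuming one limiter object per data source. The class name `SourceRateLimiter`, the doubling back-off policy, and the 300 second cap are assumptions for illustration only. It enforces a minimum gap between request starts, and widens that gap while the source keeps failing.

```python
import time


class SourceRateLimiter:
    """Minimum spacing between requests to one source, plus back-off.

    Hypothetical helper: the default values are placeholders, not agreed limits.
    """

    def __init__(self, min_interval_s=1.0, max_backoff_s=300.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self.max_backoff_s = max_backoff_s
        self.clock = clock
        self.sleep = sleep
        self.backoff_s = 0.0      # extra gap added while the source is failing
        self.last_start = None    # clock value when the last request started

    def wait(self):
        """Block until the next request may start; return seconds waited."""
        now = self.clock()
        delay = 0.0
        if self.last_start is not None:
            ready_at = self.last_start + self.min_interval_s + self.backoff_s
            delay = max(0.0, ready_at - now)
        if delay > 0:
            self.sleep(delay)
        self.last_start = now + delay
        return delay

    def report(self, ok):
        """Record a request outcome: reset back-off on success, double on failure."""
        if ok:
            self.backoff_s = 0.0
        else:
            doubled = max(self.backoff_s * 2, self.min_interval_s)
            self.backoff_s = min(doubled, self.max_backoff_s)
```

A worker would call `wait()` before each request to NWS and `report(ok)` afterwards. With `min_interval_s=1.0` this gives the soft "1 request per second" guarantee, as long as all workers for that source share one limiter.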
Another consideration: flooding is going to be localized. Maybe only 30 gauges are predicted to go into flood in the next 24 hours, so those will have a much higher priority with regard to polling interval.
Forecasts, I believe, are only updated every 4 hours.
For SMS notification, one of the goals is to send notifications ahead of time based on forecasts so that buys us some latency. If I received a notification an hour ago that told me that my gauge is going to flood in an hour, then I'd expect it to be flooding right about now whether or not I receive a notification right this second.
Unsurprisingly, I'm finding that polling NWS is very stable at 1 request per second (only about 6% of requests take more than 1s; most take about 0.5s).
But 727 gauges at that rate will take about 12 minutes and realistically I'd reserve 15 minutes. So with this rate limit, you can expect about that much time between polls of any given gauge.
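The arithmetic behind that estimate, as a small sketch. The function name `cycle_minutes` is made up for illustration; the gauge count and the 1 request/sec rate are the numbers from this comment, and the other rates are only there for comparison.

```python
def cycle_minutes(gauge_count, requests_per_second):
    """Minutes needed to poll every gauge once at a fixed request rate."""
    return gauge_count / requests_per_second / 60.0


# 727 gauges at 1 request/sec: about 12.1 minutes per full pass.
full_pass = cycle_minutes(727, 1.0)

# Comparison rates (hypothetical, not agreed with NWS).
for rate in (0.5, 1.0, 2.0):
    print(f"{rate} req/s -> {cycle_minutes(727, rate):.1f} min per pass")
```

Reserving 15 minutes for a pass that nominally takes about 12.1 leaves roughly 24% headroom for slow responses and retries.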
Aggressively pushing each update out as soon as we download it might get us 2 minutes from our download to visibility if all goes well. But that time is just added to the latency between change and poll, as dictated by the rate limit.
If we can predict gauge update times on a regular rhythm then we might do much better. We could also more frequently make requests for gauges that change more, or which are more important (e.g. forecast or close to flood, or near something with forecast or close to flood). The easiest way would be to either poll these flood gauge endpoints at a higher rate (2 requests/sec would get us close to 7 minutes, if it were possible) or poll some sort of dump instead of polling individual gauges.
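One way to poll important gauges more often is to give each gauge its own interval and always poll whichever gauge is due next. The sketch below assumes that approach; the function name `build_schedule`, the gauge ids, and the interval values are hypothetical. It only produces the ordering of polls and does not enforce the overall request rate, so the intervals would still have to be chosen to fit within whatever rate NWS is comfortable with.

```python
import heapq


def build_schedule(intervals_s, horizon_s):
    """Return (due_time, gauge_id) poll events before horizon_s, earliest first.

    intervals_s maps gauge id -> seconds between polls (must be positive).
    """
    heap = [(0.0, gauge_id) for gauge_id in sorted(intervals_s)]
    heapq.heapify(heap)
    events = []
    while heap and heap[0][0] < horizon_s:
        due, gauge_id = heapq.heappop(heap)
        events.append((due, gauge_id))
        # Re-queue the gauge for its next poll.
        heapq.heappush(heap, (due + intervals_s[gauge_id], gauge_id))
    return events


# Hypothetical example: a gauge near flood polled every 5 minutes,
# a quiet gauge every 15 minutes.
events = build_schedule({"near_flood": 300.0, "quiet": 900.0}, horizon_s=900.0)
```

In the example, `near_flood` is polled three times in the first 15 minutes while `quiet` is polled once.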
Good to know. We should probably just talk to NWS about this and find out what they're comfortable with and if they have a better dump somewhere. There have already been some higher level conversations between TWDB / TNRIS and NWS about various flood things but I've been out of that loop. I'll try to dig up some contacts to start this conversation. cc @mpavon @rwade2126
Just found out about this... NWS has some ESRI rest services available at https://idpgis.ncep.noaa.gov/arcgis/rest/services/NWS_Observations/ahps_riv_gauges/MapServer
This basically replaces the get_map_points.php endpoint with something a little more canonical and hopefully easier to deal with. Still need to poll individual gauges for observations and forecasts, but this provides the list of gauges.
... update on that ArcGIS service: upon inspection, we've noticed it is missing a few gauges that are available on the get_map_points.php endpoint.
@mpavon is following up with NWS to find out what the deal is.
get_map_points is giving us 251 gauges from neighboring states. That leaves 478 from Texas. I can work up a broader report on the differences I can see between the two if you want one.
This information is being gathered and consumed, so I think it's as good as done for now. Further issues with things like latency or switching to the Arc REST API can be separate issues.
This issue tracks progress on implementing gauges for the National Weather Service (NWS) Advanced Hydrologic Prediction Service (AHPS)
source
curl 'http://water.weather.gov/ahps/get_map_points.php' --data 'key=tx&fcst_type=obs&percent=50&current_type=all&populate_viewport=1&timeframe=0' --compressed
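For reference, a rough Python equivalent of building that request with only the standard library. The function name `build_request` is made up; the URL and form fields are copied from the curl command, with the sixth field read as `current_type`. This sketch only constructs the POST request. Actually sending it (for example with `urllib.request.urlopen`) and handling compressed responses, which curl's compression flag does automatically, are left out.

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = "http://water.weather.gov/ahps/get_map_points.php"

# Form fields copied from the curl command above.
PARAMS = {
    "key": "tx",
    "fcst_type": "obs",
    "percent": "50",
    "current_type": "all",
    "populate_viewport": "1",
    "timeframe": "0",
}


def build_request(params=PARAMS):
    """Build (but do not send) the form-encoded POST request."""
    body = urlencode(params).encode("ascii")
    # Supplying data makes urllib issue a POST, like curl's --data.
    return Request(URL, data=body)


request = build_request()
```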
<disclaimers> are the XML version of “for entertainment purposes only” messages. We don’t need to worry about these.
<sigstages> contains the set of flood stages defined for the gauge. These are important: we need them to associate observed and forecast values with flood stages.
<sigflows> is the analog with stages defined by streamflow, but these seem to be incorrect sometimes. We should probably ignore them.
<zerodatum> TL;DR is “it’s kind of complicated and we don’t need to worry about it for this application”. Longer version is that it defines in a semi-precise way what “10 ft” of stage means. It references a more or less consistent model of the earth, so “235.49 above Mean Sea Level” means that a stage of 0 ft is 235.49 ft above MSL and a flood stage of 5 ft is 240.49 ft above MSL. It gets more complicated than that because "Sea Level" is a weird construct, but I’ll stop there.
<rating> is the rating curve for the gauge. We don’t need to worry about this, because the values will contain both stage and streamflow with the conversion automatically applied.
<alt_rating> an alternate version of the rating curve? Again, feel free to ignore.
<observed> Observed values that were measured by the gauge. These have already happened.
<forecast> The modeled forecast of what is predicted to happen. These will not always be available: NWS only runs forecasts when they have determined that there will be some flood risk.
<observed> and <forecast> (when available) and stage needs to be correlated it with
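As an illustration of associating an observed or forecast stage with the flood stages from <sigstages>, here is a minimal sketch. It assumes the thresholds have already been parsed into a dict of name to stage in feet, with None for thresholds a gauge does not define. The function name `classify_stage`, the category names, and the numbers are all hypothetical and are not taken from the actual XML.

```python
def classify_stage(stage_ft, sigstages):
    """Return the name of the highest threshold at or below stage_ft, or None."""
    defined = [(name, value) for name, value in sigstages.items()
               if value is not None]
    reached = None
    # Walk thresholds from lowest to highest, remembering the last one reached.
    for name, threshold_ft in sorted(defined, key=lambda item: item[1]):
        if stage_ft >= threshold_ft:
            reached = name
    return reached


# Hypothetical thresholds for one gauge; "major" is undefined here.
example_sigstages = {"action": 8.0, "flood": 10.0, "moderate": 14.0, "major": None}
category = classify_stage(10.5, example_sigstages)
```

The same function would apply to each value in <observed> and, when present, <forecast>.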