USACE / cwms-data-api

Corps Water Management System RESTful Data Service
MIT License
13 stars 14 forks source link

Deprecate the "timezone" query parameter for v2 and newer end-points #570

Open rma-rripken opened 8 months ago

rma-rripken commented 8 months ago

As described in https://github.com/USACE/cwms-data-api/issues/556#issuecomment-1984710644 the "timezone" parameter is a little bit weird when used in JSONv2 and XMLv2. It basically only used as a way for the user to specify a default timezone to be used if the user didn't include timezone info in their "begin" and "end" strings.

It makes it so the user could specify begin = 2022-01-06T00:00:00 end = 2022-01-06T12:00:00 timezone = UTC or begin = 2022-01-06T00:00:00 end = 2022-01-06T12:00:00 timezone = America/Los_Angeles

I think the V2 end-points sort of inherited this behavior. But its too helpful.

The V1 end-points also passed timezone into the pl/sql and the pl/sql promised to return values in the specified timezone.

The V2 end-points return data in UTC. Maybe V2 isn't always UTC but I can't think of when it isn't and if it isn't we need to document that better.

Instead of continuing to "inherit" the old behavior (in my opinion) we should force the clients to give us better inputs (timestamps with tz) and throw errors if the inputs seem weird.

DanielTOsborne commented 7 months ago

Downside of this is that on tomcat by default, the square brackets [] are not allowed in the URL query string. Also, as I learned after I wrote that, it's non-trivial to format/parse that time format in Javascript.

My recommendation would be to only allow naïve datetimes if the timezone parameter is used, and throw an error if the timezone is specified in both places. Additionally, enforce the start and end times to be the same format, throw an error if not. In practice, that's how I ended up using it anyway.

I made it lax in what was allowed, but that wasn't the best idea. It should be strict, but still flexible for the user where it makes sense (and allowing tz ambiguity does not make sense).

krowvin commented 7 months ago

Would they just need to be uri encoded?

I. E. %5B...%5D

Unless you're saying tomcat won't handle that either?

Edit: agreed, I too expected time zone to be sufficient to prevent end users having to deal with it.

The browser will return the time zone string no problem.

DanielTOsborne commented 7 months ago

Would they just need to be uri encoded?

I. E. %5B...%5D

Unless you're saying tomcat won't handle that either?

Edit: agreed, I too expected time zone to be sufficient to prevent end users having to deal with it.

The browser will return the time zone string no problem.

I believe I tried that before, and that didn't work either. I don't remember if it was blocked the same way, or the servlet just didn't know how to parse it though.

MikeNeilson commented 7 months ago

I think it just didn't parse correctly. We can modify the tomcat config fairly easily (there's nothing in the STIG, Tomcat just went with a default from an RFC, that no one else follows). Browsers don't encode the [ and ] like they do other characters.

I think that original RFC was just making an ultimately bad assumption.

But I'm not opposed to @DanielTOsborne 's proposed solution, allow one OR the other, but not both, and we tweak the output to not use it. Always though the -0700, for example, format was better anyways. Far more clear.

rma-rripken commented 7 months ago

Always though the -0700, for example, format was better anyways. Far more clear.

I'm not entirely sure how to interpret this - more clear than what? Are you saying you like the formats that include the offset and the name better than formats that just include the name?
Like 2021-06-10T13:00:00-0700[PST8PDT] better than 2021-01-01T00:00:00[PST8PDT] ? Or you'd rather see just the offset and don't care about the name? Like: 2021-01-01T00:00:00-07:00 Or offset format that doesn't include a colon? Something like: 2021-01-01T00:00:00-0700 ?

I just tried some queries and I had no problem with dates with square brackets in query params when they are encoded as %5B...%5D as suggested. The documentation for the timeseries end-point includes an example of a format that includes brackets. '2021-06-10T13:00:00-0700[PST8PDT]' I did not try using dates as path parameters but CDA isn't doing that anywhere.

We could throw an error if timezone is provided and the datetime string already includes a timezone piece - that means trying to determine whether the input string includes timezone and/or offset parts or not. Making sure the provided timezone is valid. Then figuring out what error to send the users and documenting the behavior and trying to explain that they can put the timezone in two different places but not both. And what if the end-point takes multiple datetimes? Can some include it and other not? Its technically doable

I suggest again just removing the timezone parameters. Simplify implementation, reduce surface area. If necessary, we instead give the user documentation with clear examples of the input datetime formats. We could probably find/figure/guess the most common timezone in each district and make sure we have an example of exactly how they can format times for their locations.