adobe / json-formula

Query language for JSON documents
http://opensource.adobe.com/json-formula/
Apache License 2.0
19 stars 8 forks source link

Potential date/time issues #117

Closed Eswcvlad closed 7 months ago

Eswcvlad commented 8 months ago

This one is a bit more "ranty", subjective and less structured, so I apologize in advance :)

General on Date/Time

  1. Personally, I had some difficulty understanding the "The date/time value is offset from the current time zone to UTC" line till I looked at the tests. As I understood this means that date/time values always represent date/time in UTC, while function arguments and output of date/time query functions (like hour) always represent local time zone. But, imo, it is pretty hard to get that from spec alone.
  2. This is more of my own fault, but with the date/time representation used, I assumed, that you would be able to use date/time query functions on durations and date/time differences. But this is the case only if your local time zone is UTC, everywhere else the local time zone shift will give you an incorrect result, so you need to use datedif. Maybe it would be worth to add an explicit note on this to prevent misuse?
  3. As of now you can only query date/time entities in local time zone, which might be undesirable in some use cases (you might just want to use a specific timezone regardless of current device location). From my point of view it would be a good idea to have an optional timezone argument, where local time zone implication exists.
  4. There is no convenient way to specify a date/time delta besides doing math on predefined dates or creating a delta value manually. I think you might want to have something like now() - time_in_form > delta_of_fifteen_hours.

datedif

  1. In the implementation "unit" parameter is case-insensitive, which is not mentioned in the spec. Would expect a TypeError for capitalized units in such case.
  2. "md" value is not implemented as of now.
  3. I haven't tested it, but I have a feeling that this line has issues for February 29th.
  4. I am guessing this comes from the OpenFormula spec, but it seems strange to me, that it matters, that start_date <= end_date. Might just allow it anyway and return a negative value, as it might be hard to know, which date will be bigger. And wrapping it into if will lead to annoying duplication.

datetime and time

  1. As mentioned before, I think the ability to specify a time zone or offset for this could be useful.
  2. "Values from 0 to 99 map to the years 1900 to 1999.": for me it seems like a bad idea to leave this in a modern standard, as it has caused enough pain already... I doubt people will use 0 - 99 AD dates, but if you know, you have YY years, you could just add 1900 yourself in the expression.
  3. There is a mention on argument overflow in datetime, but it is not specified what happens, if arguments are negative or month/day is 0. Personally, I would expect it to be a TypeError.
  4. The same overflow rules work for time in the implementation, but this is not mentioned in the spec. It also works somewhat surprisingly, if there is an overflow in the hour, as datetime(1970, 1, 2, 1, 0, 0) == time(25, 0, 0)
  5. time can return a negative value, if local timezone is "east" of UTC.
  6. Overall, it is hard for me to find a use case for time. As of now, time(h, m, s) is equivalent to datetime(1970, 1, 1, h, m, s), which, as a date, has limited use. You cannot add time to some other datetime, as you will have an unexpected result of the time zone offset being applied twice. You cannot use it for delta comparisons either because of this. At the same time I can totally see a use case, if it returned a time delta without any offsets instead, maybe with days/months/years as well.

Date/Time query functions

By these I mean year, month, day, hour, minute, second and weekday.

  1. As mentioned before, I think the ability to specify a time zone or offset for this could be useful.
  2. There is no millisecond function. Since you can specify milliseconds in datetime, might be a good idea to add this for consistency.
  3. hour, minute and second return null for negative numbers in the implementation, and this is mentioned in tests. But this case occurs naturally in, for example, time(1, 30, 0) | [hour(@), minute(@), second(@)] with a local UTC+3 time zone, as time will return a negative value because of the offset.
  4. weekday examples are missing from the docSamples.json tests,
  5. In the implementation, weekday returns null, if returnType is not one of 1, 2, 3. Personally, I would expect a TypeError here. In the same boat I would expect something like 1.5 treated as 1, instead of returning null, especially with JS, where there are, theoretically speaking, no integer types. In general, coercion to integer works differently in different functions and is not covered in spec, want to do a recap on that later.

now, today and eomonth

  1. As mentioned before, I think the ability to specify a time zone or offset for this could be useful.

toDate

  1. There is a typo in the examples: toDate("toDate("2023-11-10T13:00:00+04:00")") // returns 19671.375
  2. This is a bit pedantic, but maybe it would be worthwhile to limit the ISO 8601 applicability, as it also includes stuff like week and ordinal dates, which, I would assume, majority of implementations won't implement. Usually people just implement RFC 3339 string support and call it ISO 8601. :)

That's all I've noticed for now. For the most part, these are relatively minor nitpicks, but, imo, the time zone and time delta stuff could become annoying in practice.

JohnBrinkman commented 8 months ago
  1. Personally, I had some difficulty understanding the "The date/time value is offset from the current time zone to UTC" line till I looked at the tests. As I understood this means that date/time values always represent date/time in UTC, while function arguments and output of date/time query functions (like hour) always represent local time zone. But, imo, it is pretty hard to get that from spec alone.

Hmm. I thought it was well specified here: https://opensource.adobe.com/json-formula/doc/output/json-formula-specification-1.0.2.html#_date_and_time_values

But any suggestions to make it more clear are welcome.

  1. This is more of my own fault, but with the date/time representation used, I assumed, that you would be able to use date/time query functions on durations and date/time differences. But this is the case only if your local time zone is UTC, everywhere else the local time zone shift will give you an incorrect result, so you need to use datedif. Maybe it would be worth to add an explicit note on this to prevent misuse?

Could you expand on this a bit more with a specific use-case? How did you get a duration value? Were you subtracting date/time values?

  1. As of now you can only query date/time entities in local time zone, which might be undesirable in some use cases (you might just want to use a specific timezone regardless of current device location). From my point of view it would be a good idea to have an optional timezone argument, where local time zone implication exists.

Hmm. For starters, we do have very limited timezone support in the toDate() function. e.g. you can parse an ISO8601 date with a timezone offset: toDate("20231110T130000+04:00")

Note that:

But it would be interesting to explore if there are ways in which we could add limited timezone support. Could be any of:

How would you see time zone support affecting the spec?

  1. There is no convenient way to specify a date/time delta besides doing math on predefined dates or creating a delta value manually. I think you might want to have something like now() - time_in_form > delta_of_fifteen_hours.

Again, it would help if you could expand the use-case. Not clear to me what you're trying to accomplish

datedif

  1. In the implementation "unit" parameter is case-insensitive, which is not mentioned in the spec. Would expect a TypeError for capitalized units in such case.

I will update the spec

  1. "md" value is not implemented as of now.

Good catch

  1. I haven't tested it, but I have a feeling that this line has issues for February 29th.

I did a quick test:

datedif(datetime(2023,2,1), datetime(2023,3,1), "yd")

returns 28

datedif(datetime(2024,2,1), datetime(2024,3,1), "yd")

returns 29

datedif(datetime(2023,2,1), datetime(2024,3,1), "yd")

returns 28

None of these seem wrong -- and the results are consistent with Excel...

  1. I am guessing this comes from the OpenFormula spec, but it seems strange to me, that it matters, that start_date <= end_date. Might just allow it anyway and return a negative value, as it might be hard to know, which date will be bigger. And wrapping it into if will lead to annoying duplication.

You're right, OpenFormula and Excel expect start/end to be ordered. I am reluctant to introduce a negative result -- not sure if there'd be other consequences. But note that you don't need to use "if". You could use something like:

[toDate(date1), toDate(date2)].datedif(min(@), max(@), "YD")

or if you want a negative result when end date is earlier:

[toDate(date1), toDate(date2)].{d: datedif(min(@), max(@), "YD"), s: sign(@[1] - @[0])} | d * s

Still clunky -- but at least we didn't use "if" :-)

datetime and time

  1. "Values from 0 to 99 map to the years 1900 to 1999.": for me it seems like a bad idea to leave this in a modern standard, as it has caused enough pain already... I doubt people will use 0 - 99 AD dates, but if you know, you have YY years, you could just add 1900 yourself in the expression.

Hmm. In hindsight, the ICU rule seems better: If we have a 2-digit year, then adjust the date to be within 80 years before and 20 years after the current time.

  1. There is a mention on argument overflow in datetime, but it is not specified what happens, if arguments are negative or month/day is 0. Personally, I would expect it to be a TypeError.

The implementation will subtract zero and negative values from the date parts. I find this useful, as it makes it easy to add/subtract days from a date. e.g. if we had data for year, month and day, then we could use this expression:

e.g. datetime(year, month, day - 30) will return a date 30 days earlier.

I will update the spec to match.

  1. The same overflow rules work for time in the implementation, but this is not mentioned in the spec. It also works somewhat surprisingly, if there is an overflow in the hour, as datetime(1970, 1, 2, 1, 0, 0) == time(25, 0, 0)

I'll update spec for time(). Looking more closely, it appears the implementation for time() does not allow negative parameters. It probably should for consistency.

I'm not sure why your example is surprising? What outcome would you expect?

  1. time can return a negative value, if local timezone is "east" of UTC.

Will this cause problems?

  1. Overall, it is hard for me to find a use case for time. As of now, time(h, m, s) is equivalent to datetime(1970, 1, 1, h, m, s), which, as a date, has limited use. You cannot add time to some other datetime, as you will have an unexpected result of the time zone offset being applied twice. You cannot use it for delta comparisons either because of this. At the same time I can totally see a use case, if it returned a time delta without any offsets instead, maybe with days/months/years as well.

Right. Now that you describe it this way, I see the point. If we computed a time value as: (hours + minutes/60 + seconds/3600)/24, would it work in the way you expect?

  1. There is no millisecond function. Since you can specify milliseconds in datetime, might be a good idea to add this for consistency.

Noted

  1. hour, minute and second return null for negative numbers in the implementation, and this is mentioned in tests. But this case occurs naturally in, for example, time(1, 30, 0) | [hour(@), minute(@), second(@)] with a local UTC+3 time zone, as time will return a negative value because of the offset.

Noted

  1. weekday examples are missing from the docSamples.json tests,

Noted

  1. In the implementation, weekday returns null, if returnType is not one of 1, 2, 3. Personally, I would expect a TypeError here. In the same boat I would expect something like 1.5 treated as 1, instead of returning null, especially with JS, where there are, theoretically speaking, no integer types. In general, coercion to integer works differently in different functions and is not covered in spec, want to do a recap on that later.

True. The general rule is that bad parameters should throw exceptions.

  1. There is a typo in the examples: toDate("toDate("2023-11-10T13:00:00+04:00")") // returns 19671.375

Noted

  1. This is a bit pedantic, but maybe it would be worthwhile to limit the ISO 8601 applicability, as it also includes stuff like week and ordinal dates, which, I would assume, majority of implementations won't implement. Usually people just implement RFC 3339 string support and call it ISO 8601. :)

Valid point

Eswcvlad commented 8 months ago
  1. Personally, I had some difficulty understanding the "The date/time value is offset from the current time zone to UTC" line till I looked at the tests. As I understood this means that date/time values always represent date/time in UTC, while function arguments and output of date/time query functions (like hour) always represent local time zone. But, imo, it is pretty hard to get that from spec alone.

Hmm. I thought it was well specified here: https://opensource.adobe.com/json-formula/doc/output/json-formula-specification-1.0.2.html#_date_and_time_values

But any suggestions to make it more clear are welcome.

Well in my case it wasn't immediately obvious, that datetime and time arguments are in local time zone, since them being UTC also made sense to me. :)

  1. There is no convenient way to specify a date/time delta besides doing math on predefined dates or creating a delta value manually. I think you might want to have something like now() - time_in_form > delta_of_fifteen_hours.

Again, it would help if you could expand the use-case. Not clear to me what you're trying to accomplish

Well I haven't though of any specific use-cases...

Like, abstracting away from PDF, you might have request logs in a JSON format with start and end timestamps and you want to leave only those, which took more than 10 second to process. So you would need to write something like data[?(toDate(end) - toDate(start) > 10 / 60 / 60 / 24)], which looks a bit ugly and requires you to keep the date/time storage format in mind.

In other languages you could be more descriptive without needed the reader to know, how a date/time difference is represented. Like, in C++ you would have end - start > seconds(10), in Python end - start > timedelta(seconds=10), etc.

You could have something similar in a PDF form. For example, show some sort of warning, if a form should have be filled and sent back within 24 hours and you are late.

  1. This is more of my own fault, but with the date/time representation used, I assumed, that you would be able to use date/time query functions on durations and date/time differences. But this is the case only if your local time zone is UTC, everywhere else the local time zone shift will give you an incorrect result, so you need to use datedif. Maybe it would be worth to add an explicit note on this to prevent misuse?

Could you expand on this a bit more with a specific use-case? How did you get a duration value? Were you subtracting date/time values?

Similar to the request log example above, you might want to format the duration in a HH:MM:SS format. If you try to use hour/minute/second functions for that, which is not that far reaching, you would get an incorrect result. It would still be incorrect, if they worked on deltas, as we don't have a way to pad values with zeros, but nvm. :)

Could be the same situation for a PDF form on submit.

  1. As of now you can only query date/time entities in local time zone, which might be undesirable in some use cases (you might just want to use a specific timezone regardless of current device location). From my point of view it would be a good idea to have an optional timezone argument, where local time zone implication exists.

Hmm. For starters, we do have very limited timezone support in the toDate() function. e.g. you can parse an ISO8601 date with a timezone offset: toDate("20231110T130000+04:00")

Well, yeah, but you won't be able to create such a string from values within json-formula consistently, as, iirc, there are no string padding or date/time formatting functions.

Note that:

* OpenFormula/Excel have no timezone support

* If you're using json-formula from within Forms.Next, it has extensive timezone support

But it would be interesting to explore if there are ways in which we could add limited timezone support. Could be any of:

* Add a timezone offset to datetime()

* Set the default timezone for a session (maybe in the setup API?)

* Add a timezone offset parameter to toString() to output an ISO8601 value with UTC offset

How would you see time zone support affecting the spec?

When I was thinking of the time zones in this context, I was looking at it from the perspective of a web back-end developer, which has been conditioned to work with anything time related only in UTC. :)

There is also the fact, that local time zone is, more often than not, is not just an offset from UTC, but an ID like Europe/Paris, which is a subject to DST. So when you construct a datetime in past or future, it might have a different UTC offset from now, which might be confusing. I could see this as a thing, that might be different between implementations (i.e. one looks at the current UTC offset for the local time zone, while a different one uses the ID).

If it were up to me, I would just add an optional tz parameter everywhere with the default time zone specified in the in the interpreter. Might as well support both time zone ID strings and numeric offsets from UTC for tz. So that all the added complexity is opt-in.

  1. I haven't tested it, but I have a feeling that this line has issues for February 29th.

I did a quick test:

datedif(datetime(2023,2,1), datetime(2023,3,1), "yd")

returns 28

datedif(datetime(2024,2,1), datetime(2024,3,1), "yd")

returns 29

datedif(datetime(2023,2,1), datetime(2024,3,1), "yd")

returns 28

No, I meant a different case. When you have date2 represent, for example, 2024-02-29 and for the calculation you set year to a non-leap year. Not sure, what the expected result would be...

  1. There is a mention on argument overflow in datetime, but it is not specified what happens, if arguments are negative or month/day is 0. Personally, I would expect it to be a TypeError.

The implementation will subtract zero and negative values from the date parts. I find this useful, as it makes it easy to add/subtract days from a date. e.g. if we had data for year, month and day, then we could use this expression:

e.g. datetime(year, month, day - 30) will return a date 30 days earlier.

I will update the spec to match.

Makes sense.

  1. The same overflow rules work for time in the implementation, but this is not mentioned in the spec. It also works somewhat surprisingly, if there is an overflow in the hour, as datetime(1970, 1, 2, 1, 0, 0) == time(25, 0, 0)

I'll update spec for time(). Looking more closely, it appears the implementation for time() does not allow negative parameters. It probably should for consistency.

I'm not sure why your example is surprising? What outcome would you expect?

I kind of expected it to not have any "date" component, so to speak. I.e. like a wall clock wrapping around.

  1. time can return a negative value, if local timezone is "east" of UTC.

Will this cause problems?

In the implementation hour/minute/second return null in such cases.

  1. Overall, it is hard for me to find a use case for time. As of now, time(h, m, s) is equivalent to datetime(1970, 1, 1, h, m, s), which, as a date, has limited use. You cannot add time to some other datetime, as you will have an unexpected result of the time zone offset being applied twice. You cannot use it for delta comparisons either because of this. At the same time I can totally see a use case, if it returned a time delta without any offsets instead, maybe with days/months/years as well.

Right. Now that you describe it this way, I see the point. If we computed a time value as: (hours + minutes/60 + seconds/3600)/24, would it work in the way you expect?

I wouldn't say this would be what I expect per-se, but I could see a use for it as...

I was trying to think of a use case, where you would want time in it's current state, and the only thing I could think of would be, in a way, storing wall clock time with timezone attached (like working hours of a company or something). But there is time zone information in Forms.next, which make the "delta" version with an explicit time zone better anyway. :)

JohnBrinkman commented 8 months ago

@Eswcvlad : From this issue, I've addressed the parts that were easy to resolve. There are parts I haven't addressed:

If you want to follow up on these, I suggest logging them as separate issues, and closing this issue.

Eswcvlad commented 7 months ago
* Time zone support

https://github.com/adobe/json-formula/issues/128

* Potential Feb 29 issues

Found a test case: https://github.com/adobe/json-formula/issues/127

* Definition of time values

I guess this is not that important. If we go forward with time zone support, then you could hack it with "UTC" anyway to work as a time delta.