agronholm / cbor2

Python CBOR (de)serializer with extensive tag support
MIT License
215 stars 57 forks source link

Encoder includes seconds in timezone when encoding #161

Open rudigiesler opened 1 year ago

rudigiesler commented 1 year ago

This was picked up from the hypothesis testing.

RFC3339 section 5.6 defines the timezone to consist of just hours and minutes, not seconds.

Python timezones support seconds, and when converting to a string using isoformat, includes the seconds.

This is an issue if you try to decode it again, as you'll get an error that it is an invalid datetime string:

>>> from datetime import datetime
>>> dt = datetime.fromisoformat("1912-01-01T00:00:00-00:16:08")
datetime.datetime(1912, 1, 1, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=85432)))
>>> from cbor2 import encoder, decoder
>>> encoder.dumps(dt)
b'\xc0x\x1c1912-01-01T00:00:00-00:16:08'
>>> decoder.loads(b'\xc0x\x1c1912-01-01T00:00:00-00:16:08')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/rudi/pk/cbor2/cbor2/decoder.py", line 668, in loads
    return CBORDecoder(fp, **kwargs).decode()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rudi/pk/cbor2/cbor2/decoder.py", line 198, in decode
    return self._decode()
           ^^^^^^^^^^^^^^
  File "/Users/rudi/pk/cbor2/cbor2/decoder.py", line 185, in _decode
    return decoder(self, subtype)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rudi/pk/cbor2/cbor2/decoder.py", line 379, in decode_semantic
    return semantic_decoder(self)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rudi/pk/cbor2/cbor2/decoder.py", line 445, in decode_datetime_string
    raise CBORDecodeValueError(f"invalid datetime string: {value!r}")
cbor2.types.CBORDecodeValueError: invalid datetime string: '1912-01-01T00:00:00-00:16:08'
agronholm commented 1 year ago

I guess we should not allow serialization of such datetimes at all then?

agronholm commented 1 year ago

I don't think this problem would ever manifest in real life though, as all time zones have UTC offsets in 30 minute increments.

rudigiesler commented 1 year ago

I think raising an exception when trying to serialize such datetimes is a good solution. Silently truncating data on serialization doesn't seem like a good solution, letting the user know that we can't serialise it, and having them truncate it before passing to us, sounds correct to me.