IdentityPython / pysaml2

Python implementation of SAML2
Apache License 2.0
561 stars 421 forks source link

Does not handle valid time properly #445

Open talebbits opened 7 years ago

talebbits commented 7 years ago

ValueError: time data '2017-08-29T09:16:45.0631274-05:00' does not match format '%Y-%m-%dT%H:%M:%SZ'

https://github.com/rohe/pysaml2/blob/fd7a4f694b137a92f2a8b7f502d51fc21e3528c7/src/saml2/time_util.py#L19

jkakavas commented 7 years ago

What fails, how and under which circumstances ?

talebbits commented 7 years ago

when timestr = '2017-08-29T09:16:45.0631274-05:00' this function does no handle this format https://github.com/rohe/pysaml2/blob/fd7a4f694b137a92f2a8b7f502d51fc21e3528c7/src/saml2/time_util.py#L232

jkakavas commented 7 years ago

Where does this timestr comes from ?Is it an IssueInstant, a NotBeforeOrAfter ? Can we get the stacktrace ?

I think this is a valid bug and we probably need to come up with a different regexp to validate dateTime even if the one we use now matches the most commonly used format

talebbits commented 7 years ago

It comes as <samlp:Response and it is in every datetime attribute including IssueInstant

peppelinux commented 6 years ago

I think that this could be helpfull

import datetime
import dateutil.parser

def getDateTimeFromISO8601String(s):
    d = dateutil.parser.parse(s)
    return d

s = '2017-08-29T09:16:45.0631274-05:00'
getDateTimeFromISO8601String(s)

datetime.datetime(2017, 8, 29, 9, 16, 45, 63127, tzinfo=tzoffset(None, -18000))

Tested on Python 2.7.12. It also worked with the following combinations

s = '2017-08-29T09:16:45.0631274-05:00'
s = '2017-08-29T09:16:45.0631274'
s = '2017-08-29T09:16:45.0631'
s = '2017-08-29T09'
s = '2017-08-29 03:01:05'

# italian format
s = '29/08/2017 03:01:05'

# also if string seems truncated
 s = '2017-08-29T'

In other words we could subsitute the actual str_to_time function with dateutil.parser.parse or, more in the deep, refactor the entire time_util.py file with standard python libraries.

c00kiemon5ter commented 6 years ago

Looking at the SAML2 xsd, I see that

<attribute name="IssueInstant" type="dateTime" use="required"/>

IssueInstant is of type xsd:dateTime. The XML Schema datatypes spec mandates

The ·lexical space· of dateTime consists of finite-length sequences of characters of the form: '-'? yyyy '-' mm '-' dd 'T' hh ':' mm ':' ss ('.' s+)? (zzzzzz)?

This means that 2017-08-29T09:16:45.0631274-05:00 is a valid value which makes this issue a proper bug report, and it should be fixed 👍

c00kiemon5ter commented 6 years ago

According to the dateTime datatype spec, it is mandatory to include the year, the month, the day, the hour, the minutes and the seconds. One can omit only the second fragments and the timezone info. This means that some of the results that dateutil.parser.parse returns (as proposed by @peppelinux) are invalid.

I feel it is OK to relax this rule and accept datetime values such as 29/08/2017 03:01:05 and 2017-08-29T which would otherwise be invalid by the spec. If people think this should be strict we can do that by fixing the TIME_FORMAT_WITH_FRAGMENT regex and make consistent use of that.

I also agree that time_utils.py should be refactored to use standard built-in methods to do time calculations.

c00kiemon5ter commented 6 years ago

I went through the code and it seems pysaml2 uses many different libs:

Additionally, not all time-related operations are handled through time_utils.py.

I also went through the XML Schema spec about dateTime datatype and I feel very weird reading that

The ·value space· of dateTime is closely related to the dates and times described in ISO 8601

meaning that the dateTime datatype does not represent/support the full ISO 8601 specification, and at the same time there is no clear text about the differences. I cannot understand why someone found it better to semi-specify a new format instead of taking advantage of a specification about the exact thing they wanted to specify in the first place 😞.

Regardless, I started looking for libraries with ISO 8601 support. To my surprise, there is not much support for a well-defined standard, but there are lots of libs. I looked at the python modules {time, datetime, calendar}, and libraries: pytz, arrow, moment, iso8601, isodatetime, ciso8601, numpy/datetime64, pandas/timeseries and finally aniso8601.

I think that aniso8601 stands out as the most complete implementation that is taking edge cases seriously. Additionally, it is a pure python implementation and works with the standard datetime python type. I think a refactoring should be based on that lib.

peppelinux commented 6 years ago

@c00kiemon5ter you're right, I didn't want to introduce invalid datetime lenght but only show dateutils usage flexibility. Nothing more. I agree that iso format should be mandatory.

aniso8601 looks good.

I found also: pyiso8601 and iso8601utils It seems also that starting from Python 3.7 there's a native support with colon delimiters in UTC offsets. It also come with datetime.fromisoformat function.

c00kiemon5ter commented 6 years ago

Coming back to this.

The SAML2 core specification specifies:

1.3.3 Time Values

All SAML time values have the type xs:dateTime, which is built in to the W3C XML Schema Datatypes specification [Schema2], and MUST be expressed in UTC form, with no time zone component. SAML system entities SHOULD NOT rely on time resolution finer than milliseconds. Implementations MUST NOT generate time instants that specify leap seconds.

This conflicts with the dateTime datatype specification as it disallows the timezone component and considers all dates to have been specified as UTC beforehand. The dateTime specification defines two timelines - one for timezoned dateTimes and one for untimezoned dateTimes. The SAML2 spec essentially dictates that there is only one timeline, always timezoned as UTC.

This makes things easier for the implementers, as for example, they need not consider how to compare dateTimes from different timelines (a timezoned dateTime with an untimezoned datetime.)

As such, the originally posted date 2017-08-29T09:16:45.0631274-05:00 is invalid, as it contains timezone info -05:00. The correct representation is 2017-08-29T14:16:45.063127 (converted to UTC/Z/±00:00 and timezone info removed.)

The question now becomes whether implementations actually hold that promise, and if not, how do we handle it..

peppelinux commented 6 years ago

Personally I would like to implement the most flexible and smart way to handle time, with or without timezone, with the possibility to succesfully handle exceptions on theri occurrence.

It's more like a hopefull whish, not a real TODO

c00kiemon5ter commented 6 years ago

This was further discussed in the mailing list and ..it seems everything is backwards. This is what the technical committee had to say:

Subject: Re: [saml-dev] dateTime "time zone component" Date: Wed, 2 Oct 2013 15:53:00 +0000

The SAML spec has this part about "MUST be expressed in UTC form, with no time zone component." which would literally support the format given above (I think; xmlschema dateTime itself would certainly allow for that) but as Ian (and Olav, I think) point out in this thread

https://groups.google.com/forum/#!topic/simplesamlphp/7m07m9gLzzA

the spec probably intended to say no timezone /offset/?

I would think so.

-- Scott

which means that the timezone information should be there, but it is only allowed to be 'Z'.


The plan is:

All dates are normalised to UTC and can be compared and worked-with under that assumption. Dates we produce should be in ISO-8601 format always in UTC with timezone info present as 'Z'.