dazza-codes / aio-aws

Asyncio utils for AWS Services
Apache License 2.0

jobs-db fails to select latest submitted job after prior failed job #56

Closed dazza-codes closed 3 years ago

dazza-codes commented 3 years ago

# job that failed
>>> job_failed.submitted
1632457785
>>> job_failed.created
1632457785097
>>> job_failed.created or job_failed.submitted
1632457785097

# subsequent job submission
>>> job_submitted.created or job_submitted.submitted
1632464550
>>> job_submitted.created is None
True
>>> job_submitted.submitted
1632464550

The submitted timestamp is a smaller int than the created timestamp of the earlier failed job, so a plain max() picks the failed job. The desired result is the submitted job, which was submitted after the failure and does not yet have a created timestamp:

>>> max([1632464550, 1632457785097])
1632457785097

# if the last 3 digits of the created timestamp are deleted, the submitted timestamp is greater
>>> max([1632464550, 1632457785])
1632464550

It's weird that a createdAt value seems to contain too many digits for a unix timestamp.

>>> from datetime import datetime
>>> datetime.utcfromtimestamp(1632457785097)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: year 53700 is out of range

# if the last 3 digits are truncated, it works as expected
>>> datetime.utcfromtimestamp(1632457785)
datetime.datetime(2021, 9, 24, 4, 29, 45)

# the submitted job is later (approx 2 hrs)
>>> datetime.utcfromtimestamp(1632464550)
datetime.datetime(2021, 9, 24, 6, 22, 30)

The AWS docs have an example at https://docs.aws.amazon.com/cli/latest/reference/batch/describe-jobs.html where the createdAt value also looks too large for a unix timestamp in seconds: 1480483387803. Ah, it's in milliseconds!

>>> datetime.utcfromtimestamp(1480483387803)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: year 48884 is out of range

>>> datetime.utcfromtimestamp(1632457785097/1e3)
datetime.datetime(2021, 9, 24, 4, 29, 45, 97000)

>>> datetime.utcfromtimestamp(1632457785097/1e3) < datetime.utcfromtimestamp(1632464550)
True
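
The comparison above suggests one way to pick the latest job: put both fields on a millisecond scale before calling max. A minimal sketch, assuming a hypothetical `Job` record and a hypothetical `job_time_ms` key function (neither is the aio-aws job class or API), with `created` in milliseconds and `submitted` in seconds as in the sessions above:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Hypothetical stand-in for a jobs-db record, not the aio-aws class."""

    name: str
    submitted: Optional[int] = None  # unix timestamp in seconds
    created: Optional[int] = None  # unix timestamp in milliseconds (AWS createdAt)


def job_time_ms(job: Job) -> int:
    """Return a sort key in milliseconds, preferring created over submitted."""
    if job.created is not None:
        return job.created
    if job.submitted is not None:
        return job.submitted * 1000  # seconds -> milliseconds
    return 0  # assumption: jobs without any timestamp sort first


job_failed = Job("failed", submitted=1632457785, created=1632457785097)
job_submitted = Job("submitted", submitted=1632464550)

latest = max([job_failed, job_submitted], key=job_time_ms)
print(latest.name)  # submitted
```

With both values in milliseconds, the later submission wins even though it has no created timestamp yet.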

The solution to this issue might be to create the submitted timestamps in milliseconds also.

>>> from email.utils import parsedate_to_datetime
>>> http_date = "Mon, 23 Mar 2020 15:29:33 GMT"
>>> parsedate_to_datetime(http_date)
datetime.datetime(2020, 3, 23, 15, 29, 33, tzinfo=datetime.timezone.utc)
>>> parsedate_to_datetime(http_date).timestamp()
1584977373.0  # seconds
>>> parsedate_to_datetime(http_date).timestamp() * 1e3
1584977373000.0  # milliseconds
>>> from math import floor
>>> floor(parsedate_to_datetime(http_date).timestamp() * 1e3)
1584977373000
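
The steps above could be wrapped in a small helper. This is only a sketch; `http_date_to_millis` is a hypothetical name, not something in aio-aws. It assumes the input is an HTTP-style date string, and it treats a naive parse result (which `parsedate_to_datetime` returns for a `-0000` offset) as UTC:

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def http_date_to_millis(http_date: str) -> int:
    """Convert an HTTP date string to a unix timestamp in whole milliseconds."""
    dt = parsedate_to_datetime(http_date)
    if dt.tzinfo is None:
        # A "-0000" offset parses to a naive datetime; assume it means UTC
        dt = dt.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so int() loses nothing here
    return int(dt.timestamp()) * 1000


submitted_ms = http_date_to_millis("Mon, 23 Mar 2020 15:29:33 GMT")
print(submitted_ms)  # 1584977373000
```

Multiplying the integer seconds by 1000 keeps the arithmetic in ints, so no floor() call is needed.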

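Have to be careful about tz-aware vs. naive results: datetime.utcfromtimestamp returns a naive datetime, and naive and tz-aware datetimes cannot be ordered against each other.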


# from milliseconds to datetime, with tz aware results
>>> from datetime import datetime
>>> from datetime import timezone

>>> datetime.utcfromtimestamp(1632457785097/1e3)
datetime.datetime(2021, 9, 24, 4, 29, 45, 97000)  # not tz aware!
>>> datetime.utcfromtimestamp(1632457785097/1e3).replace(tzinfo=timezone.utc)
datetime.datetime(2021, 9, 24, 4, 29, 45, 97000, tzinfo=datetime.timezone.utc)
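
An alternative that yields a tz-aware result directly is `datetime.fromtimestamp` with an explicit `tz` argument; `datetime.utcfromtimestamp` is also deprecated as of Python 3.12. A minimal sketch, where `millis_to_datetime` is a hypothetical helper name and integer arithmetic is used to avoid float rounding of the millisecond part:

```python
from datetime import datetime, timezone


def millis_to_datetime(ms: int) -> datetime:
    """Convert a unix timestamp in milliseconds to a tz-aware UTC datetime."""
    seconds, millis = divmod(ms, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Restore the millisecond remainder as microseconds
    return dt.replace(microsecond=millis * 1000)


created = millis_to_datetime(1632457785097)
print(created.isoformat())  # 2021-09-24T04:29:45.097000+00:00
```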