datadotworld / data.world-py

Python package for data.world
https://data.world/integrations/python
Apache License 2.0

`load_dataset` fails to run due to nanosecond timestamps in API response #140

Open nachomaiz opened 1 year ago

nachomaiz commented 1 year ago

Hi,

I'm getting an error when using the `load_dataset` function. It seems that the API returns timestamps with nanosecond resolution (nine fractional digits), while Python's `datetime` only supports microsecond resolution (six digits), so parsing the `updated` field fails:

Traceback (most recent call last):
  File "~\t2.py", line 3, in <module>
    dw_ds = dw.load_dataset("{owner}/{id}")  # modified for privacy
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~\venv\Lib\site-packages\datadotworld\__init__.py", line 99, in load_dataset
    load_dataset(dataset_key,
  File "~\venv\Lib\site-packages\datadotworld\datadotworld.py", line 164, in load_dataset
    last_modified = datetime.strptime(dataset_info['updated'],
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~\Miniconda3\envs\nox\Lib\_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~\Miniconda3\envs\nox\Lib\_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '2023-03-22T17:14:38.878483744Z' does not match format '%Y-%m-%dT%H:%M:%S.%fZ'
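The failure can be reproduced without calling the data.world API. Python's `%f` directive accepts at most six digits, so with the nine-digit fraction the remaining `744Z` no longer matches the literal `Z` in the format. A minimal check, using the format string from the traceback (`parses` is a hypothetical helper name used only for illustration):

```python
from datetime import datetime

# Format string copied from the traceback above.
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parses(value: str) -> bool:
    """Return True if `value` matches FMT, False if strptime raises ValueError."""
    try:
        datetime.strptime(value, FMT)
        return True
    except ValueError:
        return False


print(parses("2023-03-22T17:14:38.878483Z"))     # True: six fractional digits
print(parses("2023-03-22T17:14:38.878483744Z"))  # False: nine fractional digits
```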

This should work if the fractional seconds are truncated to six digits (microseconds) before parsing with `datetime`:

import datetime
datetime.datetime.strptime("2023-03-22T17:14:38.878483Z", "%Y-%m-%dT%H:%M:%S.%fZ")  # works
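A minimal sketch of how the parsing could tolerate the extra digits, assuming the API always returns UTC timestamps ending in `Z`. `parse_api_timestamp` is a hypothetical helper, not part of the package. It truncates (rather than rounds) anything past the sixth fractional digit, which loses less than a microsecond and should not matter for a `last_modified` comparison.

```python
import re
from datetime import datetime

# Format string from the traceback above.
API_TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Captures the digits between the "." and the trailing "Z".
_FRACTION_RE = re.compile(r"\.(\d+)Z$")


def parse_api_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp such as '2023-03-22T17:14:38.878483744Z'.

    Hypothetical helper: datetime stores at most microseconds, so fractional
    digits beyond the sixth are truncated (not rounded) before parsing.
    """
    match = _FRACTION_RE.search(value)
    if match is None:
        # No fractional seconds at all, e.g. '2023-03-22T17:14:38Z'.
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    microseconds = match.group(1)[:6]
    trimmed = f"{value[:match.start()]}.{microseconds}Z"
    return datetime.strptime(trimmed, API_TIMESTAMP_FORMAT)


print(parse_api_timestamp("2023-03-22T17:14:38.878483744Z"))  # 2023-03-22 17:14:38.878483
```

Like the existing call, this returns a naive `datetime`, so it could replace the `strptime` call without changing how the result is compared.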

Let me know if I can provide any more info.

Happy to contribute a PR if you would like.

Thanks!

alexcrawley commented 11 months ago

I've also just hit this. As a workaround, you can pass `force_update=True` to `load_dataset` to bypass the `last_modified` check.
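A minimal sketch of applying the workaround only when the check fails, so a cached copy is still used whenever parsing succeeds. `load_with_fallback` is a hypothetical helper, not part of the package; it assumes `load` is `datadotworld.load_dataset` or anything accepting the same `dataset_key` and `force_update` arguments. Note that it retries on any `ValueError`, not just the timestamp one.

```python
from typing import Any, Callable


def load_with_fallback(load: Callable[..., Any], dataset_key: str) -> Any:
    """Call load(dataset_key); on ValueError, retry with force_update=True.

    Hypothetical helper: `load` is assumed to behave like
    datadotworld.load_dataset.
    """
    try:
        return load(dataset_key)
    except ValueError:
        # The timestamp parsing error surfaces as ValueError. Retrying with
        # force_update=True skips the last_modified check, at the cost of
        # downloading a fresh copy instead of using the cache.
        return load(dataset_key, force_update=True)
```

Usage would look like `load_with_fallback(dw.load_dataset, "{owner}/{id}")`.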

nachomaiz commented 11 months ago

Oh, good tip! Will try that instead. Thanks @alexcrawley!

Still, I think it should be addressed within the package: `force_update=True` bypasses the cached copy, which defeats the purpose of caching requests.