Kaggle / kaggle-api

Official Kaggle API
Apache License 2.0

Data download error: time data does not match format #523

Open junkoda opened 10 months ago

junkoda commented 10 months ago

Sorry for reopening this, but I encountered the issue again, this time with a non-English locale.

https://github.com/Kaggle/kaggle-api/issues/484

The previous fix handled AM/PM in the en_US locale, but still assumed that the locale is English. At that time, I was not sure whether the Python locale could be non-English, but it can.

The error

$ kaggle competitions download -c blood-vessel-segmentation
time data 'Fri, 03 Nov 2023 22:06:42 GMT' does not match format '%a, %d %b %Y %H:%M:%S %Z'

kaggle==1.5.16, Ubuntu 22.04.3 LTS, Python 3.11.5 (miniconda)

$ echo $LC_TIME
ja_JP.UTF-8

An easy workaround on the user side is:

export LC_ALL=C

Reproduce

from datetime import datetime
import locale

print(locale.getlocale(locale.LC_TIME))
print(datetime.now().strftime('%a, %d %b %Y %H:%M:%S %Z'))

s = 'Fri, 03 Nov 2023 22:06:42 GMT'
t = datetime.strptime(s, '%a, %d %b %Y %H:%M:%S %Z')

Running this python script as locale_error.py gives:

$ python3 locale_error.py
('ja_JP', 'UTF-8')
木, 07 12月 2023 14:23:15
Traceback ...
ValueError: time data 'Fri, 03 Nov 2023 22:06:42 GMT' does not match format '%a, %d %b %Y %H:%M:%S %Z'

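%a and %b in strptime() are locale dependent, so the English 'Fri' and 'Nov' in the server's date string do not match under a non-English LC_TIME. The locale of my new computer is Japanese.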

This can be reproduced with:

$ LC_TIME=ja_JP.utf8 python3 locale_error.py

if that locale is available in your OS:

$ locale -a
C.utf8
en_US.utf8
...
ja_JP.utf8

but it probably is not. To add a locale, say German, on Ubuntu:

$ sudo locale-gen de_DE.utf8
$ locale -a
...
de_DE.utf8
$ LC_TIME=de_DE.utf8 python3 locale_error.py
('de_DE', 'UTF-8')
Do, 07 Dez 2023 14:27:45
...
ValueError: time data 'Fri, 03 Nov 2023 22:06:42 GMT' does not match format '%a, %d %b %Y %H:%M:%S %Z'

If you get the error with this Python script, the same error should be raised by the kaggle command:

$ LC_TIME=de_DE.utf8 kaggle competitions download -c blood-vessel-segmentation
time data 'Fri, 03 Nov 2023 22:06:42 GMT' does not match format '%a, %d %b %Y %H:%M:%S %Z'

This is certainly OS dependent. I confirmed it on Ubuntu 22.04 LTS and 18.04 LTS, but macOS Ventura behaves differently; there is no error on macOS.

Fix

One way to fix this is to switch to the C locale every time strptime is called, and restore the previous locale afterwards:

saved = locale.setlocale(locale.LC_ALL)  # query the current locale without changing it
locale.setlocale(locale.LC_ALL, 'C')
try:
    t = datetime.strptime(s, '%a, %d %b %Y %H:%M:%S %Z')
finally:
    locale.setlocale(locale.LC_ALL, saved)  # restore even if parsing fails
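
The save-and-restore pattern above could also be wrapped in a context manager so the locale is restored even when parsing raises. This is only a sketch: `c_locale` is a hypothetical helper, not something in kaggle-api, and it narrows the switch to LC_TIME (the category that governs %a and %b) rather than LC_ALL, which is my assumption. Note that setlocale() changes process-wide state and is generally not thread-safe.

```python
import contextlib
import locale
from datetime import datetime


@contextlib.contextmanager
def c_locale(category=locale.LC_TIME):
    """Temporarily switch `category` to the 'C' locale (hypothetical helper)."""
    saved = locale.setlocale(category)  # query only; returns the current setting
    locale.setlocale(category, "C")
    try:
        yield
    finally:
        locale.setlocale(category, saved)  # restore even if the body raises


with c_locale():
    # English day and month names are matched while the 'C' locale is active.
    t = datetime.strptime("Fri, 03 Nov 2023 22:06:42 GMT",
                          "%a, %d %b %Y %H:%M:%S %Z")
```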

An alternative is to convert the string to ISO 8601 format manually and then call datetime.fromisoformat(), which does not depend on the locale.
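
A minimal sketch of that manual conversion, assuming the server always sends the fixed English form shown in the error message ('Fri, 03 Nov 2023 22:06:42 GMT'). The name `parse_http_date` and the month table are hypothetical and not part of kaggle-api. The result is a naive datetime, as with the strptime() call it would replace.

```python
from datetime import datetime

# English month abbreviations, hard-coded so the locale is never consulted.
_MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def parse_http_date(s: str) -> datetime:
    """Parse 'Fri, 03 Nov 2023 22:06:42 GMT' independently of LC_TIME."""
    # The weekday is redundant with the date, so it is ignored.
    _weekday, day, month, year, clock, zone = s.split()
    if zone != "GMT":
        raise ValueError(f"unexpected time zone: {zone!r}")
    # Build an ISO 8601 string such as '2023-11-03T22:06:42'.
    iso = f"{int(year):04d}-{_MONTHS[month]:02d}-{int(day):02d}T{clock}"
    return datetime.fromisoformat(iso)


t = parse_http_date("Fri, 03 Nov 2023 22:06:42 GMT")
```

Unlike the setlocale() approach, this does not touch process-wide state, at the cost of accepting only this one date layout.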

I don't know why no one has reported this since July, but I hope this will help many Kagglers. I am using Ubuntu in English and didn't expect LC_TIME to be Japanese; this must be happening in many countries. Thanks.

Jun

ritikgupta65 commented 5 months ago

kindly assign this issue to me @junkoda

yashu1wwww commented 3 months ago

https://medium.com/@Yashu_Krish11/first-create-a-kaggle-datasets-folder-in-google-drive-4342c679b46f check it...