Kaggle / kaggle-api

Official Kaggle API
Apache License 2.0

Downloading individual files returns 404; Works only for all files that are 1.5+ months old #427

Open valentinwerner1 opened 2 years ago

valentinwerner1 commented 2 years ago

I have been trying to pull files on a daily basis, and as of now the only way is to download the whole dataset every time (which is 8GB + 80MB per day).

Downloading all files works.
Downloading single files works up to 0420_**.
It does not work for any newer file in the dataset.
Other datasets work (also for individual files).
The error occurs in Jupyter, on Python 3.8.10 and 3.9.7 (haven't tried other versions).
The API is on 1.5.12.

Dataset: https://www.kaggle.com/datasets/bwandowando/ukraine-russian-crisis-twitter-dataset-1-2-m-rows

My code:

    import zipfile

    import pandas as pd

    def get_file():
        # this is the name of all files AFTER april 1st
        name = "0602" + "_UkraineCombinedTweetsDeduped.csv.gzip"
        api.dataset_download_file(dataset="bwandowando/ukraine-russian-crisis-twitter-dataset-1-2-m-rows", file_name=name, path="/tmp")
        file_path = "/tmp/" + name  # gets path to file
        zipfile.ZipFile(file_path + ".zip", "r").extractall("/tmp")  # extracts all to /tmp
        df = pd.read_csv(file_path, compression="gzip", index_col=0)
        return df

Returns Error:

ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Date': 'Fri, 03 Jun 2022 18:34:11 GMT', 'Access-Control-Allow-Credentials': 'true', 'Set-Cookie': 'ka_sessionid=a052975ff852d6cae350fe39c876e4c4; max-age=2626560; path=/, GCLB=CJe5tvmFvN7yJg; path=/; HttpOnly', 'Transfer-Encoding': 'chunked', 'Vary': 'Accept-Encoding', 'Turbolinks-Location': 'https://www.kaggle.com/api/v1/datasets/download/bwandowando/ukraine-russian-crisis-twitter-dataset-1-2-m-rows/0602_UkraineCombinedTweetsDeduped.csv.gzip', 'X-Kaggle-MillisecondsElapsed': '286', 'X-Kaggle-RequestId': '38727177ddf887103520f21b2a5694eb', 'X-Kaggle-ApiVersion': '1.5.12', 'X-Frame-Options': 'SAMEORIGIN', 'Strict-Transport-Security': 'max-age=63072000; includeSubDomains; preload', 'Content-Security-Policy': "object-src 'none'; script-src 'nonce-ZWatHkXczzcGCQuS1PVaJw==' 'report-sample' 'unsafe-inline' 'unsafe-eval' 'strict-dynamic' https: http:; frame-src 'self' https://www.kaggleusercontent.com https://www.youtube.com/embed/ https://polygraph-cool.github.io https://www.google.com/recaptcha/ https://form.jotform.com https://submit.jotform.us https://submit.jotformpro.com https://submit.jotform.com https://www.docdroid.com https://www.docdroid.net https://kaggle-static.storage.googleapis.com https://kaggle-static-staging.storage.googleapis.com https://kkb-dev.jupyter-proxy.kaggle.net https://kkb-staging.jupyter-proxy.kaggle.net https://kkb-production.jupyter-proxy.kaggle.net https://kkb-dev.firebaseapp.com https://kkb-staging.firebaseapp.com https://kkb-production.firebaseapp.com https://kaggle-metastore-test.firebaseapp.com https://kaggle-metastore.firebaseapp.com https://apis.google.com https://content-sheets.googleapis.com/ https://accounts.google.com/ https://storage.googleapis.com https://docs.google.com https://drive.google.com; base-uri 'none'; report-uri https://csp.withgoogle.com/csp/kaggle/20201130;", 'X-Content-Type-Options': 'nosniff', 'Referrer-Policy': 'strict-origin-when-cross-origin', 'Via': '1.1 google', 'Alt-Svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000'})
HTTP response body: b'{"code":404,"message":"Not found"}'

Any help is deeply appreciated :)

yonikremer commented 1 year ago

Did you manage to fix that?

valentinwerner1 commented 1 year ago

Nope, but I also didn't try again. I downloaded every file manually instead.
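
For anyone hitting the same 404, here is a minimal diagnostic sketch, not a fix. It attempts the single-file download from the report above and, if the server answers 404, checks whether the file name shows up in the dataset's file listing. The helper names `daily_file_name` and `diagnose_single_file` are hypothetical. The sketch assumes that `api` is an authenticated Kaggle API client, that the raised exception carries a `status` attribute, and that `api.dataset_list_files(dataset).files` yields objects with a `name` attribute. The listing may not include every file of a large dataset, so a `False` result is only a hint.

```python
def daily_file_name(mmdd: str) -> str:
    """Build the per-day file name used in the report, e.g. '0602' -> '0602_Ukraine...'."""
    return f"{mmdd}_UkraineCombinedTweetsDeduped.csv.gzip"


def diagnose_single_file(api, dataset: str, file_name: str, path: str = "/tmp") -> dict:
    """Try a single-file download; on a 404, report whether the file is listed.

    `api` is assumed to be an authenticated Kaggle API client (setup not shown).
    """
    try:
        api.dataset_download_file(dataset=dataset, file_name=file_name, path=path)
        return {"file": file_name, "downloaded": True, "listed": None}
    except Exception as exc:
        # Assumption: the client's exception exposes the HTTP status as `.status`.
        if getattr(exc, "status", None) != 404:
            raise  # anything other than "Not Found" is outside this sketch

    # Assumption: the listing result has `.files`, each entry with a `.name`.
    listed_names = [f.name for f in api.dataset_list_files(dataset).files]
    return {"file": file_name, "downloaded": False, "listed": file_name in listed_names}
```

Usage would look like `diagnose_single_file(api, "bwandowando/ukraine-russian-crisis-twitter-dataset-1-2-m-rows", daily_file_name("0602"))`. If `downloaded` is `False` and `listed` is also `False`, the file may simply be missing from what the listing endpoint returns, which would be worth mentioning in the issue.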