Open janaSunrise opened 3 years ago
I believe that functionality should already be implemented.
I have gone through the API, and it turns out to be a low-level API. It's not something users can pick up and use directly. I want to help build an API that works directly from code instead of invoking the CLI. It would make the experience better.
Then, we can submit a PR with the same!
That works! Let's do it 😁
This is as far as I got:
import json
import os

# Load the credentials from the kaggle.json file downloaded from Kaggle.
with open('/content/drive/MyDrive/kaggle/kaggle.json') as f:
    kaggle_auth = json.load(f)

# Export the credentials before importing kaggle: the package authenticates when imported.
os.environ['KAGGLE_USERNAME'] = kaggle_auth['username']
os.environ['KAGGLE_KEY'] = kaggle_auth['key']

import kaggle

kaggle.api.authenticate()
kaggle.api.competitions_data_download_files(...)
Sadly, it crashes Jupyter, and I also don't know how to specify the download location in competitions_data_download_files.
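On the download location, here is a minimal sketch. It assumes the client exposes competition_download_files(competition, path=...) and that it writes <competition>.zip into that directory; please check both against your installed version. The helper name download_competition is hypothetical, and api stands for an authenticated client such as kaggle.api.

```python
import zipfile
from pathlib import Path


def download_competition(api, competition, dest):
    """Download a competition's files into `dest` and unzip the archive.

    `api` is assumed to be an authenticated client such as `kaggle.api`.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)

    # Assumption: `path` selects the target directory and the client
    # saves the archive there as `<competition>.zip`.
    api.competition_download_files(competition, path=str(dest), quiet=False)

    archive = dest / f"{competition}.zip"
    if archive.exists():
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
        archive.unlink()  # remove the archive once its contents are extracted

    # Return the names of the files now present in the target directory.
    return sorted(p.name for p in dest.iterdir())
```

Something like download_competition(kaggle.api, "g-research-crypto-forecasting", "data") would then leave the extracted files under data/. Passing the client in as an argument also makes the helper easy to try with a stub before touching the network.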
This is the problem I want to solve: make each of them a one-liner, such as kaggle.download or a corresponding function for each feature (datasets, competitions, files, and so on).
I'll be waiting for your PR. :)
I will explore the API and send something hopefully!
If you are reading the secrets from a file, then you shouldn't be setting them manually. You can get them either from the JSON file or from the environment, and if neither is available you need to raise an error, because you cannot proceed.
For loading .env variables, I would probably use either dotenv or decouple. I wanted to take the Kaggle key as direct input using getpass when running any command that downloads from or works with the API.
I have tried this code:
import kaggle
kaggle.download("<the-kaggle-train.csv-https://www.kaggle.com/c/g-research-crypto-forecasting/data>")
After running the code I get an error. Some files are also downloaded, but they cannot be opened!
Any advice?
This feature isn't implemented yet. To implement it, I have to isolate and namespace the codebase and clean up the CLI; adding a submodule kaggle.py with these features would be the cleanest way. You'll have to wait until it's implemented, because this is just an idea for now and nobody has worked on it yet.
What I think I would work on is splitting the code into API code, the CLI, and a usable API for developers, and maybe integrating Flake8 to align the code with linting rules and the PEPs.
@janaSunrise could you share your implementation? You could make a derivative repo that builds on the Kaggle API, or make a PR here, so that others like me can use your implementation.
I still have to work on it, and I'll need some time. I'm currently quite busy with everything, including exams, so I'll probably start once I'm free.
The Kaggle API provides a neat CLI for interacting with the API. But in my experience with Jupyter notebooks and ML projects, a way to download datasets easily through the Python API instead would be a big help.
To explain: say I want to download a Kaggle dataset. The Python API could offer a function that simplifies the download, as follows:
kaggle.download("<the-kaggle-dataset-url>")
This would download the dataset or competition files, unzip them automatically, and prepare them for you. This could save a lot of work.
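A rough sketch of how such a one-liner might dispatch on the URL. Nothing here exists in the package today: download and parse_kaggle_url are hypothetical names, the URL shapes handled (/c/<name>, /competitions/<name>, /datasets/<owner>/<name>) are assumptions about Kaggle's URL layout, and the two client calls are assumed to accept a path argument.

```python
from urllib.parse import urlparse


def parse_kaggle_url(url):
    """Return (kind, ref) for a Kaggle competition or dataset URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] in ("c", "competitions"):
        return "competition", parts[1]
    if len(parts) >= 3 and parts[0] == "datasets":
        return "dataset", f"{parts[1]}/{parts[2]}"
    raise ValueError(f"Unrecognized Kaggle URL: {url!r}")


def download(url, dest=".", api=None):
    """Hypothetical one-liner: pick the client call that matches the URL."""
    if api is None:
        # Imported lazily; assumption: the package authenticates on import.
        import kaggle
        api = kaggle.api
    kind, ref = parse_kaggle_url(url)
    if kind == "competition":
        api.competition_download_files(ref, path=dest)
    else:
        # Assumption: `unzip=True` extracts the dataset archive after download.
        api.dataset_download_files(ref, path=dest, unzip=True)
    return kind, ref
```

Keeping the URL parsing in its own function means the dispatch logic can be checked without credentials or network access.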
Now, for Kaggle API authentication, we could have 3 steps for checking:
Check whether ~/kaggle.json exists. If yes, read the credentials.
getpass.getpass()
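One way the lookup could be sketched, under some assumptions: the JSON file holds "username" and "key" fields, the environment variables are KAGGLE_USERNAME and KAGGLE_KEY (as used earlier in this thread), and the environment check sits between the file and the prompt. That ordering, the function name resolve_credentials, and its parameters are all hypothetical.

```python
import getpass
import json
import os
from pathlib import Path


def resolve_credentials(json_path="~/kaggle.json", env=None, prompt=True):
    """Return (username, key) from the JSON file, the environment, or a prompt."""
    env = os.environ if env is None else env

    # Step 1: read the credentials file if it exists.
    path = Path(json_path).expanduser()
    if path.is_file():
        data = json.loads(path.read_text())
        return data["username"], data["key"]

    # Step 2 (assumed): fall back to environment variables.
    username, key = env.get("KAGGLE_USERNAME"), env.get("KAGGLE_KEY")
    if username and key:
        return username, key

    # Step 3: ask interactively; getpass keeps the key off the screen.
    if prompt:
        username = input("Kaggle username: ")
        key = getpass.getpass("Kaggle API key: ")
        return username, key

    # Nothing worked and prompting is disabled, so we cannot proceed.
    raise RuntimeError("No Kaggle credentials found")
```

With prompt=False the function raises instead of blocking, which may be preferable in non-interactive runs.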
The Python API could make it even friendlier and more approachable, as it reduces the work of downloading and of performing other functions. I would like to work on this functionality first, and then extend it further (converting commands into their Python API equivalents).
This would benefit a lot of people: instead of depending on third-party packages that do this, they could stick to this lightweight Python package, which makes the work easy for them.