Kaggle / kaggle-api

Official Kaggle API
Apache License 2.0
6.16k stars 1.09k forks

Python-API based download and functionality. #384

Open janaSunrise opened 3 years ago

janaSunrise commented 3 years ago

The Kaggle API provides a neat CLI for interacting with the service. But from my experience working with Jupyter notebooks and ML projects, I feel that a way to download datasets directly through the Python API would be a big help!

To explain, let's say I want to download a Kaggle dataset. The Python API could offer a function that simplifies the download, as follows: kaggle.download("<the-kaggle-dataset-url>")

This would download the dataset or competition files, automatically unzip them, and prepare them for you. It could reduce the work by a lot.
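A minimal sketch of what such a helper could look like. The names `parse_kaggle_url` and `download` are hypothetical and not part of the kaggle package, and the URL layouts handled here (`/c/<name>`, `/competitions/<name>`, `/datasets/<owner>/<name>`) are assumptions about how Kaggle pages are addressed. The sketch relies on `dataset_download_files` and `competition_download_files` from the existing package.

```python
from urllib.parse import urlparse


def parse_kaggle_url(url):
    """Return (kind, ref) for a Kaggle competition or dataset URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Assumed layout for competition pages: /c/<name> or /competitions/<name>.
    if len(parts) >= 2 and parts[0] in ("c", "competitions"):
        return "competition", parts[1]
    # Assumed layout for dataset pages: /datasets/<owner>/<name>.
    if len(parts) >= 3 and parts[0] == "datasets":
        return "dataset", f"{parts[1]}/{parts[2]}"
    raise ValueError(f"Unrecognized Kaggle URL: {url}")


def download(url, path="."):
    """Hypothetical one-liner: fetch whatever the URL points at into `path`."""
    # Imported lazily because the kaggle package authenticates on import.
    import kaggle

    kind, ref = parse_kaggle_url(url)
    if kind == "dataset":
        kaggle.api.dataset_download_files(ref, path=path, unzip=True)
    else:
        kaggle.api.competition_download_files(ref, path=path)
```

With credentials already configured, a call such as `download("<the-kaggle-dataset-url>", path="data")` would then be the whole workflow.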

Now, for Kaggle API authentication, we could have 3 steps for checking,

The Python API could be even friendlier and more approachable, as it would reduce the work involved in downloading and in performing other functions. I would like to work on this functionality first, and then extend it to other things (converting commands into their Python API equivalents).

This would benefit a lot of people and reduce their dependence on third-party packages that do this instead, letting them stick to this lightweight Python package, which would make the work easy for them.

AdityaSoni19031997 commented 2 years ago

I believe that functionality should already be implemented.

janaSunrise commented 2 years ago

I have gone through the API and found that it's a low-level API. It's not something a user can directly get their hands on and use. I want to help build an API that works directly through code instead of invoking the CLI. It can make the experience better.

AdityaSoni19031997 commented 2 years ago

Then, we can submit a PR with the same!

janaSunrise commented 2 years ago

That works! Let's do it 😁

yasirroni commented 2 years ago

This is as far as I got:

import json
import os

# Load the credentials from the kaggle.json stored on Google Drive.
# (The contents are not printed, so the API key stays out of the notebook output.)
with open('/content/drive/MyDrive/kaggle/kaggle.json') as f:
    kaggle_auth = json.load(f)

# Export the credentials before importing kaggle, which authenticates on import.
os.environ['KAGGLE_USERNAME'] = kaggle_auth['username']
os.environ['KAGGLE_KEY'] = kaggle_auth['key']

import kaggle

kaggle.api.authenticate()
kaggle.api.competitions_data_download_files(...)

Sadly, it crashes Jupyter, and I also don't know how to specify the download location in competitions_data_download_files.
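On the download location, a possible sketch, assuming the higher-level `competition_download_files` method and its `path` parameter are available in the installed kaggle version. The helper name `download_competition` is hypothetical. The API object is passed in as an argument so the helper does not trigger the package's import-time authentication itself.

```python
import os
import zipfile


def download_competition(api, competition, dest):
    """Download a competition's files into `dest` and extract any zip archives."""
    os.makedirs(dest, exist_ok=True)
    # `path` selects the directory the archive is written to.
    api.competition_download_files(competition, path=dest)
    for name in os.listdir(dest):
        if name.endswith(".zip"):
            with zipfile.ZipFile(os.path.join(dest, name)) as zf:
                zf.extractall(dest)
    # Return the resulting file names so the caller can see what arrived.
    return sorted(os.listdir(dest))
```

In a notebook this might be called as `download_competition(kaggle.api, "<competition-name>", "/content/data")` once authentication has succeeded. It does not address the Jupyter crash.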

janaSunrise commented 2 years ago

This is the problem I want to solve: make them one-liners. kaggle.download, or the respective function for each feature: datasets, competitions, files, and so on.

yasirroni commented 2 years ago

I'll be waiting for your PR. :)

AdityaSoni19031997 commented 2 years ago

I will explore the API and send something hopefully!

If you are reading the secrets from a file, then you shouldn't be setting them manually. You can get them either from the JSON or from the environment, and otherwise you need to raise an error, because you cannot proceed.
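The lookup order described above could be sketched like this. The function name `load_kaggle_credentials` and its `env` parameter are hypothetical. `~/.kaggle/kaggle.json` is the package's default credentials location, and `KAGGLE_USERNAME` and `KAGGLE_KEY` are the variables used earlier in this thread.

```python
import json
import os


def load_kaggle_credentials(json_path="~/.kaggle/kaggle.json", env=None):
    """Return (username, key) from the environment or kaggle.json, else raise."""
    env = os.environ if env is None else env
    # 1. Prefer credentials that are already exported.
    username, key = env.get("KAGGLE_USERNAME"), env.get("KAGGLE_KEY")
    if username and key:
        return username, key
    # 2. Fall back to the JSON file.
    json_path = os.path.expanduser(json_path)
    if os.path.isfile(json_path):
        with open(json_path) as f:
            data = json.load(f)
        try:
            return data["username"], data["key"]
        except KeyError as exc:
            raise RuntimeError(f"{json_path} is missing the {exc} field") from exc
    # 3. Nothing found: fail loudly instead of proceeding without credentials.
    raise RuntimeError("No Kaggle credentials found in the environment or kaggle.json")
```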

janaSunrise commented 2 years ago

For loading .env variables, I would probably use either dotenv or decouple. I wanted to take the Kaggle key as direct input using getpass when running any command that downloads from or works with the API.
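A rough sketch of the getpass idea, using only the standard library. The function name `prompt_for_credentials` and its injectable `ask`, `ask_secret`, and `env` parameters are hypothetical and exist mainly so the prompts can be replaced in tests.

```python
import getpass
import os


def prompt_for_credentials(ask=input, ask_secret=getpass.getpass, env=os.environ):
    """Ask for credentials interactively and export them for the kaggle package."""
    env["KAGGLE_USERNAME"] = ask("Kaggle username: ").strip()
    # getpass does not echo the key while it is being typed.
    env["KAGGLE_KEY"] = ask_secret("Kaggle API key: ").strip()
    return env["KAGGLE_USERNAME"]
```

Because the kaggle package authenticates on import, this would need to run before `import kaggle`.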

ArazPishdad commented 2 years ago

I have tried this code:

import kaggle
kaggle.download("<the-kaggle-train.csv-https://www.kaggle.com/c/g-research-crypto-forecasting/data>")

After running the code I get an error. Some files are also downloaded, but they cannot be opened!

Any advice?

janaSunrise commented 2 years ago

This feature isn't implemented yet. To implement it, I would have to isolate and namespace the codebase, clean up the CLI, and add a submodule kaggle.py that provides these features; that would be the cleanest way. You'll have to wait until it's implemented, because this is just an idea for now and has not been worked on yet.

What I think I would work on is splitting the code into API code, the CLI, and a usable API for developers, and maybe integrating Flake8 linting too, to align the code with linting rules and the PEPs.

yasirroni commented 2 years ago

@janaSunrise could you share your implementation? You could make a derivative repo that builds on the Kaggle API, or make a PR here so that others like me can use your implementation.

janaSunrise commented 2 years ago

I'm going to have to work on it, and I'll need some time. Currently I'm a bit packed with everything, including exams, so I will probably start in a while, once I get free.