Kaggle / kaggle-api

Official Kaggle API
Apache License 2.0

Accessing the public score of Python scripts in Kaggle #203

Open mhnamaki opened 4 years ago

mhnamaki commented 4 years ago

Hey folks, thanks for providing this nice API. Using it, I have written the following code to download Python kernels from various Kaggle competitions. The API also lets us download some metadata related to each kernel. However, I don't see the public score as part of that metadata. What I need is the score a Python script achieved on the leaderboard; by score, I mean the evaluation metric used in the competition, such as accuracy.

It doesn't matter which competitions they are or how old they are. I just need data science Python scripts with an evaluation metric associated with them (accuracy/precision/F-1, …).

`import time

import kaggle

max_number_of_competitions = 100
default_page_size = 20

for category in kaggle.api.valid_competition_categories:
    # Skip the catch-all category and walk the specific ones instead
    if category == 'all':
        continue

    print('category: ' + str(category))

    max_number_of_pages = max_number_of_competitions // default_page_size
    # Start at page 1, the default for competitions_list (page 0 is assumed not to be a valid page)
    for page_index in range(1, max_number_of_pages + 1):
        competitions = kaggle.api.competitions_list(category=category, page=page_index)
        time.sleep(1)  # pause between requests to stay gentle on the API

        print(str(len(competitions)) + ' competitions found in page ' + str(page_index))

        for competition in competitions:
            # Python script kernels for this competition, highest score first
            kernels = kaggle.api.kernels_list(competition=competition.ref,
                                              language='python',
                                              kernel_type='script',
                                              sort_by='scoreDescending',
                                              page_size=100)
            time.sleep(1)

            print(str(len(kernels)) + ' python script kernels found in ' + competition.ref)

            for kernel in kernels:
                # Download the source plus its metadata file into ./<category>/<competition>/<kernel>/
                kaggle.api.kernels_pull(kernel.ref, './' + category + '/' + competition.ref + '/' + kernel.ref + '/',
                                        metadata=True)
                time.sleep(1)`

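
To show what I mean about the score missing from the metadata, here is a minimal sketch that scans the downloaded folders and lists the top-level fields of each metadata file. It assumes `kernels_pull(..., metadata=True)` writes a `kernel-metadata.json` file (a JSON object) next to each script; the helper names `collect_metadata_keys` and `files_with_score_field` are hypothetical and not part of the Kaggle API.

```python
import json
from pathlib import Path


def collect_metadata_keys(root):
    """Map each kernel-metadata.json found under `root` to its sorted top-level keys."""
    keys_by_file = {}
    for meta_path in sorted(Path(root).rglob("kernel-metadata.json")):
        with open(meta_path, encoding="utf-8") as fh:
            metadata = json.load(fh)  # assumed to be a JSON object
        keys_by_file[str(meta_path)] = sorted(metadata)
    return keys_by_file


def files_with_score_field(keys_by_file):
    """Return the files whose metadata has any key containing 'score' (case-insensitive)."""
    return [
        path
        for path, keys in keys_by_file.items()
        if any("score" in key.lower() for key in keys)
    ]


if __name__ == "__main__":
    # Scan the download directory used by the script above (the current directory)
    found = collect_metadata_keys(".")
    print(f"{len(found)} metadata files scanned")
    print(f"{len(files_with_score_field(found))} of them expose a score-like field")
```

Running this from the download directory should make it easy to confirm whether any pulled kernel carries a score-like field at all.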
blazespinnaker commented 2 years ago

Unfortunately, I guess the Kaggle folks wish to keep this as proprietary data. I'm not sure why they don't explain why that is; it's not like we'd begrudge them some attempt to profit off our labor. They do so much already.