lbhm closed this 3 months ago
Hi @lbhm - it's true that there are limits on how far back you can search via the API. We may look into revising those in the future, but there's no definitive timeline at this point. One workaround could be using the Meta Kaggle dataset. Admittedly, this would only allow you to query titles, so it's a bit less than you'd get from kaggle.com, but perhaps it's sufficient for your purpose.
Thank you for the quick response @jplotts.
Unfortunately, just having titles/IDs is not quite enough for us. In this research project, we are looking into dataset search techniques that go beyond basic keyword search, so I am especially interested in column statistics (e.g., ranges, number of unique values, null values, etc.) to explore new search techniques.
Please allow me two follow-up questions:
Hi @lbhm -
Alright, I'll download the datasets at an appropriate rate and recreate the column statistics myself, then.
Thank you for your help!
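For anyone landing here with the same need, recreating the column statistics mentioned above (ranges, number of unique values, null values) could look roughly like the sketch below. It is only an illustration using the Python standard library: the function name `column_stats` and the set of strings treated as nulls are assumptions, not anything Kaggle provides.

```python
import csv
import io

# Assumption: these strings (compared case-insensitively) count as null values.
NULL_TOKENS = {"", "na", "nan", "null"}


def column_stats(csv_text):
    """Return null count, unique count, and numeric range per column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    stats = {
        name: {"nulls": 0, "values": set(), "min": None, "max": None}
        for name in (reader.fieldnames or [])
    }
    for row in reader:
        for name, raw in row.items():
            if name is None:
                continue  # cells beyond the header width are ignored
            col = stats[name]
            if raw is None or raw.strip().lower() in NULL_TOKENS:
                col["nulls"] += 1
                continue
            col["values"].add(raw)
            try:
                num = float(raw)
            except ValueError:
                continue  # non-numeric values do not contribute to the range
            col["min"] = num if col["min"] is None else min(col["min"], num)
            col["max"] = num if col["max"] is None else max(col["max"], num)
    return {
        name: {
            "nulls": col["nulls"],
            "unique": len(col["values"]),
            "min": col["min"],
            "max": col["max"],
        }
        for name, col in stats.items()
    }


sample = "age,name\n31,Ann\n,Bob\n45,Ann\n"
print(column_stats(sample))
```

The range covers only values that parse as numbers, so a text column reports `None` for both `min` and `max`. For large files, a dataframe library would likely be the more practical choice.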
In the context of a research project, I would like to gather metadata about all datasets that match certain keywords (e.g., `age`). Conceptually, this should be no problem with the Kaggle API by iterating through the result pages.

However, I noticed that there seems to be some kind of shadow limit on the number of result pages for a search query. For example, the query `age` [1] has about 16K results according to the web UI. Nevertheless, every result page beyond 500 is empty.

Thus, my question: Is there any way to acquire dataset metadata for queries with more than 10000 results?
[1] https://www.kaggle.com/datasets?search=age
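To make the page iteration described above concrete, here is a minimal sketch. The helper names `iter_search_results` and `kaggle_fetcher` are hypothetical. The Kaggle part assumes the official `kaggle` Python package, whose `KaggleApi.dataset_list` accepts `search` and `page` arguments, and valid API credentials; the default cap of 500 pages simply mirrors the limit observed in this issue.

```python
def iter_search_results(fetch_page, max_pages=500):
    """Yield results page by page until a page is empty or max_pages is reached.

    fetch_page is any callable that maps a 1-based page number to a list.
    """
    for page in range(1, max_pages + 1):
        results = fetch_page(page)
        if not results:
            break  # an empty page means there is nothing more to fetch
        yield from results


def kaggle_fetcher(search):
    """Build a page fetcher backed by the Kaggle API (assumes credentials are set up)."""
    # Imported lazily so the pagination helper works without the kaggle package.
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()
    return lambda page: api.dataset_list(search=search, page=page)


# Demonstration with an in-memory stand-in for the API:
fake_pages = {1: ["dataset-a", "dataset-b"], 2: ["dataset-c"]}
print(list(iter_search_results(lambda page: fake_pages.get(page, []))))
```

Against the real API, the loop would be something like `for ds in iter_search_results(kaggle_fetcher("age")): ...`. It stops at the first empty page, so it cannot distinguish "no more results" from the page limit described above.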