Sage-Bionetworks / sage-monorepo

Where OpenChallenges, Schematic, and other Sage open source apps are built
https://sage-bionetworks.github.io/sage-monorepo/
Apache License 2.0

Draft a microservice that pulls data from Kaggle #1217

Closed tschaffter closed 1 year ago

tschaffter commented 1 year ago

The idea is to develop a microservice that searches Kaggle competitions using the Kaggle REST API and makes these data available on OpenChallenges.

Workflow:

A variant of the above workflow would be to have 1) a microservice that queries Kaggle and pushes the competitions to Kafka and 2) a second microservice that listens to Kafka for incoming Kaggle competitions and stores the data in the OpenChallenges DB.

The benefit of this approach is that querying Kaggle is separated from processing the competitions received, which provides more flexibility. For example, we could have more than one microservice listening to Kafka, each performing a different action.
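The decoupling described above can be sketched roughly as follows. This is only an illustration: `InMemoryTopic` is a hypothetical in-memory stand-in for a Kafka topic (no Kafka client is used), and the competition fields `ref` and `title` are assumed for the example. With a real Kafka deployment, the two handlers would be separate services, each consuming in its own consumer group so that both receive every message.

```python
from typing import Callable, Dict, List

Competition = Dict[str, str]


class InMemoryTopic:
    """Hypothetical stand-in for a Kafka topic: every subscriber sees every message."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Competition], None]] = []

    def subscribe(self, handler: Callable[[Competition], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, message: Competition) -> None:
        # Fan the message out to all registered handlers.
        for handler in self._subscribers:
            handler(message)


def publish_competitions(topic: InMemoryTopic, competitions: List[Competition]) -> None:
    """Producer side: push each competition fetched from Kaggle to the topic."""
    for competition in competitions:
        topic.publish(competition)


# Consumer 1: stores competitions (a dict stands in for the OpenChallenges DB).
stored: Dict[str, Competition] = {}


def store_competition(competition: Competition) -> None:
    stored[competition["ref"]] = competition


# Consumer 2: performs a different action on the same messages.
titles_seen: List[str] = []


def record_title(competition: Competition) -> None:
    titles_seen.append(competition["title"])


topic = InMemoryTopic()
topic.subscribe(store_competition)
topic.subscribe(record_title)

publish_competitions(
    topic,
    [
        {"ref": "example-a", "title": "Example A"},
        {"ref": "example-b", "title": "Example B"},
    ],
)
```

The producer does not know which consumers exist, so a new action can be added by subscribing another handler without touching the code that queries Kaggle.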

Kaggle API

We can get information about the Kaggle REST API via its Python client.
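A minimal sketch of how competitions might be paged through with the Python client. It assumes the client exposes `competitions_list(category=..., search=..., page=...)`, as `KaggleApi` in the `kaggle` package does to my knowledge; the helper `fetch_competitions`, the `max_pages` limit, and `FakeKaggleApi` are hypothetical names introduced here so the example runs without Kaggle credentials.

```python
from typing import Any, Dict, List, Optional


def fetch_competitions(
    api: Any,
    category: Optional[str] = None,
    search: Optional[str] = None,
    max_pages: int = 5,
) -> List[Any]:
    """Collect competitions page by page until an empty page or max_pages is reached."""
    results: List[Any] = []
    for page in range(1, max_pages + 1):
        batch = api.competitions_list(category=category, search=search, page=page)
        if not batch:
            break  # An empty page is treated as the end of the results.
        results.extend(batch)
    return results


class FakeKaggleApi:
    """Hypothetical test double that mimics only the call used above."""

    def __init__(self, pages: Dict[int, List[str]]) -> None:
        self._pages = pages
        self.calls: List[tuple] = []

    def competitions_list(self, category=None, search=None, page=1):
        self.calls.append((category, search, page))
        return self._pages.get(page, [])


fake = FakeKaggleApi({1: ["comp-a", "comp-b"], 2: ["comp-c"]})
found = fetch_competitions(fake, category="research")

# With the real client (assumed usage, requires Kaggle credentials):
#   from kaggle.api.kaggle_api_extended import KaggleApi
#   api = KaggleApi()
#   api.authenticate()
#   found = fetch_competitions(api, category="research")
```

Passing the client in as an argument keeps the paging logic independent of authentication, so it can be exercised with a fake before being wired to the real API.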

tschaffter commented 1 year ago

@vpchung The Kaggle API lets us filter competitions by "categories". Could these categories be the same thing as the tags that you used in the notebook?