[Story] Store Kaggle challenges in a DB

tschaffter commented 1 year ago

What projects is this story for?

OpenChallenges

As a user, I want

As an OpenChallenges admin, I want to review the information of new Kaggle challenges so that they can later be shown to users.

Description

We have a prototype service that pulls Kaggle challenges from the Kaggle API and pushes them to a Kafka topic. The goal of this story is to create another service that consumes the Kafka topic and writes challenges to a database as they arrive.
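A minimal sketch of the consume-and-store loop, assuming each Kafka record value is a JSON-encoded Kaggle challenge. To keep it self-contained, an in-memory SQLite database stands in for the service's own database and a plain iterable of bytes stands in for the Kafka consumer. The table name `kaggle_challenge` and the fields `ref`, `title` and `url` are hypothetical.

```python
import json
import sqlite3
from typing import Iterable

# Hypothetical minimal schema; the Kaggle field names are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS kaggle_challenge (
    ref   TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    url   TEXT
)
"""


def store_challenges(records: Iterable[bytes], conn: sqlite3.Connection) -> int:
    """Decode each record as JSON, print it and upsert it into the DB.

    `records` stands in for the values consumed from the Kafka topic.
    Returns the number of records written.
    """
    conn.execute(SCHEMA)
    written = 0
    for raw in records:
        challenge = json.loads(raw)
        print(challenge)  # echo each incoming challenge to stdout
        conn.execute(
            "INSERT OR REPLACE INTO kaggle_challenge (ref, title, url) VALUES (?, ?, ?)",
            (challenge["ref"], challenge["title"], challenge.get("url")),
        )
        conn.commit()  # commit per message so each challenge is persisted as it arrives
        written += 1
    return written
```

With a real Kafka client, the iterable would be replaced by the values of the consumed records. `INSERT OR REPLACE` keyed on `ref` means a challenge re-sent by the producer overwrites its previous row instead of failing.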

Acceptance criteria

The service listens to the Kafka topic and prints incoming challenges to stdout.
The service implements a retry strategy to connect to the Kafka topic.
The service has its own database.
The service creates the required tables and/or empties them at startup (useful during development).
The service only has to handle a minimal schema for Kaggle challenges.
- More properties will be supported as part of future stories.
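One possible retry strategy is exponential backoff with jitter. The sketch below assumes a hypothetical `connect` callable that opens the Kafka consumer and raises on failure; `reset_tables` shows the development-only create-and-empty step on a SQLite stand-in. The attempt count, delays and table layout are illustrative, not settled choices.

```python
import random
import sqlite3
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, backing off exponentially between tries."""
    attempt = 1
    while True:
        try:
            return connect()
        except Exception as exc:
            if attempt >= max_attempts:
                raise  # give up and let the service fail loudly
            # "Full jitter": wait a random time up to the capped exponential delay.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1)))
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
            attempt += 1


def reset_tables(conn: sqlite3.Connection) -> None:
    """Create the challenge table if needed, then empty it (development mode)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kaggle_challenge "
        "(ref TEXT PRIMARY KEY, title TEXT NOT NULL)"
    )
    conn.execute("DELETE FROM kaggle_challenge")
    conn.commit()
```

Passing `sleep` as a parameter keeps the helper testable without real waits. The jitter spreads out reconnect attempts if several instances start before the broker is ready.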

Tasks

[ ] Create the initial service for the Kaggle store
[ ] Connect the Kaggle store to the Kaggle Kafka topic
[ ] Update the Kaggle store to write incoming challenges to its DB

Anything else?

No response

Have you linked this story to a GitHub Project?

[X] I have linked this story to a GitHub Project and set its metadata.

tschaffter commented 1 year ago

One option may be to store the original Kaggle challenges in a document DB such as MongoDB. This would give us greater flexibility in how we use Kaggle data. For example, we may ignore a field of the original Kaggle challenge objects today but find a use for it in the future.
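A rough sketch of the idea, with an in-memory dict standing in for a MongoDB collection (this is not the MongoDB API): each challenge is kept exactly as received, and only the fields we currently need are projected out. The `ref` key and the `reward` field in the usage example are assumptions about the Kaggle payload.

```python
import json
from typing import Any, Dict, List


class RawChallengeStore:
    """In-memory stand-in for a document collection.

    Each Kaggle challenge is stored as received, keyed by its (assumed)
    "ref" identifier, so no field is lost at ingestion time.
    """

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def upsert(self, raw: bytes) -> None:
        doc = json.loads(raw)
        self._docs[doc["ref"]] = doc  # replace any earlier version of this challenge

    def project(self, fields: List[str]) -> List[Dict[str, Any]]:
        """Extract only the requested fields; missing ones come back as None."""
        return [{f: doc.get(f) for f in fields} for doc in self._docs.values()]


store = RawChallengeStore()
store.upsert(b'{"ref": "titanic", "title": "Titanic", "reward": "Knowledge"}')
print(store.project(["ref", "title"]))            # fields used today
print(store.project(["ref", "title", "reward"]))  # a field we start using later
```

Because the raw documents are kept, adding a field to the projection later backfills it for every challenge already stored, without re-pulling from the Kaggle API.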

tschaffter commented 1 year ago

Moved to Backlog

tschaffter commented 12 months ago

Added to Sprint 23.10.

tschaffter commented 12 months ago

Elasticsearch can store unstructured documents, so we could store the raw Kaggle challenges in ES and then process them before adding them to the OC DB.

tschaffter commented 11 months ago

Added to Backlog

Sage-Bionetworks / sage-monorepo