Sage-Bionetworks / sage-monorepo

Where OpenChallenges, Schematic, and other Sage open source apps are built
https://sage-bionetworks.github.io/sage-monorepo/
Apache License 2.0

[Story] Map Kaggle challenge schema to OpenChallenge challenge data schema #1251

Open tschaffter opened 1 year ago

tschaffter commented 1 year ago

What projects is this story for?

OpenChallenges

As a user, I want

As a data contributor, I want to represent Kaggle challenges using the OpenChallenges schema so that they can be added to the OpenChallenges DB.

Description

The Kaggle to Kafka service fetches challenges from the Kaggle API. This service includes a file that shows examples of Kaggle challenges in JSON format. From what I remember from looking at the Kaggle API Python client, it may be possible to specify the fields of interest: are there more fields than the ones included by default? If so, is a description of the full schema available somewhere?

The goal of this story is to define the specification of a "mapper" that converts original Kaggle challenges to the OpenChallenges schema. The mapper will later be implemented in a Java microservice.
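As a rough illustration of what such a mapper specification could look like, here is a minimal sketch that expresses it as a field-to-field table. It is written in Python only for brevity; the story calls for a Java implementation later. All field names on both sides (`title`, `url`, `enabledDate`, `deadline`, `name`, `websiteUrl`, `startDate`, `endDate`, `platform`) are assumptions for illustration, not the confirmed Kaggle or OpenChallenges schemas.

```python
from datetime import datetime
from typing import Any, Callable, Dict, Optional, Tuple


def _identity(value: Any) -> Any:
    """Pass the value through unchanged."""
    return value


def _to_iso_date(value: Optional[str]) -> Optional[str]:
    """Convert an ISO 8601 timestamp string to a YYYY-MM-DD date string."""
    if value is None:
        return None
    # fromisoformat() does not accept a trailing "Z" on older Python versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00")).date().isoformat()


# Hypothetical mapping spec: OC field -> (Kaggle field, converter).
# Every name here is an assumption, not the real schema.
KAGGLE_TO_OC: Dict[str, Tuple[str, Callable[[Any], Any]]] = {
    "name": ("title", _identity),
    "description": ("description", _identity),
    "websiteUrl": ("url", _identity),
    "startDate": ("enabledDate", _to_iso_date),
    "endDate": ("deadline", _to_iso_date),
}


def map_kaggle_challenge(kaggle: Dict[str, Any]) -> Dict[str, Any]:
    """Build an OC-style record from a Kaggle record using the spec table."""
    oc = {
        oc_field: convert(kaggle.get(kaggle_field))
        for oc_field, (kaggle_field, convert) in KAGGLE_TO_OC.items()
    }
    # Constant field, assumed to identify the source platform.
    oc["platform"] = "kaggle"
    return oc
```

Keeping the specification as data rather than logic should make it easier to review and to port to Java once the real field lists are agreed on.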

The above description assumes working with challenges fetched from the Kaggle API. Another source of information is the Kaggle archive of challenges, which is updated daily. We will likely fetch both this archive and challenges from the API if they provide complementary information. Consider these two data sources when designing the mapper (or two mappers).
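If the two sources do turn out to be complementary, one possible way to combine them is sketched below. It assumes both records have already been converted to the same (hypothetical) OC field names and that the API value should win whenever it is present; neither assumption is confirmed by this story.

```python
from typing import Any, Dict, Optional


def merge_sources(
    api_record: Optional[Dict[str, Any]],
    archive_record: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Combine two partial views of one challenge.

    Starts from the archive record and overwrites it with every
    non-empty value from the API record.
    """
    merged = dict(archive_record or {})
    for key, value in (api_record or {}).items():
        # Skip missing values so the archive can fill the gap.
        if value not in (None, ""):
            merged[key] = value
    return merged
```

Whether the API or the archive should take precedence is a design decision for the mapping exercise; the precedence here is only a placeholder.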

The Kaggle Competitions page does not include a JSON-LD object, unlike its Dataset page, so it is not a source of challenge information.

Acceptance criteria

Out of scope:

Tasks

No response

Anything else?

Relevant tickets:

Have you linked this story to a GitHub Project?

vpchung commented 1 year ago

Mapping exercise between Kaggle API ↔ OC schema and Meta Kaggle ↔ OC schema is complete:

Ideally two mappers will be needed, one for "active" challenges and one for "completed" challenges (in case we miss a challenge while pulling information with the Kaggle APIs).
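A minimal sketch of how a pipeline might choose between the two mappers, assuming that "active" means the deadline has not passed yet, that active challenges come from the Kaggle API, and that completed ones come from the Meta Kaggle archive. The function name and the `"api"` / `"archive"` labels are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def select_mapper(deadline: Optional[datetime], now: Optional[datetime] = None) -> str:
    """Return "api" for challenges still running, "archive" for finished ones.

    Both datetimes are expected to be timezone-aware. A challenge with no
    known deadline is treated as active.
    """
    now = now or datetime.now(timezone.utc)
    if deadline is None or deadline > now:
        return "api"
    return "archive"
```

For example, with `now` fixed to 2023-03-01 UTC, a challenge whose deadline is 2023-04-01 would be routed to the "active" mapper and one whose deadline is 2023-02-01 to the "completed" mapper.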

tschaffter commented 1 year ago

Verena and I have met a couple of times to discuss how (Kaggle) challenges will be pulled and processed by OC. Since then, I have added a few use cases to the event model in Lucidchart. A couple more meetings should help identify the full mapping strategy required to cover most of the use cases.

tschaffter commented 1 year ago

Added to Sprint 23.03

tschaffter commented 1 year ago

Added to Backlog

tschaffter commented 11 months ago

Added to Backlog