Open tschaffter opened 1 year ago
Mapping exercise between Kaggle API ↔ OC schema and Meta Kaggle ↔ OC schema is complete:
Ideally two mappers will be needed, one for "active" challenges and one for "completed" challenges (in case we miss a challenge while pulling information with the Kaggle APIs).
Verena and I have met a couple of times to discuss how (Kaggle) challenges will be pulled and processed by OC. Since then, I have added a few use cases to the event model in Lucidchart. A couple more meetings should help identify the full mapping strategy required to cover most of the use cases.
Added to Sprint 23.03
Added to Backlog
What projects is this story for?
OpenChallenges
As a user, I want
As a data contributor, I want to represent Kaggle challenges with the OpenChallenges schema so that they can be added to the OpenChallenges DB.
Description
The Kaggle to Kafka service fetches challenges from the Kaggle API. This service includes a file that shows examples of Kaggle challenges in JSON format. From what I remember of the Kaggle API Python client, it may be possible to specify the fields of interest: are there more fields than the ones included by default? If so, is a description of the full schema available somewhere?
The goal of this story is to identify the specification of a "mapper" for converting original Kaggle challenges to the OpenChallenges schema. The mapper will later be implemented in a Java microservice.
The above description assumes working with challenges fetched from the Kaggle API. Another source of information is the Kaggle archive of challenges, which is updated daily. We will likely fetch both this archive and challenges from the API if they provide complementary information. Consider these two data sources when designing the mapper (or two mappers).
The Kaggle Competitions page does not include a JSON-LD object, unlike its Dataset page, so it is not a source of challenge information.
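To make the idea of a mapper specification more concrete, here is a minimal Python sketch of one possible shape: a declarative table per data source that lists, for each target field, how to read it from the source record. Only the rule `name = title` comes from this story. The other field names (`description`, `Title`, `Subtitle`) and the helpers `map_challenge` and `merge_challenges` are hypothetical placeholders, not the actual Kaggle or OpenChallenges schemas, and the eventual Java implementation may look quite different.

```python
from typing import Any, Callable, Dict, Mapping

# An extractor reads one target value out of a source record.
Extractor = Callable[[Mapping[str, Any]], Any]

# Spec for challenges fetched from the Kaggle API.
# Only "name <- title" is taken from this story; the rest is hypothetical.
KAGGLE_API_SPEC: Dict[str, Extractor] = {
    "name": lambda src: src.get("title"),
    "description": lambda src: src.get("description"),  # hypothetical field
}

# Spec for challenges read from the Kaggle archive.
# The column names here are assumptions made for illustration only.
META_KAGGLE_SPEC: Dict[str, Extractor] = {
    "name": lambda src: src.get("Title"),            # hypothetical column
    "description": lambda src: src.get("Subtitle"),  # hypothetical column
}


def map_challenge(source: Mapping[str, Any], spec: Dict[str, Extractor]) -> Dict[str, Any]:
    """Apply a mapping spec to one source record."""
    return {target: extract(source) for target, extract in spec.items()}


def merge_challenges(primary: Mapping[str, Any], fallback: Mapping[str, Any]) -> Dict[str, Any]:
    """Combine two mapped records, preferring non-missing values from `primary`."""
    merged = dict(fallback)
    merged.update({key: value for key, value in primary.items() if value is not None})
    return merged


if __name__ == "__main__":
    # Made-up records, not real Kaggle data.
    api_record = {"title": "Example Challenge"}
    archive_record = {"Title": "Example Challenge", "Subtitle": "A made-up record"}

    from_api = map_challenge(api_record, KAGGLE_API_SPEC)
    from_archive = map_challenge(archive_record, META_KAGGLE_SPEC)
    print(merge_challenges(from_api, from_archive))
```

Keeping the specification as data rather than hard-coded logic means the same table could be reviewed as documentation and later translated into the Java microservice. Whether one merged mapper or two separate mappers is preferable depends on how complementary the two sources turn out to be.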
Acceptance criteria
target_challenge.name = source_challenge.title
Out of scope:
Tasks
No response
Anything else?
Relevant tickets:
1238
1246
Have you linked this story to a GitHub Project?