Create a DB+API stack that exposes the i2b2 dataset

tschaffter commented 4 years ago

This stack will be used to feed the i2b2 data to the NLP methods submitted to the NLP Sandbox.

Workflow:

An agent deploys an NLP service to evaluate it
The agent contacts the dataset API to get the i2b2 data
The agent sends the i2b2 data to the NLP service
The NLP service processes the data and sends back a response for each request received
The agent compares the ground truth (stored in the DB) with the content of the NLP service responses to evaluate the performance of the NLP service
The agent pushes the performance metrics to the NLP Sandbox
The agent shuts down the NLP service
The agent waits for new submissions to evaluate
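The workflow above can be sketched as a single evaluation function. This is a minimal sketch, not the NLP Sandbox implementation: it assumes date annotations are compared by exact (start, length) character spans and scored with micro-averaged precision, recall and F1. The names `evaluate_submission`, `score`, `Note`, the callables passed in, and the toy regex "NLP service" are all hypothetical stand-ins so the code runs without any server.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Span = Tuple[int, int]  # (start offset, length) of a date mention; hypothetical format


@dataclass
class Note:
    note_id: str
    text: str


def score(gold: Set[tuple], predicted: Set[tuple]) -> Dict[str, float]:
    """Micro-averaged precision, recall and F1 over exact (note, start, length) matches."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def evaluate_submission(
    deploy: Callable[[], Callable[[Note], Set[Span]]],
    fetch_notes: Callable[[], List[Note]],
    fetch_gold: Callable[[str], Set[Span]],
    push_metrics: Callable[[Dict[str, float]], None],
    shutdown: Callable[[], None],
) -> Dict[str, float]:
    annotate = deploy()  # deploy the NLP service under evaluation
    try:
        gold: Set[tuple] = set()
        predicted: Set[tuple] = set()
        for note in fetch_notes():  # get the i2b2 notes from the dataset API
            # send the note to the NLP service and keep its response
            predicted |= {(note.note_id, s, n) for s, n in annotate(note)}
            # ground truth for the same note, read from the DB
            gold |= {(note.note_id, s, n) for s, n in fetch_gold(note.note_id)}
        metrics = score(gold, predicted)  # compare responses with ground truth
        push_metrics(metrics)  # report the metrics to the NLP Sandbox
        return metrics
    finally:
        shutdown()  # shut the service down even if evaluation fails


# Toy stand-ins (hypothetical) so the sketch runs without any server.
NOTES = [Note("n1", "Seen on 2014-03-01."), Note("n2", "Discharged 03/05.")]
GOLD = {"n1": {(8, 10)}, "n2": {(11, 5)}}


def iso_date_annotator(note: Note) -> Set[Span]:
    """A naive 'NLP service' that only recognizes YYYY-MM-DD dates."""
    return {(m.start(), m.end() - m.start())
            for m in re.finditer(r"\d{4}-\d{2}-\d{2}", note.text)}


pushed: List[Dict[str, float]] = []
events: List[str] = []
metrics = evaluate_submission(
    deploy=lambda: iso_date_annotator,
    fetch_notes=lambda: NOTES,
    fetch_gold=lambda note_id: GOLD[note_id],
    push_metrics=pushed.append,
    shutdown=lambda: events.append("shutdown"),
)
print(metrics)  # precision 1.0, recall 0.5, f1 about 0.667
```

The last workflow step, waiting for new submissions, would wrap `evaluate_submission` in a loop that polls the NLP Sandbox for queued submissions. That polling interface is not described in this thread, so the sketch leaves it out. Passing the steps in as callables keeps the loop independent of how the service is deployed or how the data node is reached.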

I've created a starter kit that's available here: https://github.com/data2health/2014-i2b2-deid-db

gkowalski commented 4 years ago

The code for loading the data was checked in yesterday.

tschaffter commented 4 years ago

Update

We have an early version of a data node that exposes 2014 i2b2 data (clinical notes and annotations).

tschaffter commented 4 years ago

Done! The largest part of the work is finished: we can use the API server + DB to query clinical notes and date annotations. Work remains to improve the performance and security of the service and to support other types of annotations. This work is tracked in the GH repo of the data node.

https://github.com/data2health/nlp-sandbox-data-node-i2b2-2014
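To make the DB side of such a data node concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`note`, `date_annotation`, `start`, `length`) and the helper functions are hypothetical and are not taken from the repository above; an API server would expose similar queries as HTTP endpoints returning JSON, which is not shown here.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical schema: the real data node's tables and columns may differ.
SCHEMA = """
CREATE TABLE note (
    note_id TEXT PRIMARY KEY,
    body    TEXT NOT NULL
);
CREATE TABLE date_annotation (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    note_id      TEXT NOT NULL REFERENCES note(note_id),
    start        INTEGER NOT NULL,  -- character offset into note.body
    length       INTEGER NOT NULL,
    covered_text TEXT NOT NULL      -- redundant copy, handy for sanity checks
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_note(conn: sqlite3.Connection, note_id: str, body: str,
             date_spans: List[Tuple[int, int]]) -> None:
    """Store a clinical note and its gold date annotations (start, length)."""
    conn.execute("INSERT INTO note (note_id, body) VALUES (?, ?)", (note_id, body))
    conn.executemany(
        "INSERT INTO date_annotation (note_id, start, length, covered_text) "
        "VALUES (?, ?, ?, ?)",
        [(note_id, s, n, body[s:s + n]) for s, n in date_spans],
    )


def get_note(conn: sqlite3.Connection, note_id: str) -> Optional[dict]:
    """Return one note as a dict (what an API handler might serialize), or None."""
    row = conn.execute(
        "SELECT note_id, body FROM note WHERE note_id = ?", (note_id,)
    ).fetchone()
    return None if row is None else {"id": row[0], "text": row[1]}


def get_date_annotations(conn: sqlite3.Connection, note_id: str) -> List[dict]:
    """Return the date annotations of a note, ordered by position."""
    rows = conn.execute(
        "SELECT start, length, covered_text FROM date_annotation "
        "WHERE note_id = ? ORDER BY start",
        (note_id,),
    ).fetchall()
    return [{"start": s, "length": n, "text": t} for s, n, t in rows]


conn = open_db()
add_note(conn, "n1", "Seen on 2014-03-01, back on 2014-03-08.", [(8, 10), (28, 10)])
print(get_note(conn, "n1"))
print(get_date_annotations(conn, "n1"))
```

Storing each annotation as an offset plus length keeps the ground truth directly comparable with span-based NLP responses, while the `covered_text` column makes it easy to spot offset errors when loading the i2b2 files.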

data2health / nlp-sandbox, issue #50