data2health / nlp-sandbox

Cloud-based sandbox for text analytics
MIT License

Create a DB+API stack that exposes the i2b2 dataset #50

tschaffter closed this issue 4 years ago

tschaffter commented 4 years ago

This stack will be used to feed the i2b2 data to the NLP methods submitted to the NLP Sandbox.

Workflow:

  1. An agent deploys an NLP service to evaluate it
  2. The agent contacts the dataset API to get the i2b2 data
  3. The agent sends the i2b2 data to the NLP service
  4. The NLP service processes the data and sends back a response for each request received
  5. The agent compares the ground truth (found in the DB) with the content of the NLP service responses to evaluate the performance of the NLP service
  6. The agent pushes the performance metrics to the NLP Sandbox
  7. The agent shuts down the NLP service
  8. The agent waits for new submissions to evaluate
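To make steps 2 to 5 concrete, here is a minimal sketch of the evaluation part of the workflow. It is not the actual agent: `DateAnnotation`, `evaluate`, `score`, and `toy_date_annotator` are hypothetical names, the notes and ground truth are small in-memory stand-ins for what the dataset API and DB would return, and the NLP service is replaced by a local function so that the example runs without any network access. Exact-match scoring on (note, offset, length, text) is also an assumption; the real evaluation may use a different matching rule.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class DateAnnotation:
    """Hypothetical annotation record: a date span inside one clinical note."""
    note_id: str
    start: int
    length: int
    text: str


@dataclass
class Metrics:
    precision: float
    recall: float
    f1: float


def score(gold: set, predicted: set) -> Metrics:
    """Step 5: compare ground truth and predictions using exact matches."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return Metrics(precision, recall, f1)


def evaluate(
    notes: dict,
    gold: set,
    annotate: Callable[[str, str], Iterable[DateAnnotation]],
) -> Metrics:
    """Steps 3 to 5: send each note to the service, collect responses, score."""
    predicted = set()
    for note_id, text in notes.items():
        # In a real agent this call would be a request to the NLP service.
        predicted.update(annotate(note_id, text))
    return score(gold, predicted)


def toy_date_annotator(note_id: str, text: str) -> list:
    """Stand-in for an NLP service: only finds dates written as DD/DD/DDDD."""
    return [
        DateAnnotation(note_id, m.start(), len(m.group()), m.group())
        for m in re.finditer(r"\d{2}/\d{2}/\d{4}", text)
    ]


# Step 2 stand-in: made-up notes and ground truth (not real i2b2 data).
notes = {
    "n1": "Seen on 01/02/2014, follow-up 03/04/2014.",
    "n2": "Discharged 2014-03-05.",
}
gold = {
    DateAnnotation("n1", 8, 10, "01/02/2014"),
    DateAnnotation("n1", 30, 10, "03/04/2014"),
    DateAnnotation("n2", 11, 10, "2014-03-05"),
}

metrics = evaluate(notes, gold, toy_date_annotator)
print(metrics)
```

The toy annotator misses the ISO-formatted date in `n2`, so recall is lower than precision. Steps 1 and 6 to 8 (deploying the service, pushing metrics, shutting down, waiting for submissions) are deliberately left out because they depend on infrastructure not described in this issue.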

I've created a starter kit that's available here: https://github.com/data2health/2014-i2b2-deid-db

gkowalski commented 4 years ago

Loading of data was checked in yesterday.

tschaffter commented 4 years ago

Update

We have an early version of a data node that exposes 2014 i2b2 data (clinical notes and annotations).

tschaffter commented 4 years ago

Done! The largest part of the work is complete, as we can now use the API server + DB to query clinical notes and date annotations. There's still work to do to improve the performance and security of the service, as well as to support other types of annotations. This work is tracked in the GH repo of the data node.

https://github.com/data2health/nlp-sandbox-data-node-i2b2-2014