asyml / ForteHealth

The project is in the incubation stage and still under development. ForteHealth is a flexible and powerful ML workflow builder for biomedical and clinical scenarios. This is part of the CASL project: http://casl-project.ai/
Apache License 2.0

Create an example for MIMIC-III clinical note pipeline. #64

Open Leolty opened 2 years ago

Leolty commented 2 years ago

I would like us to have one pipeline that exercises as many of our NLP processors as possible, and I think the MIMIC-III data is a good fit for that.

In this example, we should try to use all the processors we have. For example, suppose our sample data comes from a patient's self-report, query, or clinical diagnosis record (maybe for a COVID-19 patient) that describes their physical condition, e.g., with symptom A and without symptom B (Negation Context Detection). We could then give a diagnosis based on the symptom description (ICD Coding). The description may also contain specific time expressions, such as how the patient felt last night or last month, so the example can be extended to the temporal domain. (I know the temporal processors may not be complete yet; we can just work with what we currently have.)

It may be hard to find a single piece of data that covers all the processors, though. To handle that, maybe we can just concatenate several samples to get the coverage we want.
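To make the concatenation idea concrete, here is a minimal sketch in plain Python. It joins several snippets into one note and records the character span of each, so we can later tell which part of the combined text was meant for which processor. The names `Span` and `concatenate_snippets` are hypothetical helpers, not part of Forte or ForteHealth, and the snippets are made up rather than taken from MIMIC-III.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Span:
    """Character range of one source snippet inside the combined note."""
    label: str
    begin: int
    end: int


def concatenate_snippets(
    snippets: List[Tuple[str, str]], sep: str = "\n\n"
) -> Tuple[str, List[Span]]:
    """Join (label, text) snippets and record where each one lands."""
    parts: List[str] = []
    spans: List[Span] = []
    cursor = 0
    for i, (label, text) in enumerate(snippets):
        if i > 0:
            # The separator sits between snippets and belongs to no span.
            parts.append(sep)
            cursor += len(sep)
        parts.append(text)
        spans.append(Span(label, cursor, cursor + len(text)))
        cursor += len(text)
    return "".join(parts), spans


# Made-up snippets for illustration only, not real MIMIC-III text.
snippets = [
    ("negation", "Patient reports fever but denies chest pain."),
    ("temporal", "Cough started last night."),
]
combined, spans = concatenate_snippets(snippets)
```

Slicing `combined[span.begin:span.end]` gives back the original snippet, which should make it easier to check each processor's output against the part of the note it was meant to cover.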

Possible included components:

  1. Sentence Segmenter
  2. Tokenizer
  3. Bio NER Tagger
  4. Negation Context
  5. ICD Coding
  6. Temporal Mention Tagging
  7. Temporal Relation Extraction
  8. Deidentification

(Just ignore any processors we do not currently have.)
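As a rough illustration of how the components above could be chained, here is a minimal sketch in plain Python. It deliberately does not use the Forte or ForteHealth API: the stage functions are naive rule-based stand-ins for the first few components (sentence segmenter, tokenizer, negation context), and every name in it (`run_pipeline`, `segment_sentences`, `tokenize`, `detect_negation`, `NEGATION_CUES`) is hypothetical. The real example would swap in our actual processors.

```python
import re
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Stage = Callable[[Doc], Doc]

# Tiny hand-picked cue list, only for illustration.
NEGATION_CUES = {"no", "not", "without", "denies"}


def segment_sentences(doc: Doc) -> Doc:
    """Stand-in sentence segmenter: split after ., ! or ? plus whitespace."""
    pieces = re.split(r"(?<=[.!?])\s+", doc["text"])
    doc["sentences"] = [p.strip() for p in pieces if p.strip()]
    return doc


def tokenize(doc: Doc) -> Doc:
    """Stand-in tokenizer: words and single punctuation marks."""
    doc["tokens"] = [re.findall(r"\w+|[^\w\s]", s) for s in doc["sentences"]]
    return doc


def detect_negation(doc: Doc) -> Doc:
    """Stand-in negation step: flag sentences containing a cue word."""
    doc["negated"] = [
        any(tok.lower() in NEGATION_CUES for tok in toks)
        for toks in doc["tokens"]
    ]
    return doc


def run_pipeline(text: str, stages: List[Stage]) -> Doc:
    """Run the stages in order, each one adding its annotations to the doc."""
    doc: Doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc


# Made-up note for illustration only, not real MIMIC-III text.
result = run_pipeline(
    "Patient has a fever. Patient denies chest pain.",
    [segment_sentences, tokenize, detect_negation],
)
```

Each stage reads what earlier stages wrote, so the order matters in the same way it would for the real processors. Components we do not have yet can simply be left out of the stage list.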