aphp / edsnlp

Modular, fast NLP framework, compatible with PyTorch and spaCy, offering tailored support for French clinical notes.
https://aphp.github.io/edsnlp/
BSD 3-Clause "New" or "Revised" License

Correct eds.history pipeline to distinguish "medical history" from "history of current disease" #219

Open marieverdoux opened 9 months ago

marieverdoux commented 9 months ago

As currently built, when the use_section=True config is applied to the eds.history pipeline, entities found in any "antécédents", "antécédents familiaux" or "histoire de la maladie" section are tagged as "history".

The problem is that:

I suggest removing "histoire de la maladie" and "antécédents familiaux" from the section_history list in edsnlp/pipelines/qualifiers/history/patterns.py

If an entity refers to the history of the current disease, that can still be identified from the title of the section it appears in.
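
To make the suggestion concrete, here is a minimal sketch in plain Python. It assumes the list holds exactly the three section titles mentioned above; the actual contents of patterns.py may differ, and `proposed_removals` and `narrowed_section_history` are names made up for illustration.

```python
# Assumed stand-in for the list in patterns.py; the real contents may differ.
section_history = [
    "antécédents",
    "antécédents familiaux",
    "histoire de la maladie",
]

# Section titles this issue proposes to stop treating as "history"
# (hypothetical name, for illustration only).
proposed_removals = {"histoire de la maladie", "antécédents familiaux"}

# Keep only the titles that should still trigger the "history" tag,
# preserving the original order.
narrowed_section_history = [
    title for title in section_history if title not in proposed_removals
]

print(narrowed_section_history)
```

With this change, only entities under "antécédents" would be tagged through the section rule.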

Thank you!

Aremaki commented 9 months ago

Hello Marie,

Thank you for your feedback. The pipe's name is ambiguous... The idea of this pipe was to detect whether an event (such as a disease) occurred before the present time of the document.

Could you elaborate on what you are trying to achieve with the history pipe?

marieverdoux commented 9 months ago

Thanks Adam.

Usually in medical records, "antécédents" refers to previous diseases that are no longer of interest for the current visit, whereas "histoire de la maladie" details the history of the disease the visit is about. They are distinct categories of the medical record.

I think it can be useful in many contexts to distinguish the two cases, for instance if you want to know whether the disease is still active.

In my current application, I want to single out documents related to a currently active disease, so I use the eds.history pipeline to filter out documents that relate to old diseases. If I use the pipeline with the use_section config, I end up filtering out all the entities under "histoire de la maladie", even though I want to keep them.
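
The effect I am describing can be reproduced with a toy example. This is only a sketch in plain Python, not edsnlp code: the `Entity` class, the `keep_current` helper and the two sample entities are all hypothetical, and the section sets are assumed from the discussion above.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    text: str
    section: str  # title of the section the entity was found in


def keep_current(entities, history_sections):
    """Drop every entity whose section is treated as 'history'."""
    return [e for e in entities if e.section not in history_sections]


# Made-up entities standing in for what an upstream matcher would return.
entities = [
    Entity("diabète", "antécédents"),
    Entity("pneumopathie", "histoire de la maladie"),
]

# Behaviour as described in this issue: all three sections count as history.
current_behaviour = keep_current(
    entities,
    {"antécédents", "antécédents familiaux", "histoire de la maladie"},
)

# Behaviour after the proposed change: only "antécédents" counts as history.
proposed_behaviour = keep_current(entities, {"antécédents"})

print([e.text for e in current_behaviour])
print([e.text for e in proposed_behaviour])
```

With the three-section set nothing survives the filter, while with the narrowed set the entity under "histoire de la maladie" is kept, which is the behaviour I need.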