GoTeamEpsilon / ctakes-rest-service

A JSON-based rest service to process unstructured clinical text through a smart natural language processing system.

Use the current solutions as inspiration to finalize CtakesConceptMentionParser.java #3

Closed MatthewVita closed 6 years ago

MatthewVita commented 7 years ago

Here's the solution Daniel and I put together a while back (of course we need to get this to be Java):

https://raw.githubusercontent.com/GoTeamEpsilon/ctakes-rest-service/cfc324b59152c8601c26a7a30290406e5df3d2b6/parser.py

BEFORE: https://raw.githubusercontent.com/GoTeamEpsilon/ctakes-rest-service/cfc324b59152c8601c26a7a30290406e5df3d2b6/samples/data.xml

AFTER: https://raw.githubusercontent.com/GoTeamEpsilon/ctakes-rest-service/cfc324b59152c8601c26a7a30290406e5df3d2b6/samples/data.json

gandhirajan commented 7 years ago

The JSON response we generated contains only the mention names and the identified mentions under each category. It also removes duplicate mentions. Should we keep both JSON formats and let the user decide which one they want, or should we converge on one?

MatthewVita commented 6 years ago

@gandhirajan Hello. Actually, https://github.com/GoTeamEpsilon/ctakes-friendly-web-ui depends on the JSON in the format I have listed. However, I would love to remove the duplicate mentions. That is a bug with my JSON.

I think having the context such as line numbers and codes is super important.
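One way the duplicate removal could look in Java while keeping the context is sketched below. This is a minimal sketch, not the code from this repository: the `Mention` and `MentionDeduplicator` classes are hypothetical, and it assumes a "duplicate" means the same covered text at the same offsets within the same category.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical holder for one mention; not a class from cTAKES or this repository. */
final class Mention {
    final String category; // e.g. "SignSymptomMention"
    final String text;     // covered text, e.g. "NAUSEA"
    final int start;       // character offset where the mention begins
    final int end;         // character offset where the mention ends
    final Set<String> codes = new LinkedHashSet<>(); // insertion-ordered, no repeats

    Mention(String category, String text, int start, int end, List<String> codes) {
        this.category = category;
        this.text = text;
        this.start = start;
        this.end = end;
        this.codes.addAll(codes);
    }
}

final class MentionDeduplicator {
    /**
     * Collapses mentions that share category, text, and offsets (the assumed
     * definition of a duplicate) and merges their codes, keeping first-seen order.
     */
    static List<Mention> deduplicate(List<Mention> mentions) {
        Map<String, Mention> unique = new LinkedHashMap<>();
        for (Mention m : mentions) {
            String key = m.category + "|" + m.text + "|" + m.start + "|" + m.end;
            Mention seen = unique.get(key);
            if (seen == null) {
                // Copy so the caller's objects are never modified.
                unique.put(key, new Mention(m.category, m.text, m.start, m.end,
                        new ArrayList<>(m.codes)));
            } else {
                seen.codes.addAll(m.codes);
            }
        }
        return new ArrayList<>(unique.values());
    }
}
```

Because the offsets are part of the key, the same term appearing at two different positions in the text is kept as two mentions, so the position context is not lost.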

gandhirajan commented 6 years ago

I have a plan in mind. I need to see how it materializes.

gandhirajan commented 6 years ago

I merged both JSON formats as discussed. The final JSON format is as follows:

{
  "AnatomicalSiteMention": {
    "EYE": [
      "start: 90",
      "end: 93",
      "[codingScheme: SNOMEDCT_US, code: 371398005, cui: C0015392, tui: T023]",
      "[codingScheme: SNOMEDCT_US, code: 81745001, cui: C0015392, tui: T023]"
    ]
  },
  "DrugChangeStatusAnnotation": {},
  "StrengthAnnotation": {},
  "FractionStrengthAnnotation": {},
  "FrequencyUnitAnnotation": {},
  "DiseaseDisorderMention": {
    "CANCER": [
      "start: 115",
      "end: 121",
      "[codingScheme: MDR, code: 10049516, cui: C0006826, tui: T191]",
      "[codingScheme: MDR, code: 10026655, cui: C0006826, tui: T191]",
      "[codingScheme: SNOMEDCT_US, code: 363346000, cui: C0006826, tui: T191]",
      "[codingScheme: MDR, code: 10007050, cui: C0006826, tui: T191]",
      "[codingScheme: MDR, code: 10073835, cui: C0006826, tui: T191]",
      "[codingScheme: MDR, code: 10028997, cui: C0006826, tui: T191]"
    ],
    "METASTATIC COLORECTAL CANCER": [
      "start: 12",
      "end: 40",
      "[codingScheme: MDR, code: 10052362, cui: C0948380, tui: T191]",
      "[codingScheme: MDR, code: 10052358, cui: C0948380, tui: T191]"
    ]
  },
  "SignSymptomMention": {
    "NAUSEA": [
      "start: 45",
      "end: 51",
      "[codingScheme: SNOMEDCT_US, code: 422587007, cui: C0027497, tui: T184]",
      "[codingScheme: MDR, code: 10028813, cui: C0027497, tui: T184]",
      "[codingScheme: MDR, code: 10028822, cui: C0027497, tui: T184]",
      "[codingScheme: MDR, code: 10037730, cui: C0027497, tui: T184]",
      "[codingScheme: MDR, code: 10016361, cui: C0027497, tui: T184]",
      "[codingScheme: MDR, code: 10028823, cui: C0027497, tui: T184]"
    ],
    "RED EYE": [
      "start: 86",
      "end: 93",
      "[codingScheme: MDR, code: 10015962, cui: C0235267, tui: T184]",
      "[codingScheme: SNOMEDCT_US, code: 75705005, cui: C0235267, tui: T184]",
      "[codingScheme: MDR, code: 10038189, cui: C0235267, tui: T184]",
      "[codingScheme: MDR, code: 10038205, cui: C0235267, tui: T184]",
      "[codingScheme: MDR, code: 10015963, cui: C0235267, tui: T184]",
      "[codingScheme: SNOMEDCT_US, code: 703630003, cui: C0235267, tui: T184]",
      "[codingScheme: MDR, code: 10016009, cui: C0235267, tui: T184]"
    ]
  },
  "RouteAnnotation": {},
  "DateAnnotation": {},
  "MeasurementAnnotation": {},
  "ProcedureMention": {},
  "TimeMention": {},
  "StrengthUnitAnnotation": {}
}