apache / ctakes

Apache cTAKES is a Natural Language Processing (NLP) platform for clinical text.
https://ctakes.apache.org
Apache License 2.0

cTAKES custom dictionary setup documentation #19

Closed andytubeee closed 5 months ago

andytubeee commented 5 months ago

I have a CSV file of dictionary vocabulary and concept codes. Is there any documentation on how to set up a custom dictionary for use with bin/runClinicalPipeline.sh?

Thanks

seanfinan commented 5 months ago

There is/was documentation somewhere, but it is spread out and it would take me some time to find the latest and most accurate version. However, you can have a look at the example here: https://github.com/apache/ctakes/blob/main/ctakes-dictionary-lookup-fast/src/user/resources/org/apache/ctakes/dictionary/lookup/fast/bsv/tinyDict.bsv

Column 1 has the CUI, column 2 the TUI, and column 3 the synonym. If there is a 4th entry, it is used as the preferred text for the concept. If you aren't certain about the TUI, you can use T000 for Unknown.

The lookupXml for the example is here: https://github.com/apache/ctakes/blob/main/ctakes-dictionary-lookup-fast/src/user/resources/org/apache/ctakes/dictionary/lookup/fast/bsv/tinyDictSpec.xml

You just want to point to that lookupXml file on the command line: "-l bsv/tinyDictSpec.xml". You may need to use the full path.

This all uses the code here: https://github.com/apache/ctakes/blob/main/ctakes-dictionary-lookup-fast/src/main/java/org/apache/ctakes/dictionary/lookup2/dictionary/BsvRareWordDictionary.java and here: https://github.com/apache/ctakes/blob/main/ctakes-dictionary-lookup-fast/src/main/java/org/apache/ctakes/dictionary/lookup2/concept/BsvConceptFactory.java
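
To make the column layout described above concrete, here is a minimal Python sketch that splits one bar-separated line into its fields. It is not part of cTAKES and does not reproduce how BsvRareWordDictionary reads the file; the `BsvEntry` type, the `parse_bsv_line` function, and the CUI in the usage note are hypothetical names and values chosen for illustration.

```python
from typing import NamedTuple, Optional


class BsvEntry(NamedTuple):
    """One dictionary row (hypothetical helper type, not a cTAKES class)."""
    cui: str
    tui: str
    synonym: str
    preferred_text: Optional[str] = None


def parse_bsv_line(line: str) -> BsvEntry:
    """Split a bar-separated line into cui, tui, synonym, and optional preferred text."""
    fields = [field.strip() for field in line.rstrip("\n").split("|")]
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 bar-separated columns, got {len(fields)}")
    cui, tui, synonym = fields[:3]
    # A non-empty 4th column, if present, is treated as the preferred text.
    preferred = fields[3] if len(fields) > 3 and fields[3] else None
    return BsvEntry(cui, tui, synonym, preferred)
```

For example, `parse_bsv_line("C0000001|T000|heart attack")` yields an entry with no preferred text, where `C0000001` is a made-up code and `T000` is the Unknown TUI mentioned above.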

seanfinan commented 5 months ago

See https://github.com/apache/ctakes/wiki/ctakes-dictionary-lookup-fast

If you expand DictionarySubPipe (https://github.com/apache/ctakes/wiki/ctakes-dictionary-lookup-fast#dictionary-sub-pipe), you can see where the CLI parameter for "LookupXml" is set to lower-case L.

If you get this working and feel comfortable writing a couple of lines on a custom BSV dictionary for the wiki, please put them in a comment here. Thanks

andytubeee commented 5 months ago

Thanks for sharing. I converted the CSV to BSV, created an XML file based on your tinyDictSpec.xml, and then used the -l flag with ./bin/runClinicalPipeline.

I didn't run into any issues with the pipeline, so I am assuming it worked.

I think you can put the example links in the current wiki because it was really straightforward to build yourself.

  1. Convert your dictionary to this format: https://github.com/apache/ctakes/blob/main/ctakes-dictionary-lookup-fast/src/user/resources/org/apache/ctakes/dictionary/lookup/fast/bsv/tinyDict.bsv
  2. Reference the .bsv path in an XML file that follows this format: https://github.com/apache/ctakes/blob/main/ctakes-dictionary-lookup-fast/src/user/resources/org/apache/ctakes/dictionary/lookup/fast/bsv/tinyDictSpec.xml
  3. Add -l <pathToXML> when running ./runClinicalPipeline.
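
Step 1 can be sketched roughly as follows. This is a minimal Python example, not the script actually used in this thread. It assumes a headerless CSV with the term in the first column and the concept code in the second; that column order, the `csv_to_bsv` name, and the sample codes are hypothetical. Every row gets T000 as its TUI, per the suggestion earlier in this thread.

```python
import csv
import io

UNKNOWN_TUI = "T000"  # TUI suggested in this thread when the semantic type is unknown


def csv_to_bsv(csv_text: str, default_tui: str = UNKNOWN_TUI) -> str:
    """Convert 'term,code' CSV rows into 'code|tui|term' bar-separated lines.

    Assumes a headerless CSV with the term first and the concept code second.
    """
    lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        term, code = row[0].strip(), row[1].strip()
        if not term or not code:
            continue
        if "|" in term or "|" in code:
            # A bar inside a field would break the bar-separated layout.
            raise ValueError(f"field contains '|': {row!r}")
        lines.append(f"{code}|{default_tui}|{term}")
    return "\n".join(lines) + ("\n" if lines else "")
```

Writing the returned string to a .bsv file gives the input for step 2. If your CSV has a header row or a different column order, adjust the row handling accordingly.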
seanfinan commented 5 months ago

Hi Andrew,

Thank you for the update!

Sean


From: Andrew Yang. Sent: Sunday, June 2, 2024 12:40 PM. Subject: Re: [apache/ctakes] cTAKES custom dictionary setup documentation (Issue #19)