apache / ctakes

Apache cTAKES is a Natural Language Processing (NLP) platform for clinical text.
https://ctakes.apache.org
Apache License 2.0

Java Data Loader (jdl) fails in cTAKES-6.0.0-SNAPSHOT #24

Closed silwalr closed 5 days ago

silwalr commented 5 days ago

Hi all,

I am running into the following issue when I attempt to load tables into MSSQL Server using the Java Data Loader (jdl) utility in cTAKES-6.0.0-SNAPSHOT. The jdl utility worked fine in cTAKES-5.1.0. The jdl utility is part of the ctakes-ytex package.

It looks like the loader is failing to read the contents of the .txt file:
[java] 12 Sep 2024 09:16:31 INFO CsvLoader - insert into null () values ()

jdbc.sqlcmd is working fine.

ERROR Trace:

uima.create:

jdbc.sqlcmd:
     [echo] db.schema dbo
     [echo] umls.schema UMLS
     [echo] umls.catalog YTEX_TEST
     [echo] execute ./mssql/uima/create_reference.sql
     [copy] Copying 1 file to C:\Users\USERID\AppData\Local\Temp\4
     [echo] executing C:\Users\USERID\AppData\Local\Temp\4\ytex424002541.sql
      [sql] Executing resource: C:\Users\USERID\AppData\Local\Temp\4\ytex424002541.sql
      [sql] 9 of 9 SQL statements executed successfully
   [delete] Deleting: C:\Users\USERID\AppData\Local\Temp\4\ytex424002541.sql
   [delete] Deleting: C:\Users\USERID\AppData\Local\Temp\4\ytex1717560066use.sql

...

uima.create.trigger:

init:
     [echo] java.io.tmpdir C:\Users\USERID\AppData\Local\Temp\4\
     [echo] basedir E:\cTAKES\cTAKES-source\ctakes-ytex\scripts\data
     [echo] umls.data E:\cTAKES\cTAKES-source\ctakes-ytex\scripts\data/umls

init.conn:
     [copy] Copying 1 file to C:\Users\USERID\AppData\Local\Temp\4
uima.ref.load:

jdl:
     [echo] jdl.format ref_uima_type.xml
     [echo] jdl.data ref_uima_type.txt
     [echo] db.schema dbo
     [echo] umls.schema UMLS
     [echo] umls.prefix YTEX_TEST.UMLS.
     [copy] Copying 1 file to C:\Users\USERID\AppData\Local\Temp\4
     [java] 12 Sep 2024 09:16:31  INFO CsvLoader - delimiter 44 encapsulator 65534
     [java] 12 Sep 2024 09:16:31  INFO CsvLoader - insert into null () values ()
     [java] Exception in thread "main" java.lang.NullPointerException: Cannot invoke "java.lang.Number.intValue()" because "ncommit" is null
     [java]     at org.apache.ctakes.jdl.data.loader.CsvLoader.dataInsert(CsvLoader.java:225)
     [java]     at org.apache.ctakes.jdl.AppJdl.execute(AppJdl.java:87)
     [java]     at org.apache.ctakes.jdl.AppMain.main(AppMain.java:84)

BUILD FAILED
E:\cTAKES\cTAKES-source\ctakes-ytex\scripts\data\build.xml:573: The following error occurred while executing this line:
E:\cTAKES\cTAKES-source\ctakes-ytex\scripts\data\build.xml:589: The following error occurred while executing this line:
E:\cTAKES\cTAKES-source\ctakes-ytex\scripts\data\build.xml:535: Java returned: 1
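
For readers following the trace above: the exception message says that `intValue()` was called on a `Number` reference named `ncommit` that was null. The sketch below reproduces that failure mode in isolation and shows one null-safe way to read such a value. It is only an illustration, not the actual `CsvLoader` code: the class name, the method names, and the default of 1000 are all hypothetical, and the real loader may obtain `ncommit` in a different way.

```java
// Minimal sketch of the failure mode in the trace, NOT the real CsvLoader code.
// All names here and the default value are hypothetical.
final class NullCommitSketch {

    // Hypothetical fallback; the real loader's default (if any) is not known.
    static final int DEFAULT_COMMIT_INTERVAL = 1000;

    // Null-safe read: fall back to a default instead of dereferencing null.
    static int commitInterval(Number ncommit) {
        return ncommit == null ? DEFAULT_COMMIT_INTERVAL : ncommit.intValue();
    }

    // Reproduces the kind of NullPointerException shown in the log:
    // calling intValue() on a null Number reference.
    static boolean reproducesNpe() {
        Number ncommit = null;
        try {
            ncommit.intValue();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE reproduced: " + reproducesNpe());
        System.out.println("null -> " + commitInterval(null));
        System.out.println("50 -> " + commitInterval(Integer.valueOf(50)));
    }
}
```

A guard like this would only hide the symptom. The preceding log line, `insert into null () values ()`, suggests that the table name and column list were also empty, so the underlying problem is more likely in how the loader's configuration is read than in the commit interval itself.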

I am encountering this error when I run the following command: E:\cTAKES\cTAKES-source\ctakes-ytex\scripts\data>ant -Dconfig.local=..\..\target\classes uima.all > uima_all.out 2>&1

I am using the following versions of Java and Ant:

E:\cTAKES\cTAKES-source\ctakes-ytex\scripts\data>java --version
openjdk 17.0.10 2024-01-16
OpenJDK Runtime Environment Temurin-17.0.10+7 (build 17.0.10+7)
OpenJDK 64-Bit Server VM Temurin-17.0.10+7 (build 17.0.10+7, mixed mode, sharing)

E:\cTAKES\cTAKES-source\ctakes-ytex\scripts\data>ant -version
Apache Ant(TM) version 1.10.14 compiled on August 16 2023

I am using the following version of the Microsoft JDBC Driver (https://learn.microsoft.com/en-us/sql/connect/jdbc/release-notes-for-the-jdbc-driver?view=sql-server-ver16#previous-releases). Here is the relevant dependency from the pom.xml file in the ctakes-ytex package:

        <dependency>
            <groupId>com.microsoft.sqlserver</groupId>
            <artifactId>mssql-jdbc</artifactId>
            <version>12.6.2.jre8</version>
        </dependency>

Thanks for any help you can provide!

seanfinan commented 5 days ago

Hi silwalr, I will try to look into this if I can, but this is fairly far outside my range of experience. Out of curiosity, what are you using this for? There are plans to remove large parts of ytex and migrate others. ytex was dumped into ctakes about 12 years ago and the authors are (to my knowledge) no longer active. While we try to keep up with problems, the code base is so old that doing so is difficult and takes a significant amount of time.

silwalr commented 5 days ago

Hi @seanfinan, we are working on upgrading our system at UVA Health System, which runs cTAKES-4.0, to the most recent version of cTAKES. We use the jdl utilities to create and drop the UMLS, SNOMEDCT, uima, and other tables in MSSQL Server. Our cTAKES-4.0 uses the Semantic Similarity and Word Sense Disambiguation components of YTEX. Are there plans to drop any of these components in the future? Thanks!

seanfinan commented 5 days ago

It looks like the problem might just be a feature that was dropped from one of the updated 3rd party libraries. It may be a bit of a cheap escape, but I am tempted to release as-is with a "known issue" disclaimer and advise people to use one of the many import GUIs and scripts that can be found on the web. There are a lot of "new and shiny" tools out there that make the ytex utility look as old as it is. What are your thoughts? As for the semantic similarity and WSD components, I think that you are not alone in wanting those to be kept/migrated, and that is indeed part of the plan for the future.

silwalr commented 5 days ago

Releasing it as-is with a "known issue" disclaimer sounds reasonable, considering that there are other tools out there that can easily do this job. Good to know that the semantic similarity and WSD components will be retained in the future by popular demand! Thanks for your timely response.

seanfinan commented 5 days ago

Excellent, Thanks!