commul / ctap

The parent project of ctap-feature and ctap-web

Short text crashes analysis featureAE.LexicalSophisticationAE #1

Open iiegn opened 1 year ago

iiegn commented 1 year ago

A (very) short text:

Ho l’adrenalina al massimo. Tremo ma non  e  per il freddo. Tiro. Fuori.

crashes the analysis. Restarting the analysis run does not help: the analysis never finishes.

2023-03-24 16:02:40,497 [[Procesing Pipeline#1 Thread]::] TRACE com.ctapweb.feature.featureAE.LexicalSophisticationAE - Processing document with FEATURE_EXTRACTOR <Lexical Sophistication Feature Extractor>: Ho l’adrenalina al massimo. Tremo ma non e per il ...
2023-03-24 16:02:40,497 [[Procesing Pipeline#1 Thread]::] TRACE com.ctapweb.feature.featureAE.LexicalSophisticationAE - Calculated total sophistication value 96.3989548166888 from scope AW on 13 words (word type? true).
2023-03-24 16:02:40,497 [[Procesing Pipeline#1 Thread]::] INFO  com.ctapweb.feature.featureAE.LexicalSophisticationAE - Feature value 7.415304216668369 calculated from AE 12 populated into CAS.
2023-03-24 16:02:40,521 [http-nio-8080-exec-23] TRACE com.ctapweb.web.server.user.AnalysisGeneratorServiceImpl - Received request for service getAnalysisStatus.
2023-03-24 16:02:40,521 [http-nio-8080-exec-14] TRACE com.ctapweb.web.client.component.AnalysisGenerator - Requesting service getAnalysisStatus from within timer...
2023-03-24 16:02:40,522 [http-nio-8080-exec-23] TRACE com.ctapweb.web.server.user.AnalysisGeneratorServiceImpl - Verifying user cookies for service getAnalysisStatus...
2023-03-24 16:02:40,522 [http-nio-8080-exec-23] TRACE com.ctapweb.web.server.user.UserServiceImpl - Received request for service verifyUserCookies. Verifying user cookies...
2023-03-24 16:02:40,522 [http-nio-8080-exec-23] TRACE com.ctapweb.web.server.user.UserServiceImpl - Found user cookies: id = 1; email = ctap@eurac.edu; sessionToken = $2a$10$RjM064NLZdqOr.JvnMWVa.HrtTn01vcwf5hxsZoG2VHAiL7Wb8pUq. Querying database to verify cookie information...
2023-03-24 16:02:40,523 [http-nio-8080-exec-23] TRACE com.ctapweb.web.server.user.UserServiceImpl - User cookies verified. Returning to client...
2023-03-24 16:02:40,523 [http-nio-8080-exec-23] TRACE com.ctapweb.web.server.user.UserServiceImpl - Completed request for service verifyUserCookies. Returning to client...
2023-03-24 16:02:40,525 [http-nio-8080-exec-23] TRACE com.ctapweb.web.server.user.AnalysisGeneratorServiceImpl - Completed request for service getAnalysisStatus. Returning to client...
2023-03-24 16:02:40,570 [http-nio-8080-exec-25] TRACE com.ctapweb.web.client.component.AnalysisGenerator - Service get AnalysisStatus from within timer returned successfully.
2023-03-24 16:02:40,621 [[CasConsumer Pipeline Thread]::] ERROR com.ctapweb.web.server.analysis.DatabaseWriterCasConsumer - Throwing
org.apache.uima.resource.ResourceProcessException: null
        at com.ctapweb.web.server.analysis.DatabaseWriterCasConsumer.processCas(DatabaseWriterCasConsumer.java:102) [classes/:?]
        at org.apache.uima.analysis_engine.impl.compatibility.CasConsumerAdapter.process(CasConsumerAdapter.java:99) [uimaj-core-2.8.1.jar:2.8.1]
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:385) [uimaj-core-2.8.1.jar:2.8.1]
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:308) [uimaj-core-2.8.1.jar:2.8.1]
        at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:269) [uimaj-core-2.8.1.jar:2.8.1]
        at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:893) [uimaj-cpe-2.8.1.jar:2.8.1]
        at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:575) [uimaj-cpe-2.8.1.jar:2.8.1]
Caused by: org.postgresql.util.PSQLException: ERROR: cannot convert infinity to numeric

Presumably, the text is so short that no proper value can be calculated for some feature, which might then lead to a value of +/-Infinity that, in turn, cannot be converted into a PostgreSQL numeric representation.
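If that guess is right, one possible guard is to check the feature value before it is written to the database. The following is only a minimal sketch under that assumption: the class FiniteValueGuard and the method toNumericOrNull are hypothetical names, not part of ctap, and the log above does not show which feature actually produced the infinite value. A null result would have to be stored as SQL NULL (or the value skipped) by the caller.

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of ctap: maps non-finite doubles to null
// so that they are never sent to a numeric database column.
final class FiniteValueGuard {

    private FiniteValueGuard() {
        // static helper only
    }

    /**
     * Returns the value as a BigDecimal, or null when the value is
     * NaN, +Infinity, or -Infinity (for example after a division by zero).
     */
    static BigDecimal toNumericOrNull(double value) {
        if (!Double.isFinite(value)) { // Double.isFinite exists since Java 8
            return null;
        }
        return BigDecimal.valueOf(value);
    }

    public static void main(String[] args) {
        // A finite value, taken from the log above, passes through.
        System.out.println(toNumericOrNull(7.415304216668369));
        // Floating-point division by zero yields Infinity or NaN, not an exception.
        System.out.println(toNumericOrNull(1.0 / 0.0));
        System.out.println(toNumericOrNull(0.0 / 0.0));
    }
}
```

Whether a missing value should be stored as NULL, skipped, or reported to the user is a design decision for the maintainers; the sketch only shows where the check could sit.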

iiegn commented 1 year ago

As a workaround, deleting the text from the corpus (or excluding it from the selection) lets the analysis complete.

wc -w reports 13 for this text; for the next smallest text (which works) wc -w reports 24.