triducnghiem closed this issue 5 years ago.
The uimaFIT SimplePipeline doesn't support continuing after an error and logging it. You could clone SimplePipeline, catch the exception thrown while processing each CAS, and carry on with the next one. The code is, well, simple ;)
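A rough idea of what such a clone could look like, as a library-free sketch: plain strings stand in for CASes and Consumer stages stand in for analysis engines, so it runs without UIMA on the classpath. The class and method names (KeepGoingPipeline, run) are hypothetical; the comments mark where the corresponding uimaFIT/UIMA calls (reader.getNext(cas), aae.process(cas), cas.reset()) would sit in a real copy of SimplePipeline.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Library-free sketch of a SimplePipeline-style loop that logs failures and keeps going.
// Plain strings stand in for CASes; Consumers stand in for analysis engines.
class KeepGoingPipeline {

    /**
     * Runs every stage on every document and returns the documents that failed.
     * When a stage throws, the error is logged to stderr, the remaining stages
     * are skipped for that document, and the loop moves on to the next one.
     */
    static List<String> run(Iterator<String> reader, List<Consumer<String>> stages) {
        List<String> failed = new ArrayList<>();
        while (reader.hasNext()) {            // UIMA analogue: reader.hasNext()
            String doc = reader.next();       // UIMA analogue: reader.getNext(cas)
            try {
                for (Consumer<String> stage : stages) {
                    stage.accept(doc);        // UIMA analogue: aae.process(cas)
                }
            } catch (RuntimeException e) {
                // In a real clone, catch what aae.process(cas) throws and log
                // something that identifies the CAS (e.g. its document text or ID).
                System.err.println("Skipping document [" + doc + "]: " + e);
                failed.add(doc);
            }
            // UIMA analogue: cas.reset() belongs here, so the CAS is reused
            // whether or not processing succeeded.
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("first sentence", "bad: 2 1/2", "third sentence");
        Consumer<String> fragileStage = doc -> {
            if (doc.startsWith("bad")) {
                throw new IllegalStateException("token contains a space");
            }
        };
        List<String> failed = run(docs.iterator(), List.of(fragileStage));
        System.out.println(failed.size() + " of " + docs.size() + " documents failed");
    }
}
```

Catching around the whole chain of stages (rather than around each stage) means a half-annotated document is never handed to the later components; the returned list can be written to a log file for re-processing later.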
Thanks Richard, I found it easy too :-). By the way, I found the problem. Sometimes the Stanford tokenizer returns a token that contains a space, in this case "2 1/2", and that breaks the WHITESPACE_PATTERN.split(...) call in Sentence.java (from mate-tools). mate-tools doesn't provide a tokenizer itself, so I used the Stanford tokenizer provided by StanfordSegmenter (from DKPro), which, as far as I can see in the source code, doesn't allow setting tokenizer parameters such as normalizeSpace.
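To illustrate the failure: if the tokens are joined with spaces and then split again on whitespace, a token like "2 1/2" turns into two pieces, so the token count no longer matches. The sketch below assumes that WHITESPACE_PATTERN is something like "\\s+" (not checked against the actual Sentence.java). It also shows the kind of normalization Stanford's normalizeSpace option is documented to perform, replacing spaces inside a token with U+00A0 (non-breaking space), which Java's default \s does not match. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class TokenSplitDemo {
    // Assumption: a plain "split on whitespace" pattern, like the one Sentence.java uses.
    static final Pattern WHITESPACE_PATTERN = Pattern.compile("\\s+");

    /** Joins tokens with single spaces, then splits the result on whitespace again. */
    static String[] roundTrip(List<String> tokens) {
        return WHITESPACE_PATTERN.split(String.join(" ", tokens));
    }

    /** Replaces ordinary spaces inside each token with U+00A0 (non-breaking space). */
    static List<String> normalizeSpace(List<String> tokens) {
        return tokens.stream()
                .map(t -> t.replace(' ', '\u00A0'))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("He", "ate", "2 1/2", "pies");
        System.out.println(roundTrip(tokens).length);                  // 5: "2 1/2" was split apart
        System.out.println(roundTrip(normalizeSpace(tokens)).length);  // 4: counts line up again
    }
}
```

Whether this helps inside a DKPro pipeline depends on whether the component downstream reads the tokenizer's (normalized) token strings or the original covered text, which still contains the real space.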
That's right. These parameters depend on which tokenizer is used internally, and that in turn depends on the document language / language parameter. The StanfordSegmenter class already contains an additionalOptions field, which seems to be a step towards letting users provide such parameters. However, it is currently unused... that should be fixed.
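One possible shape for that fix, as a hypothetical sketch only (this is not DKPro's code): merge the per-language default options with whatever the user puts into additionalOptions, letting user values win, and pass the result to the tokenizer. It assumes Stanford's comma-separated "key=value" option-string format and treats a bare key as "true"; TokenizerOptions, parse and merge are made-up names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: merges default tokenizer options with user-supplied ones.
class TokenizerOptions {

    /** Parses "a=1,b,c=false" into an ordered map; bare keys map to "true". */
    static Map<String, String> parse(String options) {
        Map<String, String> result = new LinkedHashMap<>();
        if (options == null || options.trim().isEmpty()) {
            return result;
        }
        for (String part : options.split(",")) {
            String p = part.trim();
            if (p.isEmpty()) {
                continue;
            }
            int eq = p.indexOf('=');
            if (eq < 0) {
                result.put(p, "true");
            } else {
                result.put(p.substring(0, eq).trim(), p.substring(eq + 1).trim());
            }
        }
        return result;
    }

    /** Language defaults first, then additionalOptions; user values override defaults. */
    static String merge(String languageDefaults, String additionalOptions) {
        Map<String, String> merged = parse(languageDefaults);
        merged.putAll(parse(additionalOptions));
        StringJoiner joiner = new StringJoiner(",");
        merged.forEach((k, v) -> joiner.add(k + "=" + v));
        return joiner.toString();
    }

    public static void main(String[] args) {
        String defaults = "invertible,normalizeSpace=false";  // hypothetical defaults
        System.out.println(merge(defaults, "normalizeSpace=true"));
    }
}
```

Keeping the defaults per language and only overlaying the user's options means users don't have to know every option the internal tokenizer needs, just the ones they want to change.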
I think this can be closed.
An exception occurs when I run the MateSemanticRoleLabeler on certain sentences; one of them is shown in the following test:
The error message is:
Also, as far as I know, the pipeline terminates whenever an exception occurs while processing a CAS. Is there any way to keep it running and log the unprocessed CAS and the error messages somewhere (a log file or stderr)?