Open jnehring opened 8 years ago
The error seems to originate from within internationalization.
Error log:
ERROR 2016-10-05 11:06:28,805 [http-nio-8089-exec-1] eu.freme.bservices.controllers.pipelines.PipelinesController - For input string: "//freme-project.eu/#offset_38_45"
java.lang.NumberFormatException: For input string: "//freme-project.eu/#offset_38_45"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:569)
at java.lang.Integer.valueOf(Integer.java:766)
at eu.freme.bservices.internationalization.okapi.nif.converter.HTMLBackConverter$TextUnitResource.<init>(HTMLBackConverter.java:450)
at eu.freme.bservices.internationalization.okapi.nif.converter.HTMLBackConverter.listTextUnitResources(HTMLBackConverter.java:359)
at eu.freme.bservices.internationalization.okapi.nif.converter.HTMLBackConverter.convertBack(HTMLBackConverter.java:165)
at eu.freme.bservices.internationalization.okapi.nif.converter.HTMLBackConverter.convertBack(HTMLBackConverter.java:115)
at eu.freme.bservices.internationalization.okapi.nif.converter.HTMLBackConverter.convertBack(HTMLBackConverter.java:82)
at eu.freme.bservices.internationalization.api.InternationalizationAPI.convertBack(InternationalizationAPI.java:131)
at eu.freme.bservices.controllers.pipelines.core.Conversion.convertBack(Conversion.java:62)
at eu.freme.bservices.controllers.pipelines.core.PipelineService.chain(PipelineService.java:115)
at eu.freme.bservices.controllers.pipelines.PipelinesController.pipeline(PipelinesController.java:92)
at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvok
and
ERROR 2016-10-05 11:06:28,807 [http-nio-8089-exec-1] eu.freme.common.exception.ExceptionHandlerService - Request: http://rv1443.1blu.de:8089/pipelining/chain raised
eu.freme.common.exception.InternalServerErrorException: For input string: "//freme-project.eu/#offset_38_45"
at eu.freme.bservices.controllers.pipelines.PipelinesController.pipeline(PipelinesController.java:120)
at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source)
Hi, I tested using these two requests in sequence:
curl -X POST --header "Content-Type: text/html" --header "Accept: text/html" --header "Cache-Control: no-cache" --data "@input.txt" "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=spot%2Clink&nif-version=2.1" > output.txt
curl -X POST --header "Content-Type: text/html" --header "Accept: text/html" --header "Cache-Control: no-cache" --data "@output.txt" "http://api-dev.freme-project.eu/current/e-terminology/tilde?source-lang=en&target-lang=de&nif-version=2.1" > out-output.txt
where input.txt is an HTML file (content type text/html) and output.txt is the HTML output of the first request, which is then sent as input to the second request. The files are attached below. I cannot reproduce the error. Can you try again now?
The error happens when executing the pipeline. I could not reproduce it using individual curl commands. The pipeline does not convert from html -> turtle -> html in every step. The pipeline converts from html -> turtle in the beginning, then it performs all pipeline steps with turtle and in the end it converts back to html. So the CURL commands are
curl -X POST --header "Content-Type: text/html" --header "Accept: text/turtle" --header "Cache-Control: no-cache" --data "@input.txt" "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=spot%2Clink&nif-version=2.1" > output.txt
curl -X POST --header "Content-Type: text/turtle" --header "Accept: text/turtle" --header "Cache-Control: no-cache" --data "@output.txt" "http://api-dev.freme-project.eu/current/e-terminology/tilde?source-lang=en&target-lang=de&nif-version=2.1"
Executing the two API requests one after another, it works. So I think the problem happens when the output HTML is created because we cannot reproduce this behaviour using separate curl requests.
Based on the log, a java.lang.NumberFormatException occurs in HTMLBackConverter.java, i.e. in the second step of the pipeline when calling terminology, specifically when reading the begin index from "//freme-project.eu/#offset_38_45". I think there is a problem with the nif-version: even though nif-version=2.1 is set, the parameter is not received correctly and the version defaults to 2.0. A NIF 2.1 document is then parsed as if it were NIF 2.0, and reading the begin index fails with a NumberFormatException because the wrong identifier scheme is used. Maybe we could add some logging to verify the nif version.
I debugged locally using this curl:
curl -X POST -H "Content-Type: application/json" -H "Cache-Control: no-cache" -H "Postman-Token: cc799c16-5d39-accf-b81d-1aa4a48fb5c9" --data "@json.txt" "http://localhost:8080/pipelining/chain"
and the attached json.txt (see the attachment below), in which I use http://localhost:8080/e-terminology/tilde as the endpoint. I found that the nifVersion parameter that arrives at the InternationalizationAPI.java method Reader convertBack(InputStream markupsFile, InputStream enrichedFile, String nifVersion) is null. This causes the problem described above: a NIF 2.1 document is handled as if it were NIF 2.0 (when no value is set for the nif-version parameter, the version defaults to 2.0), so the string freme-project.eu/#offset_38_45 is not parsed correctly, which causes the errors.
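To illustrate the mismatch, here is a minimal sketch of a version-aware offset parser. It is not the HTMLBackConverter code; the class and method names are hypothetical. It only assumes the two fragment styles mentioned in this thread: "#char=38,45" for NIF 2.0 and "#offset_38_45" for NIF 2.1. A null version falls back to 2.0, which is the situation described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of FREME: reads the begin index from a NIF resource URI.
final class NifOffsetParser {
    // NIF 2.0 style fragment, e.g. "#char=38,45"
    private static final Pattern NIF_20 = Pattern.compile("#char=(\\d+),(\\d+)$");
    // NIF 2.1 style fragment, e.g. "#offset_38_45"
    private static final Pattern NIF_21 = Pattern.compile("#offset_(\\d+)_(\\d+)$");

    static int beginIndex(String uri, String nifVersion) {
        // A null or unknown version is treated as 2.0 (the default named in this thread).
        boolean is21 = "2.1".equals(nifVersion);
        Matcher m = (is21 ? NIF_21 : NIF_20).matcher(uri);
        if (!m.find()) {
            // Fail with a message that names the version mismatch instead of a bare NumberFormatException.
            throw new IllegalArgumentException(
                "URI '" + uri + "' does not match the NIF " + (is21 ? "2.1" : "2.0") + " fragment scheme");
        }
        return Integer.parseInt(m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(beginIndex("http://freme-project.eu/#offset_38_45", "2.1"));
        System.out.println(beginIndex("http://freme-project.eu/#char=38,45", null));
    }
}
```

Calling beginIndex("http://freme-project.eu/#offset_38_45", null) in this sketch throws an IllegalArgumentException, which mirrors the failure seen when a 2.1 document is read with the 2.0 default.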
Thanks for the investigation. This is a tough bug. The pipeline itself has no idea of the nif version. We can only guess the nif version by analyzing the nif content. I chose another solution: I scan all pipeline requests, and if one of the requests contains a parameter "nif-version", I submit this nif version to e-internationalization. This implementation does not fix the bug yet; I need to debug it once again. Will do it on Monday.
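The scanning approach could look roughly like the sketch below. It is only an illustration: the class name is hypothetical, and representing each pipeline request as a plain parameter map is an assumption, not the actual pipeline request model.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch: find the first "nif-version" parameter among the pipeline requests.
final class NifVersionScan {
    static Optional<String> firstNifVersion(List<Map<String, String>> requestParameters) {
        return requestParameters.stream()
                .map(params -> params.get("nif-version")) // null when the request does not set it
                .filter(Objects::nonNull)
                .findFirst();
    }
}
```

An empty result would mean that no request set the parameter, in which case the caller would fall back to the default version.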
But I do not like this solution. Guessing the nif version from the content might be better. @m1ci do you know of an implementation that guesses the nif version that we can reuse here?
In the RDF you can see
<http://freme-project.eu/#collection>
a nif:ContextCollection ;
nif:hasContext <http://freme-project.eu/#offset_0_33> ;
<http://purl.org/dc/terms/conformsTo>
<http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/2.1> .
which says that the context http://freme-project.eu/#offset_0_33 conforms to NIF 2.1. This should help.
Also, we agreed that the default version is 2.0, and that the nif-version parameter can be used to set the version to 2.0 or 2.1.
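A very rough sketch of guessing the version from this annotation is shown below. It is not the FREME implementation; the class name is hypothetical, and it only does a plain string search for the NIF 2.1 ontology URI, so it would miss a prefixed form of the same URI. A real implementation would more likely parse the RDF and look up the dcterms:conformsTo triple.

```java
// Hypothetical sketch: guess the NIF version from the serialized document.
final class NifVersionGuesser {
    static final String DEFAULT_VERSION = "2.0";
    static final String NIF_21_URI =
            "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/2.1";

    static String guess(String turtle) {
        // Assumption: the 2.1 ontology URI appears as a full IRI in angle brackets.
        if (turtle != null && turtle.contains("<" + NIF_21_URI + ">")) {
            return "2.1";
        }
        // No annotation found: fall back to the agreed default.
        return DEFAULT_VERSION;
    }
}
```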
I added source code to guess the nif version. It checks whether the version is annotated in the nif document. I also added this code to the pipelines module. It still does not work, although the error message has changed. A debug message says that it detects nif version 2.1, so it hands the right version to internationalizationApi.convertBack(). The new version of pipelines is already merged into master and installed on freme-dev. The problem can be reproduced with the curl request above.
I think that the problem is now within e-internationalization.
The error stack trace is here: stacktrace.txt.
Hi Jan, debugging locally I found that the nifConvertedFile-skeleton, which is parsed to get the HTML file as a string, contains "#char=" and not "#offset_" as expected. The InternationalizationAPI method convertToTurtleWithMarkups(InputStream is, String mimeType, String nifVersion) throws ConversionException receives null for the nifVersion parameter. Since the version is not set, the converted NIF files are produced as NIF 2.0.
Thank you for investigating on this. I think we should create a parameter nif-version for pipelines so we do not guess the parameter but explicitly set it. Therefore I created #115.
I put the solution here and close #115
We need the nif-version parameter in pipelines as well. It determines the nif version that is submitted to e-Internationalization at the beginning and at the end of the pipeline. The nif-version parameter of individual pipeline requests is not influenced by the nif-version parameter of the pipeline. This will be a parameter similar to visibility or persist, which gets its own field in the database. Currently it can take the values 2.0 and 2.1. This requires changes in
For now we will not fix the bug.
Ok.
@jnehring if this is implemented and the pipeline model is changed, I think it would be really useful to also put useI18n into the pipeline.
But to fix this bug in general, I don't think so many changes are needed; just three files of the Pipelines service need minor changes. The parameter nif-version has to be added to the endpoints POST /pipelining/chain and POST /pipelining/chain/{id}, which just forward it to PipelineService.chain. In the roundtripping case, this method should default it to 2.0 if necessary (and then possibly put it into every single PipelineRequest), and the methods convertToNif and convertBack have to use it. convertToNif needs a minor modification: it just has to forward the parameter to convertToTurtleWithMarkups and convertToTurtle (by analogy to convertBack), which are called with null at the moment.
So the same nif version is used for conversion and back conversion, and no guessing is necessary. I don't know whether it should be possible, or whether it makes sense at all, to allow different nif versions within one single pipeline that does roundtripping, so I put that part in brackets above.
Am I missing something?
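The proposal above can be sketched as follows. This is only an illustration under simplifying assumptions: the Converter interface and PipelineSketch class are hypothetical stand-ins for the Conversion and PipelineService classes, documents are plain strings, and the pipeline steps are reduced to a single function.

```java
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the conversion methods named in this thread; signatures are simplified.
interface Converter {
    String convertToNif(String html, String nifVersion);
    String convertBack(String enrichedNif, String nifVersion);
}

final class PipelineSketch {
    static final String DEFAULT_NIF_VERSION = "2.0";

    // The endpoint would forward its nif-version parameter here (possibly null).
    static String chain(Converter converter, String html, String nifVersion,
                        UnaryOperator<String> pipelineSteps) {
        // Default once, then use the same version for both directions of the roundtrip.
        String version = (nifVersion == null || nifVersion.isEmpty())
                ? DEFAULT_NIF_VERSION
                : nifVersion;
        String nif = converter.convertToNif(html, version);
        String enriched = pipelineSteps.apply(nif);
        return converter.convertBack(enriched, version);
    }
}
```

Resolving the default in one place means convertToNif and convertBack can never disagree about the version, which is the mismatch that triggered the NumberFormatException.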
This curl
fails with error message
It works when I remove nif-version=2.1 from both API calls.