Closed by dietervu 6 years ago
Sorry, I just received the forwarded notification. I will look into it.
Fixed. The address of the Stanbol server had changed from http to https, and the change wasn't reflected in the StanbolWrapper (the address is configurable via web.xml).
Thanks. Just tried it again (on this file) and I get the following stacktrace:
HTTP Status [500] – [Internal Server Error]
Type Exception Report
Message Server returned HTTP response code: 500 for URL: https://enrich.acdh.oeaw.ac.at/enhancer/chain/geoNames_S_P_A
Description The server encountered an unexpected condition that prevented it from fulfilling the request.
Exception
java.io.IOException: Server returned HTTP response code: 500 for URL: https://enrich.acdh.oeaw.ac.at/enhancer/chain/geoNames_S_P_A
sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1894)
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:263)
at.ac.oeaw.acdh.StanbolWrapperServlet.doGet(StanbolWrapperServlet.java:86)
javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
Note The full stack trace of the root cause is available in the server logs.
Apache Tomcat/8.5.15
Sorry, it took some time to reproduce the error, since the link to the switchboard doesn't work at all in my Firefox. It seems that the error occurs due to the file size. I will investigate whether it comes from the wrapper or from Stanbol.
Actually, the maxEnhancementJobWaitTime of Stanbol was reached due to the file size. We first have to discuss whether we limit the file size, chop the file up, or increase the maxEnhancementJobWaitTime.
Just tested it with the same file as Dieter: I get this JSON, indicating that the input size is limited.
{"_comment": "input restricted to 10240 bytes...","urn:enhancement-37310698-2e2b-d82e-63bf-8888a9ae506b":{"http:\/\/fise.iks-project.eu\/ontology\/extracted-from":[{"type":"uri","value":"urn:content-item-sha1-196ff65345f96d8b865e16f59389ab0d4828d056"}],"http:\/\/purl.org\/dc\/terms\/creator":[{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#string","type":"literal","value":"org.apache.stanbol.enhancer.engines.langdetect.LanguageDetectionEnhancementEngine"}],"http:\/\/purl.org\/dc\/terms\/created":[{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#dateTime","type":"literal","value":"2018-09-11T13:58:34.717Z"}],"http:\/\/purl.org\/dc\/terms\/type":[{"type":"uri","value":"http:\/\/purl.org\/dc\/terms\/LinguisticSystem"}],"http:\/\/purl.org\/dc\/terms\/language":[{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#string","type":"literal","value":"en"}],"http:\/\/fise.iks-project.eu\/ontology\/confidence":[{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#double","type":"literal","value":"0.9999973988150396"}],"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type":[{"type":"uri","value":"http:\/\/fise.iks-project.eu\/ontology\/Enhancement"},{"type":"uri","value":"http:\/\/fise.iks-project.eu\/ontology\/TextAnnotation"}]}}
Can the limit be raised on your side? If not, we may need to add a file size limit to the tool's description. If a file is larger than the tool's advertised file size limit, the tool becomes inapplicable to it. If possible, this would come with a note to users that they need to split the file into smaller chunks before invoking the switchboard again with the file's parts.
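For illustration, splitting a text into parts that each stay under a size limit could look roughly like the sketch below. This is not code from the switchboard or the StanbolWrapper; the class name `TextChunker` and the method `split` are made up for this example, and it assumes the input is plain text that is sent as UTF-8. It prefers to cut after whitespace so that words (and place names) are not split where avoidable.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the switchboard or the StanbolWrapper.
final class TextChunker {

    // Number of bytes a code point occupies in UTF-8.
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    // Splits text into consecutive chunks of at most maxBytes UTF-8 bytes each.
    // Cuts after the last whitespace in a chunk when there is one, otherwise
    // at the last code point that still fits.
    static List<String> split(String text, int maxBytes) {
        if (maxBytes < 4) {
            throw new IllegalArgumentException("maxBytes must be at least 4");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;       // first char index of the current chunk
        int bytes = 0;       // UTF-8 size of text[start, i)
        int lastBreak = -1;  // index just after the last whitespace in this chunk
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int cpBytes = utf8Length(cp);
            if (bytes + cpBytes > maxBytes) {
                int end = (lastBreak > start) ? lastBreak : i;
                chunks.add(text.substring(start, end));
                start = end;
                i = end;     // rescan from the cut point
                bytes = 0;
                lastBreak = -1;
                continue;
            }
            bytes += cpBytes;
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                lastBreak = i;
            }
        }
        if (start < text.length()) {
            chunks.add(text.substring(start));
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (String chunk : split("Wien Graz Linz Salzburg Innsbruck", 12)) {
            int size = chunk.getBytes(StandardCharsets.UTF_8).length;
            System.out.println(size + " bytes: [" + chunk + "]");
        }
    }
}
```

Concatenating the chunks gives back the original text, so the parts can be submitted one after another. Note that entities spanning a chunk boundary might still be missed by the enricher.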
Hi Claus, I worked on it today and figured out that the enhancement chain we are currently using can't even handle 100 KB. Therefore I have restricted the upload to 10 KB for the moment; anything beyond that is cut off. The limit is hard-coded so far. I first have to discuss with the person who created the enhancement chains, and with Matej, whether we can use another one. After that I'm going to set the limit to 10 MB.
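As a rough sketch of what such a cut-off could look like (this is not the actual StanbolWrapper code; the class `InputTruncator` and its methods are hypothetical names for this example): the input is limited to a fixed number of bytes, and the caller is told whether anything was dropped so that a notice like the `_comment` field in the JSON above can be attached. The sketch assumes UTF-8 input and backs up a few bytes where needed so that a multi-byte character is not cut in half.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not the actual wrapper implementation.
final class InputTruncator {

    // The bytes to forward, plus a flag telling whether input was cut off.
    static final class Result {
        final byte[] data;
        final boolean truncated;

        Result(byte[] data, boolean truncated) {
            this.data = data;
            this.truncated = truncated;
        }
    }

    // Keeps at most maxBytes bytes of UTF-8 input without splitting a character.
    static Result truncate(byte[] input, int maxBytes) {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must not be negative");
        }
        if (input.length <= maxBytes) {
            return new Result(input, false);
        }
        int end = maxBytes;
        // If the first dropped byte is a UTF-8 continuation byte (10xxxxxx),
        // the cut would land inside a character, so move it back to the
        // start of that character.
        while (end > 0 && (input[end] & 0xC0) == 0x80) {
            end--;
        }
        return new Result(Arrays.copyOf(input, end), true);
    }

    // Notice text modelled on the "_comment" value seen in the response.
    static String notice(int maxBytes) {
        return "input restricted to " + maxBytes + " bytes...";
    }

    public static void main(String[] args) {
        byte[] input = "Wien, Graz und Linz".getBytes(StandardCharsets.UTF_8);
        Result r = truncate(input, 10);
        System.out.println(new String(r.data, StandardCharsets.UTF_8));
        if (r.truncated) {
            System.out.println(notice(10));
        }
    }
}
```

Making `maxBytes` a configuration value (for example next to the server address in web.xml) rather than a constant would avoid the hard-coded limit, but that is a design suggestion, not something the wrapper is known to do.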
By the way: I first set the maxEnhancementJobWaitTime of Stanbol to 5 minutes, but this just produced another kind of error.
Still working on it, since a file of 3.4 MB overwhelms most of our enhancement chains. A file size restriction of 10 KB is certainly too small, but we have to introduce some size restriction anyway, or else use a very simple chain which won't bring much benefit to the user.
Thanks for keeping us up to date. It is indeed an interesting question what the minimum supported file size for applications connected to the switchboard should be. Something like a megabyte comes to mind: according to Wikipedia, that is roughly the equivalent of a typical English book volume in plain text format (500 pages × 2000 characters per page).
Do you know where the bottleneck is situated?
I'm not sure whether we can speak of a bottleneck, since a serious enrichment takes time. The enhancement chain we have used so far »contains administrative areas, cities, countries, universities, theatres, etc., this is the largest index we have« (cit. Katalin). So we can either reduce the quality or reduce the file size, since I think increasing the timeout even more (currently 5 minutes) isn't really an option.
To come to a result, I suggest restricting the upload to 1 MB (the text is simply cut off but still processed, with a message in the response), and I will discuss with Katalin which of our enhancement chains processes the input within a reasonable period of time (<5 min). Any objection to this approach?
This sounds like a reasonable approach as long as users are made aware of the fact that their input file has been truncated.
Upon invoking the Stanbol enricher I get a 302 response page. This seems to break the automatic input of the file from the LRS.