clarin-eric / LRSwitchboard

DEPRECATED - Please see https://github.com/clarin-eric/switchboard for latest version - Code Repository for the Language Resources Switchboard of CLARIN

[Tool] ACDH Stanbol: 302 HTTP response #30

Closed dietervu closed 6 years ago

dietervu commented 6 years ago

Upon invoking the Stanbol enricher I get a 302 response page. This seems to break the automatic input of the file from the LRS.

wowasa commented 6 years ago

Sorry, I just received the forwarded notification. I will look into it.

wowasa commented 6 years ago

Fixed. The address of the Stanbol server had changed from http to https, and the change wasn't reflected in the StanbolWrapper (the address is configurable via web.xml).
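
For context, a 302 at this point is consistent with how `java.net.HttpURLConnection` (the client visible in the stack trace further down) handles redirects: it follows them automatically only when the target uses the same protocol, so an http to https redirect is handed back to the caller as a plain 302. The sketch below is a hypothetical helper, not code from the StanbolWrapper; the class and method names are made up, and the `http://` address is only assumed to be the old configured value.

```java
import java.net.URI;

/** Hypothetical helper (not part of StanbolWrapper): spots redirects that change scheme. */
class RedirectCheck {

    /** True for the 3xx status codes that point to another address via a Location header. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /**
     * Resolves the Location header against the requested address and reports
     * whether following it would switch scheme (e.g. http -> https).
     */
    static boolean switchesScheme(URI requested, String location) {
        URI target = requested.resolve(location); // also handles relative Location values
        return !requested.getScheme().equalsIgnoreCase(target.getScheme());
    }

    public static void main(String[] args) {
        // Assumed old configuration value; the https address is the one from this thread.
        URI configured = URI.create("http://enrich.acdh.oeaw.ac.at/enhancer/chain/geoNames_S_P_A");
        String location = "https://enrich.acdh.oeaw.ac.at/enhancer/chain/geoNames_S_P_A";
        if (isRedirect(302) && switchesScheme(configured, location)) {
            System.out.println("Configured address is outdated, use: " + configured.resolve(location));
        }
    }
}
```

A check like this could log a clear hint instead of passing the redirect page on to the caller; updating the configured address, as done here, remains the actual fix.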

dietervu commented 6 years ago

Thanks. Just tried it again (on this file) and I get the following stacktrace:

HTTP Status [500] – [Internal Server Error]

Type Exception Report

Message Server returned HTTP response code: 500 for URL: https://enrich.acdh.oeaw.ac.at/enhancer/chain/geoNames_S_P_A

Description The server encountered an unexpected condition that prevented it from fulfilling the request.

Exception

java.io.IOException: Server returned HTTP response code: 500 for URL: https://enrich.acdh.oeaw.ac.at/enhancer/chain/geoNames_S_P_A
    sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1894)
    sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
    sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:263)
    at.ac.oeaw.acdh.StanbolWrapperServlet.doGet(StanbolWrapperServlet.java:86)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
    org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)

Note The full stack trace of the root cause is available in the server logs.
Apache Tomcat/8.5.15

wowasa commented 6 years ago

Sorry, it took some time to reproduce the error, since the link to the switchboard doesn't work at all in my Firefox. It seems that the error occurs due to the file size. I will investigate whether it comes from the wrapper or from Stanbol.

wowasa commented 6 years ago

Actually, the maxEnhancementJobWaitTime of Stanbol was reached due to the file size. We first have to discuss whether we limit the file size, chop the input up, or increase the maxEnhancementJobWaitTime.
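
To illustrate the "chop it up" option, here is a minimal sketch that splits text into pieces of at most a given number of characters, breaking after whitespace where possible. `TextChunker` and `chunk` are hypothetical names, not part of Stanbol or the wrapper, and merging the enhancement results of the individual pieces is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits text so each piece can be sent as a separate request. */
class TextChunker {

    /** Splits text into pieces of at most maxChars characters, preferring whitespace breaks. */
    static List<String> chunk(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + maxChars, text.length());
            if (end < text.length()) {
                // Walk back to just after the last whitespace inside the window, if there is one.
                int cut = end;
                while (cut > start && !Character.isWhitespace(text.charAt(cut - 1))) {
                    cut--;
                }
                if (cut > start) {
                    end = cut; // otherwise an over-long token is split at the hard limit
                }
            }
            chunks.add(text.substring(start, end));
            start = end;
        }
        return chunks;
    }
}
```

Note that this counts characters, not bytes, and that entities spanning a chunk boundary would be missed, which is one reason chopping is not a drop-in replacement for processing the whole file.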

claus-zinn commented 6 years ago

Just tested it with the same file as Dieter: I get this JSON, indicating that the input size is limited.

{"_comment": "input restricted to 10240 bytes...","urn:enhancement-37310698-2e2b-d82e-63bf-8888a9ae506b":{"http:\/\/fise.iks-project.eu\/ontology\/extracted-from":[{"type":"uri","value":"urn:content-item-sha1-196ff65345f96d8b865e16f59389ab0d4828d056"}],"http:\/\/purl.org\/dc\/terms\/creator":[{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#string","type":"literal","value":"org.apache.stanbol.enhancer.engines.langdetect.LanguageDetectionEnhancementEngine"}],"http:\/\/purl.org\/dc\/terms\/created":[{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#dateTime","type":"literal","value":"2018-09-11T13:58:34.717Z"}],"http:\/\/purl.org\/dc\/terms\/type":[{"type":"uri","value":"http:\/\/purl.org\/dc\/terms\/LinguisticSystem"}],"http:\/\/purl.org\/dc\/terms\/language":[{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#string","type":"literal","value":"en"}],"http:\/\/fise.iks-project.eu\/ontology\/confidence":[{"datatype":"http:\/\/www.w3.org\/2001\/XMLSchema#double","type":"literal","value":"0.9999973988150396"}],"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type":[{"type":"uri","value":"http:\/\/fise.iks-project.eu\/ontology\/Enhancement"},{"type":"uri","value":"http:\/\/fise.iks-project.eu\/ontology\/TextAnnotation"}]}}

Can the limit be raised on your side? If not, we may need to complement a tool's description with a file size limit. If the actual size of a file exceeds the tool's advertised limit, the tool becomes inapplicable, if possible with a note to users that they need to split the file into smaller chunks before invoking the switchboard again with the file's parts.
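
The applicability rule described here could look roughly like the following sketch. The `Tool` class and its `maxInputBytes` field are hypothetical and do not reflect the switchboard's actual tool registry format; a value of 0 is assumed to mean "no advertised limit".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of filtering tools by an advertised input size limit. */
class ToolFilter {

    /** Minimal tool descriptor; maxInputBytes <= 0 means no advertised limit (assumption). */
    static final class Tool {
        final String name;
        final long maxInputBytes;

        Tool(String name, long maxInputBytes) {
            this.name = name;
            this.maxInputBytes = maxInputBytes;
        }
    }

    /** Returns the tools whose advertised limit, if any, is not exceeded by the file. */
    static List<Tool> applicable(List<Tool> tools, long fileSizeBytes) {
        List<Tool> result = new ArrayList<>();
        for (Tool tool : tools) {
            if (tool.maxInputBytes <= 0 || fileSizeBytes <= tool.maxInputBytes) {
                result.add(tool);
            }
        }
        return result;
    }
}
```

Tools that drop out of the result are the ones for which the user would be told to split the file first.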

wowasa commented 6 years ago

Hi Claus, I worked on it today and figured out that the enhancement chain we are currently using can't even handle 100 kB. Therefore I have restricted the upload to 10 kB for the moment; anything beyond that is cut off. The limit is hard-coded so far. I first have to discuss with the person who created the enhancement chains, and with Matej, whether we can use another one. After that I'm going to set the limit to 10 MB.
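
A byte-based cut-off like the one described (10240 bytes, per the `_comment` field in the JSON above) can split a multi-byte UTF-8 character in half. The following is a minimal sketch of a truncation that avoids this, assuming the input is valid UTF-8; `InputLimiter` and `truncateUtf8` are hypothetical names, and this is not the wrapper's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch: cut UTF-8 input at a byte limit without splitting a character. */
class InputLimiter {

    /** Limit quoted in the "_comment" field of the response in this thread. */
    static final int LIMIT_BYTES = 10240;

    /** Returns the longest prefix of utf8 that fits into limit bytes and ends on a character boundary. */
    static byte[] truncateUtf8(byte[] utf8, int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must not be negative");
        }
        if (utf8.length <= limit) {
            return utf8; // nothing to cut
        }
        int end = limit;
        // UTF-8 continuation bytes have the form 10xxxxxx; step back until the
        // first dropped byte is the start of a character.
        while (end > 0 && (utf8[end] & 0xC0) == 0x80) {
            end--;
        }
        return Arrays.copyOf(utf8, end);
    }

    public static void main(String[] args) {
        byte[] input = "Wien, Österreich".getBytes(StandardCharsets.UTF_8);
        byte[] cut = truncateUtf8(input, 7);
        System.out.println(new String(cut, StandardCharsets.UTF_8));
    }
}
```

Comparing the returned length with the original length tells the caller whether anything was cut, which is the information needed for a notice in the response.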

wowasa commented 6 years ago

By the way: I first set the maxEnhancementJobWaitTime of Stanbol to 5 minutes, but this just produced another kind of error.

wowasa commented 6 years ago

Still working on it, since a file of 3.4 MB overwhelms most of our enhancement chains. A file size restriction of 10 kB is certainly too small, but we have to introduce some size restriction anyway, or else use a very simple chain which won't bring much benefit to the user.

dietervu commented 6 years ago

Thanks for keeping us up to date. It is indeed an interesting question what the minimum supported file size for applications connected to the switchboard should be. Something like a megabyte comes to mind: according to Wikipedia, that is roughly the equivalent of a typical English book volume in plain text format (500 pages × 2000 characters per page).

Do you know where the bottleneck is situated?

wowasa commented 6 years ago

I'm not sure whether we can speak of a bottleneck, since a serious enrichment takes time. The enhancement chain we have used so far "contains administrative areas, cities, countries, universities, theatres, etc., this is the largest index we have" (quoting Katalin). So we can either reduce the quality or reduce the file size, since I think increasing the timeout even more (currently 5 minutes) isn't really an option.

wowasa commented 6 years ago

To come to a result, I suggest restricting the upload to 1 MB (the text is simply cut off but still processed, with a message in the response), and I will discuss with Katalin which of our enhancement chains processes the input within a reasonable period of time (<5 min). Any objections to this approach?

claus-zinn commented 6 years ago

This sounds like a reasonable approach as long as users are made aware of the fact that their input file has been truncated.