ErwinKomen / FoliaEntity

Communication with spotlight server fails. #7

Closed proycon closed 6 years ago

proycon commented 6 years ago

I have a problem with entities not being linked. First of all, the default spotlight service seems out of service, so is no longer an option:

Method: use SPOTLIGHT
Starting: _vad003178301_01_0306.tok.frogmodernized.folia.xml
Error in [entity/MakeXmlPostRequest]: The request timed out
Stack:   at System.Net.HttpWebRequest.GetRequestStream () <0x41819480 + 0x00167> in <filename unknown>:0 
  at FoliaEntity.entity.MakeXmlPostRequest (System.String sUrlStart, System.String sMethod, System.String sData) <0x4180b810 + 0x00972> in <filename unknown>:0

I therefore set up my own local one (version 1.0.0) using https://github.com/dbpedia-spotlight/spotlight-docker . (Temporarily running on scootaloo:2232).

I have a test document here: scootaloo:/scratch/proycon/_vad003178301_01_0289.tok.frogmodernized.folia.xml, which I run on scootaloo as follows:

$ lmdev
(lamachine16.dev)$ $VIRTUAL_ENV/foliaentity/FoliaEntity.exe -w -a "foliaentity" -m s -u s http://127.0.0.1:2232/rest -i _vad003178301_01_0289.tok.frogmodernized.folia.xml -o out/

This results in errors like:

Stack:   at System.Net.HttpWebRequest.EndGetResponse (IAsyncResult asyncResult) <0x41d8f0b0 + 0x0019f> in <filename unknown>:0 
  at System.Net.HttpWebRequest.GetResponse () <0x41d8dc00 + 0x00053> in <filename unknown>:0 
  at FoliaEntity.entity.MakeHtmlPostRequest (System.String sUrlStart, System.String sMethod, System.String sData, System.String sEntity) <0x41d94cc0 + 0x002bb> in <filename unknown>:0 

Error in [entity/MakeHtmlPostRequest]: The remote server returned an error: (404) Not Found.
Stack:   at System.Net.HttpWebRequest.EndGetResponse (IAsyncResult asyncResult) <0x41d8f0b0 + 0x0019f> in <filename unknown>:0 
  at System.Net.HttpWebRequest.GetResponse () <0x41d8dc00 + 0x00053> in <filename unknown>:0 
  at FoliaEntity.entity.MakeHtmlPostRequest (System.String sUrlStart, System.String sMethod, System.String sData, System.String sEntity) <0x41d94cc0 + 0x002bb> in <filename unknown>:0 

Log file:

Text:   _vad003178301_01_0289.tok.frogmodernized.folia.xml      Hits:   0       Fail:   0
Texts:  1       Hits:   0       Fail:   0

The process does write a FoLiA file, but without any links, and exits with status code 0 as if everything went well (which is a bug I reckon?).
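
Until the exit status reflects such failures, a caller could detect them from the program output instead. The following is a minimal sketch, not part of FoliaEntity: `has_errors` and `run_checked` are hypothetical helper names, and it assumes that every failure prints a line starting with `Error in [`, as in the output quoted above.

```shell
# Hypothetical helpers, not part of FoliaEntity.
# Assumption: each failure prints a line that starts with "Error in [".

has_errors() {
  # $1: path to a file holding the captured output of one run
  grep -q '^Error in \[' "$1"
}

run_checked() {
  # Run the given command, capture stdout and stderr, replay the output,
  # and return non-zero if the command failed or an error line was printed.
  rc_log=$(mktemp)
  rc_status=0
  "$@" >"$rc_log" 2>&1 || rc_status=$?
  cat "$rc_log"
  if [ "$rc_status" -eq 0 ] && has_errors "$rc_log"; then
    rc_status=1
  fi
  rm -f "$rc_log"
  return "$rc_status"
}
```

Prefixing the FoliaEntity.exe invocation with `run_checked` would then make a calling script stop on runs like the one above, even though the program itself exits with 0.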

Something could have changed in the spotlight API between version 0.7 and 1.0? Or something may be wrong in the server itself, although I don't really get any weird error output there...

ErwinKomen commented 6 years ago

Thanks for passing on the extensive error message. I'm going to have a look at this right after lunch.

ErwinKomen commented 6 years ago

Did you try approaching scootaloo:2232 through the shell? I get a 404 (not found) message when I try to:

curl -v -D- -X POST -H "Content-Type: application/x-www-form-urlencoded" -H "Accept: text/html" http://127.0.0.1:2232
* Rebuilt URL to: http://127.0.0.1:2232/
* Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 2232 (#0)
POST / HTTP/1.1
Host: 127.0.0.1:2232
User-Agent: curl/7.47.0
Content-Type: application/x-www-form-urlencoded
Accept: text/html

< HTTP/1.1 404 Resource Not Found
HTTP/1.1 404 Resource Not Found
< Content-Type: text/html;charset=ISO-8859-1
Content-Type: text/html;charset=ISO-8859-1
< Transfer-Encoding: chunked
Transfer-Encoding: chunked
< Date: Mon, 19 Feb 2018 15:15:29 GMT
Date: Mon, 19 Feb 2018 15:15:29 GMT
<
* Connection #0 to host 127.0.0.1 left intact

(I can have a further look at this on Wednesday)

proycon commented 6 years ago

I believe the API endpoint is served at http://scootaloo:2232/rest rather than at the root.

ErwinKomen commented 6 years ago

Not sure what you mean. I get a 404 on http://127.0.0.1:2232/rest as well as on http://127.0.0.1:2232/rest.

proycon commented 6 years ago

Indeed, I only get a 404 as well. I don't really know what kind of query the server should accept, though.

Perhaps the server is not performing as intended...

proycon commented 6 years ago

Ah, I managed to get at least some response (albeit HTTP 400) when I do something like:

curl -i -X POST \
    -H "Accept:application/json" \
    -H "content-type:application/x-www-form-urlencoded" \
    -d "disambiguator=Document&confidence=-1&support=-1&text=In%20Amsterdam%20is%20het%20lekker%20weer" \
       http://127.0.0.1:2232/rest/annotate

Although that fails eventually too, it gives me some server log activity. FoliaEntity didn't get that far.

13373762 [Grizzly-2232(4)] INFO org.dbpedia.spotlight.web.rest.SpotlightInterface - ******************************** Parameters ********************************
13373762 [Grizzly-2232(4)] INFO org.dbpedia.spotlight.web.rest.SpotlightInterface - API: /annotate
13373762 [Grizzly-2232(4)] INFO org.dbpedia.spotlight.web.rest.SpotlightInterface - client ip: 127.0.0.1
13373762 [Grizzly-2232(4)] INFO org.dbpedia.spotlight.web.rest.SpotlightInterface - text: In Amsterdam is het lekker weer
13373762 [Grizzly-2232(4)] INFO org.dbpedia.spotlight.web.rest.SpotlightInterface - text length in chars: 31
13373762 [Grizzly-2232(4)] INFO org.dbpedia.spotlight.web.rest.SpotlightInterface - confidence: -1.0
13373762 [Grizzly-2232(4)] INFO org.dbpedia.spotlight.web.rest.SpotlightInterface - support: -1
13373762 [Grizzly-2232(4)] INFO org.dbpedia.spotlight.web.rest.SpotlightInterface - types: 
13373762 [Grizzly-2232(4)] INFO org.dbpedia.spotlight.web.rest.SpotlightInterface - sparqlQuery: 
13373762 [Grizzly-2232(4)] INFO org.dbpedia.spotlight.web.rest.SpotlightInterface - policy: false
13373762 [Grizzly-2232(4)] INFO org.dbpedia.spotlight.web.rest.SpotlightInterface - coreferenceResolution: true
13373762 [Grizzly-2232(4)] INFO org.dbpedia.spotlight.web.rest.SpotlightInterface - spotter: Default
13373762 [Grizzly-2232(4)] INFO org.dbpedia.spotlight.web.rest.SpotlightInterface - disambiguator: Document

proycon commented 6 years ago

Ah! I had a hunch that proved correct: I was missing a trailing / in the value of the -u parameter!
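
A small guard in a calling script can prevent this. The sketch below assumes that the base URL and the method name are joined by plain concatenation, which the thread suggests but does not show; `ensure_trailing_slash` is a hypothetical helper, not something FoliaEntity provides.

```shell
# Hypothetical helper, not part of FoliaEntity:
# append "/" to a base URL unless it already ends with one.
ensure_trailing_slash() {
  case "$1" in
    */) printf '%s\n' "$1" ;;
    *)  printf '%s/\n' "$1" ;;
  esac
}

base=$(ensure_trailing_slash "http://127.0.0.1:2232/rest")
# Under the concatenation assumption, "annotate" now lands after the slash.
echo "${base}annotate"   # prints http://127.0.0.1:2232/rest/annotate
```

Passing the normalized value to -u would make the call independent of how the URL was typed.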

proycon commented 6 years ago

(That means it now seems to work again) :)

ErwinKomen commented 6 years ago

The right way to call Spotlight from the command-line:

curl -i -X POST -H "Accept:text/xml" -H "content-type:application/x-www-form-urlencoded" -d "confidence=-1&text=In%20Amsterdam%20is%20het%20lekker" http://127.0.0.1:2232/rest/annotate/

This yields XML output. This is how FoliaEntity uses it. Let me see what happens on the test document...