freme-project / e-Entity

Apache License 2.0

FREME NER calls training #68

Closed: x-fran closed this issue 8 years ago

x-fran commented 8 years ago
curl -X POST --header "Content-Type:" -d @req_data.txt  "http://api.freme-project.eu/0.4/e-entity/freme-ner/documents?informat=text&outformat=json-ld&language=fr&dataset=dbpedia&mode=spot,link,classify" > curl_results.txt

Is this curl command 100% valid?

x-fran commented 8 years ago

Could we set up a separate call with all developers from all BCs for this? I have a few more questions, and somebody else may also be interested.

What do you think, @m1ci?

jnehring commented 8 years ago

Is this curl command 100% valid?

You specified an empty Content-Type header. I don't think this causes a problem, but it makes your HTTP request invalid. I suggest you remove --header "Content-Type:" because the informat parameter overrides the content type anyway.

Unfortunately, I cannot reproduce your request because you did not share the request body with us. Are you having problems with the curl command?

Could we set up a separate call with all developers from all BCs for this? I have a few more questions

You should share your questions with us before we agree on a call.

x-fran commented 8 years ago

OK. What exactly is the "input" parameter for? In what situations is it used?

jnehring commented 8 years ago

The input parameter overrides the POST body of a request. It is useful when you want to send a GET request to a FREME e-Service, e.g. by typing the request into the web browser's address bar.
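For the address-bar use case, here is a minimal Python sketch that builds such a GET URL, with the text carried in the `input` query parameter instead of a POST body. The endpoint and the other parameter values are copied from the first curl command in this thread; the function name `browser_url` is illustrative, and nothing is sent:

```python
from urllib.parse import urlencode

# Endpoint copied from the first curl command in this thread.
BASE_URL = "http://api.freme-project.eu/0.4/e-entity/freme-ner/documents"

def browser_url(text: str) -> str:
    """Return a GET URL that carries `text` in the input parameter instead of a POST body."""
    query = urlencode({
        "input": text,          # takes the place of the POST body
        "informat": "text",
        "outformat": "json-ld",
        "language": "fr",
        "dataset": "dbpedia",
    })
    return f"{BASE_URL}?{query}"

print(browser_url("Sydney, Australie"))
```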

Since you call FREME programmatically, you do not need the input parameter. Submit all data via the POST body.

x-fran commented 8 years ago

What is the max length of the string I can send in "input"?

jnehring commented 8 years ago

I don't know. Maybe 1000 characters?

Why do you need to know this? Just don't use input.

x-fran commented 8 years ago

5066 characters max. I just checked, and I want to confirm this number: beyond it I get a 414, a 400, or no content.
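HTTP status 414 means "URI Too Long", so a limit like this most likely applies to the URL after percent-encoding, not to the number of characters in the text. In French, every accented letter or typographic apostrophe expands to several characters in the URL, so how much text fits in `input` varies from text to text. A small Python sketch (the helper name is illustrative) that measures this growth:

```python
from urllib.parse import urlencode

def encoded_input_length(text: str) -> int:
    """Length of the `input=...` query fragment after UTF-8 percent-encoding."""
    return len(urlencode({"input": text}))

# Compare raw character counts with their encoded size in a URL.
for sample in ["jour", "année", "l’étranger"]:
    print(sample, len(sample), encoded_input_length(sample))
```

This does not measure the server's actual limit; it only shows why a fixed character count is not a reliable bound for `input`.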

If I run this in my terminal:

curl -X POST -d @req_data.txt  ...

am I sending the file itself or only the file's content?

jnehring commented 8 years ago

am I sending the file itself or only the file's content?

The content of the file.
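In Python, the equivalent is to read the file's bytes yourself and pass them as the request body. Below is a minimal standard-library sketch; the file name `req_data.txt` comes from the curl example above, the endpoint is copied from this thread, and `request_from_file` is an illustrative name. Note that curl's `-d @file` strips carriage returns and newlines from the file, whereas `--data-binary @file` sends it unchanged; reading the bytes as below behaves like the latter:

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint copied from the first curl command in this thread.
BASE_URL = "http://api.freme-project.eu/0.4/e-entity/freme-ner/documents"

def request_from_file(path: str) -> Request:
    """Build (but do not send) a POST whose body is the file's exact bytes."""
    body = Path(path).read_bytes()  # content only; newlines are kept as-is
    url = f"{BASE_URL}?{urlencode({'informat': 'text', 'outformat': 'json-ld'})}"
    return Request(
        url,
        data=body,
        # Set explicitly: when sending, urllib otherwise adds
        # application/x-www-form-urlencoded to requests that carry data.
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )

# Usage: urllib.request.urlopen(request_from_file("req_data.txt")) would send it.
```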

Did you know that you can use the API documentation to try out API calls? There you don't have to deal with curl, so it is much easier. The documentation generates a curl command for you, so when you have a question about a request you can copy the curl command from there and paste it into your GitHub issue.

x-fran commented 8 years ago

Did you know that you can use the API documentation to try out API calls? There you don't have to deal with curl, so it is much easier. The documentation generates a curl command for you, so when you have a question about a request you can copy the curl command from there and paste it into your GitHub issue.

I use the API documentation a lot, believe me, but I have no control there, so I prefer to use my own code.

For raw plain text that is not read from a file, what do you recommend: sending the content in the POST body or in the "input" parameter?

jnehring commented 8 years ago

For raw plain text that is not read from a file, what do you recommend: sending the content in the POST body or in the "input" parameter?

The only reason I see to use the input parameter is to create an API request via the web browser's address bar. In that use case you cannot add a POST body to the request.

In all other cases you should use the POST body to submit data to FREME. It's just easier, because that way you don't have to worry about the content length.

x-fran commented 8 years ago

Any idea why, when I send plain raw text in the POST body (I've tried this in Python), what NER sees is:

Le+jour+de+l%E2%80%99an+approche+%C3%A0+grands+pas%2C+mais+o%C3%B9+c%C3%A9l%C3%A9brer+le+passage+%C3%A0+la+nouvelle+ann%C3%A9e%C2%A0%3F+Si+vous+avez+envie+de+changement+partez+f%C3%AAter+2016%C2%A0%C3%A0+l%E2%80%99%C3%A9tranger%2C+nous+avons+compil%C3%A9+pour+vous+nos+9+destinations+pr%C3%A9f%C3%A9r%C3%A9es+o%C3%B9+f%C3%AAter+le+jour+de+l%E2%80%99an+%C3%A0+travers+le+monde.+L%E2%80%99%C3%AEle+de+Kiribati+dans+le+Pacifique+est+officiellement+le+premier+endroit+dans+le+monde+%C3%A0+passer+%C3%A0+la+nouvelle+ann%C3%A9e%2C+tandis+que+Midway+Island+dans+les+%C3%AEles+Samoa+sera+le+dernier+endroit+%C3%A0+passer+%C3%A0+2016%2C+22+heures+plus+tard.1.+Sydney%2C+AustralieApr%C3%A8s+Wellington+en+Nouvelle-Z%C3%A9lande%2C+Sydney+est+la+seconde+grande+ville+%C3%A0+c%C3%A9l%C3%A9brer+le+passage+%C3%A0+la+nouvelle+ann%C3%A9e.+Chaque+ann%C3%A9e%2C+des+millions+de+personnes+regardent+les+incroyables+feux+d%E2%80%99artifices+de+la+ville+%C3%A0+la+t%C3%A9l%C3%A9vision%2C+sans+aucun+doute+parmi+les+plus+beaux+organis%C3%A9s+sur+la+plan%C3%A8te+ce+jour-

and if I send the same text from the same file, but as a file-like object, FREME NER sees the POST body like this:

Le jour de l’an approche à grands pas, mais où célébrer le passage à la nouvelle année ? Si vous avez envie de changement partez fêter 2016 à l’étranger, nous avons compilé pour vous nos 9 destinations préférées où fêter le jour de l’an à travers le monde. L’île de Kiribati dans le Pacifique est officiellement le premier endroit dans le monde à passer à la nouvelle année, tandis que Midway Island dans les îles Samoa sera le dernier endroit à passer à 2016, 22 heures plus tard.1. Sydney, AustralieAprès Wellington en Nouvelle-Zélande, Sydney est la seconde grande ville à célébrer le passage à la nouvelle année. Chaque année, des millions de personnes regardent les incroyables feux d’artifices de la ville à la télévision, sans aucun doute parmi les plus beaux organisés sur la planète ce jour-là. Et le spectacle pyrotechnique  ne sonne en rien la fin des festivités : il est possible de faire la fête jusqu’au petit matin dans les nombreux bars et clubs de la ville !2. Barcelone, EspagneA

?

jnehring commented 8 years ago

Check out this curl command:

curl -X POST --header "Content-Type: text/plain" --header "Accept: text/n3" -d "Le jour de l’an approche à grands pas, mais où célébrer le passage à la nouvelle année ? Si vous avez envie de changement partez fêter 2016 à l’étranger, nous avons compilé pour vous nos 9 destinations préférées où fêter le jour de l’an à travers le monde. L’île de Kiribati dans le Pacifique est officiellement le premier endroit dans le monde à passer à la nouvelle année, tandis que Midway Island dans les îles Samoa sera le dernier endroit à passer à 2016, 22 heures plus tard.1. Sydney, AustralieAprès Wellington en Nouvelle-Zélande, Sydney est la seconde grande ville à célébrer le passage à la nouvelle année. Chaque année, des millions de personnes regardent les incroyables feux d’artifices de la ville à la télévision, sans aucun doute parmi les plus beaux organisés sur la planète ce jour-là. Et le spectacle pyrotechnique  ne sonne en rien la fin des festivités : il est possible de faire la fête jusqu’au petit matin dans les nombreux bars et clubs de la ville !2. Barcelone, EspagneA
" "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?informat=text&outformat=turtle&language=en&dataset=dbpedia&mode=all"

It sends your text in the POST body and produces the correct output, so the error must be in your Python script. I cannot debug your Python scripts.

It looks to me like you URL-encode your text before putting it in the POST body. You should put it in the POST body "raw", not URL-encoded.
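This diagnosis can be checked locally: form-encoding the text with Python's standard `urlencode` reproduces the kind of string NER saw, with `+` for spaces and `%E2%80%99` for the apostrophe `’`. The field name `body` is only an example here; this reproduces the encoding, not FREME's behavior:

```python
from urllib.parse import parse_qs, urlencode

text = "Le jour de l’an approche à grands pas"

# A form-style body: key=value, spaces as "+", non-ASCII UTF-8 bytes as %XX.
form_body = urlencode({"body": text})
print(form_body)  # body=Le+jour+de+l%E2%80%99an+approche+%C3%A0+grands+pas

# A form-aware server would decode it back; a text/plain consumer would not.
assert parse_qs(form_body)["body"] == [text]
```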

x-fran commented 8 years ago

I tried this in Python, and I'm not URL-encoding the POST body.

This is the code; even though it's Python, it's straightforward:

headers_freme = {
    'Content-Type': '',
    'Accept': 'text/n3'
}
params_freme = {
    'informat': 'text',
    'outformat': 'json-ld',
    'language': str.lower(l),
    'dataset': 'dbpedia',
    'mode': 'spot,classify'
}
# data={'body': text} POST body
r = requests.post(URL_FREME, headers=headers_freme, params=params_freme, data={'body': text})

Maybe what I did here is wrong. Any ideas?

If I use the same code but change it to "data=text", where "text" is a file object rather than raw text, everything works as expected.
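In `requests`, passing a dict as `data=` form-encodes it (and adds `Content-Type: application/x-www-form-urlencoded` unless a Content-Type header is already set), while passing `str`, `bytes`, or a file object sends it as-is. That matches both observations above. A minimal sketch of the raw-body version follows, assuming the `requests` package and the endpoint from the first curl command in this thread in place of `URL_FREME`. The original `Accept` header is left out on the assumption that `outformat` takes precedence over it, as `informat` does for Content-Type. The request is only prepared, not sent, so the body can be inspected:

```python
import requests

# Assumed value for URL_FREME: endpoint from the first curl command in this thread.
URL_FREME = "http://api.freme-project.eu/0.4/e-entity/freme-ner/documents"

def prepare_ner_request(text: str, lang: str) -> requests.PreparedRequest:
    """Prepare a FREME NER call that sends `text` raw in the POST body."""
    params = {
        "informat": "text",
        "outformat": "json-ld",
        "language": lang.lower(),
        "dataset": "dbpedia",
        "mode": "spot,classify",
    }
    headers = {"Content-Type": "text/plain; charset=utf-8"}
    # bytes instead of {'body': text}: requests sends bytes unchanged, no form encoding
    req = requests.Request("POST", URL_FREME, headers=headers, params=params,
                           data=text.encode("utf-8"))
    return req.prepare()

prepared = prepare_ner_request("Le jour de l’an approche à grands pas", "FR")
print(prepared.headers["Content-Type"])
# To actually send it: requests.Session().send(prepared)
```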

x-fran commented 8 years ago

From the curl manual:

-d, --data (HTTP) Sends the specified data in a POST request to the HTTP server, in the same way that a browser does when a user has filled in an HTML form and presses the submit button. This will cause curl to pass the data to the server using the content-type application/x-www-form-urlencoded.

How do you handle the Content-Type on your side?

This will cause curl to pass the data to the server using the content-type application/x-www-form-urlencoded.

I suppose that if you don't specify the Content-Type, curl sends application/x-www-form-urlencoded by default.

Am I wrong?

Just a side note; it's good to keep the following in mind:

When --data is told to read from a file like that, carriage returns and newlines will be stripped out.

(Again from the curl manual.)

If we try to reproduce an error using curl, I don't have 100% control over the content I get from our users, even though I have a filter that strips out newlines.
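To see what the server actually receives when a file goes through `-d @file`, the stripping described in the quoted manual can be approximated in Python. This is a sketch of the documented behavior, not curl's own code, and the function name is illustrative; `--data-binary @file` is the curl option that sends the file unchanged:

```python
def curl_data_from_file(content: bytes) -> bytes:
    """Approximate curl's -d @file: carriage returns and newlines are removed."""
    return content.replace(b"\r", b"").replace(b"\n", b"")

original = "2. Barcelone, Espagne\r\nA Barcelone\n".encode("utf-8")
# The line break disappears, so the two lines run together in the request body.
print(curl_data_from_file(original).decode("utf-8"))
```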

jnehring commented 8 years ago

You can override the Content-Type with the informat parameter. In fact, when informat is set, we ignore the Content-Type header and just process whatever is in the body. So I suggest you don't use the Content-Type header and use only the informat parameter.
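The precedence described here can be sketched as a small resolver: the `informat` query parameter wins, and the `Content-Type` header is only a fallback. This is a hypothetical illustration of the stated rule, not FREME's actual code; the function name and the media-type-to-format mapping are assumptions:

```python
from typing import Mapping, Optional

# Hypothetical mapping from Content-Type media types to informat values.
MEDIA_TYPE_TO_FORMAT = {
    "text/plain": "text",
    "text/turtle": "turtle",
    "application/ld+json": "json-ld",
}

def resolve_input_format(query: Mapping[str, str],
                         headers: Mapping[str, str]) -> Optional[str]:
    """Return the input format, preferring the informat parameter over Content-Type."""
    if query.get("informat"):
        return query["informat"]  # the header is ignored entirely
    # A real server would look headers up case-insensitively; a plain dict is used here.
    content_type = headers.get("Content-Type", "")
    media_type = content_type.split(";")[0].strip().lower()
    return MEDIA_TYPE_TO_FORMAT.get(media_type)
```

Under this rule, a client whose tool silently sets a form Content-Type (as curl's `-d` does) still has its body read as plain text, as long as it sends `informat`.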

About stripping newlines: I think that, for reproducing an error with curl, it's OK to have only 99% control. You can get 100% control through your Python code.

x-fran commented 8 years ago

What is the benefit of overriding the Content-Type? Why not use only the Content-Type, or at least give it precedence over "informat" when it is set? Am I missing something here?

jnehring commented 8 years ago

The part of the manual you quoted says that the -d parameter sets the Content-Type header to application/x-www-form-urlencoded, so I thought it might be a good idea to override it with informat. But actually, I have never had problems with curl, the -d parameter, and the Content-Type header.

So you can use the Content-Type header too. No problem.

x-fran commented 8 years ago

Why override the Content-Type? Is that helpful in any way? Why not remove "informat" and not override anything?

So you can use the Content-Type header too. No problem.

So I can still use the Content-Type header, no problem, but you will override it anyway? That doesn't make any sense to me. You lost me on this. Or not, who knows? Let's blame English for this: it's your second language and my third.

I will ask on Stack Overflow for other people's opinions on this as soon as I find some time. I still feel that overriding the Content-Type header is not the right way to do things. There has to be a reason for it. Could you share the reason with me so I can stop being so annoying? :)

Am I authorized to use the GitHub repository link if necessary?

x-fran commented 8 years ago

Better idea: just send me the GitHub link to the piece of code where "informat" overrides the Content-Type header. I will try to understand the logic by myself, and I will close the issue. This isn't leading us anywhere.

I still need to know whether I'm authorized to use FREME's GitHub repository links in case I need to show the code to other developers.