freme-project / freme-ner

Apache License 2.0

Input validation fails #47

Closed: jnehring closed this issue 8 years ago

jnehring commented 8 years ago

This cURL call

curl -v -d @test.txt "http://api.freme-project.eu/current/e-entity/freme-ner/documents/?informat=text&input=&outformat=json-ld&language=en&dataset=dbpedia"

with this test.txt produces an "Internal Server Exception". The problem is that the submitted input is empty because of the empty input parameter in the query string (&input=&outformat=...).

This should produce a "Bad Request" response with a proper error message.
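To make the cause concrete, here is a minimal sketch (Java 10+, not FREME code; the class InputCheck and its methods are hypothetical) of how a query string is typically decoded. &input= yields an input parameter that is present but empty, which is different from leaving the parameter out, so a validator has to check for this case explicitly and answer "Bad Request" instead of passing the empty text on to entity recognition.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not FREME code: decode a raw query string so that
// "present but empty" and "absent" parameters can be told apart.
class InputCheck {

    // Decodes "k1=v1&k2=v2"; a pair like "input=" maps "input" to "".
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "&&" or a trailing "&"
            }
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    // True when the client explicitly sent an empty (or whitespace-only) input.
    static boolean hasEmptyInput(Map<String, String> params) {
        String input = params.get("input");
        return input != null && input.trim().isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> failing = parseQuery(
                "informat=text&input=&outformat=json-ld&language=en&dataset=dbpedia");
        Map<String, String> working = parseQuery(
                "informat=text&outformat=json-ld&language=en&dataset=dbpedia");
        System.out.println(hasEmptyInput(failing)); // true
        System.out.println(hasEmptyInput(working)); // false
    }
}
```

When hasEmptyInput is true, the endpoint can reject the request with 400 before doing any work.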

m1ci commented 8 years ago

Shouldn't this exception be thrown while validating the params using NIFParameterSet?

jnehring commented 8 years ago

> Shouldn't this exception be thrown while validating the params using NIFParameterSet?

True. I will change the code.
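A minimal sketch of that idea, under stated assumptions: the names ParamValidation, BadRequestException, NifParams, buildParams and handle are hypothetical (they are not the FREME NIFParameterSet API), and the sketch assumes an explicit input parameter takes precedence over the request body, which matches the behaviour described in this issue. Validation happens once, while the parameter set is built, and the handler maps the validation exception to 400 instead of letting it surface as a 500.

```java
import java.util.Map;

// Hypothetical sketch, not the FREME API: validate while building the
// parameter set, and translate validation failures into 400 Bad Request.
class ParamValidation {

    // Hypothetical exception type meaning "the client sent a bad request".
    static class BadRequestException extends RuntimeException {
        BadRequestException(String message) {
            super(message);
        }
    }

    // Hypothetical, simplified stand-in for a NIF parameter set.
    static final class NifParams {
        final String input;
        final String informat;

        NifParams(String input, String informat) {
            this.input = input;
            this.informat = informat;
        }
    }

    // Assumption: an explicit "input" parameter takes precedence over the body.
    static NifParams buildParams(Map<String, String> query, String body) {
        String input = query.containsKey("input") ? query.get("input") : body;
        if (input == null || input.trim().isEmpty()) {
            throw new BadRequestException(
                    "No input text: send a non-empty 'input' parameter or a request body.");
        }
        return new NifParams(input, query.get("informat"));
    }

    // Returns the HTTP status the endpoint would answer with.
    static int handle(Map<String, String> query, String body) {
        try {
            buildParams(query, body);
            return 200; // entity recognition would run here
        } catch (BadRequestException e) {
            return 400; // e.getMessage() becomes the error message for the client
        }
    }

    public static void main(String[] args) {
        // Mirrors the failing request: input= is present but empty.
        System.out.println(handle(Map.of("informat", "text", "input", ""), "text from test.txt")); // 400
        // No input parameter: the request body is used instead.
        System.out.println(handle(Map.of("informat", "text"), "text from test.txt")); // 200
    }
}
```

Keeping the check in the parameter-set construction means every endpoint that builds its parameters this way gets the same 400 behaviour without repeating the check.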

x-fran commented 8 years ago

Even after removing input=& from the request, still no entities are spotted with the same text:

curl -v -d @test.txt "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents/?informat=text&outformat=json-ld&language=en&dataset=dbpedia&enrichement=dbpedia-categories"

x-fran commented 8 years ago

I'm assuming that this is a valid cURL call:

curl -v -d @test.txt "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents/?informat=text&outformat=json-ld&language=en&dataset=dbpedia&enrichement=dbpedia-categories"

Running it against the test.txt file, I get no entities.

Now, if I delete the first paragraph from the beginning of the text file:

High quality global journalism requires investment. Please share this article with others using the link below, do not cut & paste the article. See our Ts&Cs and Copyright Policy for more detail. Email ftsales.support@ft.com to buy additional rights. http://www.ft.com/cms/s/0/7786936c-73fc-11e5-bdb1-e6e4767162cc.html#ixzz3p0A9NKmA

I get one entity, "taIdentRef" : "dbpedia:EMC_E2":

{
  "@id" : "http://freme-project.eu/#char=109,111",
  "@type" : [ "nif:Phrase", "nif:String", "nif:Word", "nif:RFC5147String" ],
  "nif:anchorOf" : "E2",
  "beginIndex" : "109",
  "endIndex" : "111",
  "referenceContext" : "http://freme-project.eu/#char=0,3115",
  "taClassRef" : "http://www.w3.org/2002/07/owl#Thing",
  "itsrdf:taConfidence" : 0.5673365574838136,
  "taIdentRef" : "dbpedia:EMC_E2"
}

and the following categories:

"@graph" : [ {
    "@id" : "dbc:A1A-A1A_locomotives",
    "info" : "13.340149038027347",
    "label" : {
      "@language" : "en",
      "@value" : "A1A-A1A locomotives"
    }
  }, {
    "@id" : "dbc:Diesel_locomotives_of_the_United_States",
    "info" : "11.12083600962287",
    "label" : {
      "@language" : "en",
      "@value" : "Diesel locomotives of the United States"
    }
  }, {
    "@id" : "dbc:Electro-Motive_Diesel_locomotives",
    "info" : "11.67831777392192",
    "label" : {
      "@language" : "en",
      "@value" : "Electro-Motive Diesel locomotives"
    }
  }, {
    "@id" : "dbc:Locomotives_with_cabless_variants",
    "info" : "13.882676272428105",
    "label" : {
      "@language" : "en",
      "@value" : "Locomotives with cabless variants"
    }
  }, {
    "@id" : "dbc:Passenger_locomotives",
    "info" : "11.92242245122086",
    "label" : {
      "@language" : "en",
      "@value" : "Passenger locomotives"
    }
  }, {
    "@id" : "dbc:Railway_locomotives_introduced_in_1937",
    "info" : "14.318775387234778",
    "label" : {
      "@language" : "en",
      "@value" : "Railway locomotives introduced in 1937"
    }
  }, {
    "@id" : "dbc:Scrapped_locomotives",
    "info" : "10.98838571359922",
    "label" : {
      "@language" : "en",
      "@value" : "Scrapped locomotives"
    }
  }, {
    "@id" : "dbc:Standard_gauge_railway_locomotives",
    "info" : "8.839232858238084",
    "label" : {
      "@language" : "en",
      "@value" : "Standard gauge railway locomotives"
    }
  }, {
    "@id" : "dbc:Union_Pacific_Railroad_locomotives",
    "info" : "14.158310715041532",
    "label" : {
      "@language" : "en",
      "@value" : "Union Pacific Railroad locomotives"
    }
  }, {
    "@id" : "dbpedia:EMC_E2",
    "subject" : [ "dbc:Scrapped_locomotives", "dbc:Union_Pacific_Railroad_locomotives", "dbc:Passenger_locomotives", "dbc:Railway_locomotives_introduced_in_1937", "dbc:Standard_gauge_railway_locomotives", "dbc:Diesel_locomotives_of_the_United_States", "dbc:Electro-Motive_Diesel_locomotives", "dbc:A1A-A1A_locomotives", "dbc:Locomotives_with_cabless_variants" ]
  },

The odd part is that the spotted entity "EMC_E2" does not show up in the content (the matched text is just "E2"), and all of its categories are related to locomotives.
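For reference, the NIF "@id" of an annotation encodes its position in the text as an RFC 5147 style #char=begin,end fragment, and nif:anchorOf is the substring between those offsets (here "E2" at 109 to 111). The taIdentRef (dbpedia:EMC_E2) is only the DBpedia resource the anchor was linked to, so its name need not appear in the text. A minimal sketch that recovers the anchor from such an @id follows; the class name NifAnchor and the sample sentence are invented for illustration, and it assumes offsets count Java UTF-16 chars, which matches code points only for text without characters outside the Basic Multilingual Plane.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: recover the annotated surface form from a NIF
// "#char=begin,end" URI (begin inclusive, end exclusive).
class NifAnchor {
    private static final Pattern CHAR_FRAGMENT = Pattern.compile("#char=(\\d+),(\\d+)$");

    // Returns {begin, end} parsed from the "#char=b,e" suffix of a NIF @id.
    static int[] offsets(String nifId) {
        Matcher m = CHAR_FRAGMENT.matcher(nifId);
        if (!m.find()) {
            throw new IllegalArgumentException("No #char=b,e fragment in " + nifId);
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    // The anchor text is the substring [begin, end) of the reference context.
    static String anchor(String nifId, String context) {
        int[] o = offsets(nifId);
        return context.substring(o[0], o[1]);
    }

    public static void main(String[] args) {
        // Invented context for illustration; not the text from test.txt.
        String context = "The group has an E2 rating.";
        String id = "http://freme-project.eu/#char=17,19";
        System.out.println(anchor(id, context)); // E2
    }
}
```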

This is the link where I got the content from: http://www.ft.com/intl/cms/s/0/7786936c-73fc-11e5-bdb1-e6e4767162cc.html#axzz3p07brijW

Hope this helps somehow.

jnehring commented 8 years ago

Can you please move the discussion about getting no entities to a new GitHub issue? We should not mix up different topics in the same discussion. This issue is about a request that should produce a "Bad Request" answer.

jnehring commented 8 years ago

The problem does not occur anymore. The above API request works as expected.