fusepoolP3 / p3-datatxt-stanbol

A Stanbol Enhancement Engine using dataTXT
Apache License 2.0

Supported Languages #2

Open westei opened 9 years ago

westei commented 9 years ago

When sending text in an unsupported language, the engine currently fails with:

Caused by: eu.spaziodati.datatxt.stanbol.enhancer.engines.client.DatatxtException: Unmanaged language'es'
    at eu.spaziodati.datatxt.stanbol.enhancer.engines.client.DatatxtClient.performRequest(DatatxtClient.java:186)

This is not what a Stanbol Enhancement Engine is supposed to do. Instead, an engine should check for supported languages in the canEnhance(..) method and refuse to enhance content in an unsupported language. What it MUST NOT do is fail in the computeEnhancements(..) method because it accepted a request for an unsupported language.

There are the following possibilities to solve this:

  1. Having a service where dataTXT returns its supported languages. Use this list to implement canEnhance(..) so that content in unsupported languages is refused.
  2. Same as (1), but with a hard-coded list of supported languages. Based on the documentation at [1], dataTXT supports de | en | fr | it | pt.
  3. Checking the error code of the response. If the error is for an unsupported language, throw a special exception (e.g. an UnmanagedLanguageException). This exception can then be caught by the engine and silently ignored. In this case canEnhance(..) would accept content in any language, but the engine would not "crash" the enhancement chain when the language is unsupported.
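
Option (3) could look roughly like the following. This is only a minimal sketch: the exception classes, the error code value, and the fake `performRequest` are hypothetical stand-ins and not the actual dataTXT client or Stanbol types. The exceptions are unchecked here purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch of option (3). Every type and the error code below are
// hypothetical stand-ins, not the real Stanbol or dataTXT client classes.
class UnmanagedLanguageSketch {

    /** Stand-in for the generic client exception seen in the stack trace. */
    static class ClientException extends RuntimeException {
        ClientException(String message) { super(message); }
    }

    /** Dedicated subtype so the engine can tell "unsupported language" apart. */
    static class UnmanagedLanguageException extends ClientException {
        final String language;
        UnmanagedLanguageException(String language) {
            super("Unmanaged language '" + language + "'");
            this.language = language;
        }
    }

    // Assumed error code; the value the service actually returns may differ.
    static final String UNSUPPORTED_LANGUAGE = "unsupportedLanguage";

    /** Maps an error code from the response to the matching exception type. */
    static ClientException toException(String errorCode, String language) {
        if (UNSUPPORTED_LANGUAGE.equals(errorCode)) {
            return new UnmanagedLanguageException(language);
        }
        return new ClientException("Request failed: " + errorCode);
    }

    /** Fake remote call that only accepts "en" and "it". */
    static List<String> performRequest(String text, String language) {
        if (!"en".equals(language) && !"it".equals(language)) {
            throw toException(UNSUPPORTED_LANGUAGE, language);
        }
        List<String> annotations = new ArrayList<>();
        annotations.add("annotation for: " + text);
        return annotations;
    }

    /** Engine side: swallow only the language case, let other errors propagate. */
    static List<String> computeEnhancements(String text, String language) {
        try {
            return performRequest(text, language);
        } catch (UnmanagedLanguageException e) {
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        System.out.println(computeEnhancements("Hello world", "en").size()); // prints 1
        System.out.println(computeEnhancements("Hola mundo", "es").size());  // prints 0
    }
}
```

Catching only the dedicated subtype keeps every other client failure visible to the enhancement chain.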

As I do not see a service that can be used for (1), and I do not want to require code changes to the engine if dataTXT adds additional languages, I will go for option (3) to fix this issue.

However, I would strongly prefer a solution that can already decline requests in canEnhance(..), as this would avoid calls to the dataTXT service.
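
For comparison, declining in canEnhance(..) might be sketched as below. The class is hypothetical and takes the detected language as a plain string rather than a Stanbol ContentItem. The two constants are local stand-ins named after the Stanbol ones, and reading the list from a comma-separated setting is an assumption meant to avoid code changes when dataTXT adds languages.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical language gate; not the real engine class.
class LanguageGateSketch {

    // Local stand-ins named after the Stanbol return codes.
    static final int CANNOT_ENHANCE = 0;
    static final int ENHANCE_ASYNC = 2;

    // Languages listed in the dataTXT documentation referenced as [1].
    static final String DEFAULT_LANGUAGES = "de,en,fr,it,pt";

    private final Set<String> supported;

    /** Parses a comma-separated language list, e.g. from engine configuration. */
    LanguageGateSketch(String commaSeparated) {
        Set<String> langs = new LinkedHashSet<>();
        for (String token : commaSeparated.split(",")) {
            String lang = token.trim().toLowerCase(Locale.ROOT);
            if (!lang.isEmpty()) {
                langs.add(lang);
            }
        }
        this.supported = Collections.unmodifiableSet(langs);
    }

    /** Refuses content whose language is unknown or not in the list. */
    int canEnhance(String detectedLanguage) {
        if (detectedLanguage == null) {
            return CANNOT_ENHANCE; // assumption: no detected language means refuse
        }
        String lang = detectedLanguage.trim().toLowerCase(Locale.ROOT);
        return supported.contains(lang) ? ENHANCE_ASYNC : CANNOT_ENHANCE;
    }

    public static void main(String[] args) {
        LanguageGateSketch gate = new LanguageGateSketch(DEFAULT_LANGUAGES);
        System.out.println(gate.canEnhance("en") == ENHANCE_ASYNC);  // prints true
        System.out.println(gate.canEnhance("es") == CANNOT_ENHANCE); // prints true
    }
}
```

With this gate in place, no request for "es" would ever reach the dataTXT service.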

[1] https://dandelion.eu/docs/api/datatxt/nex/v1/

gmega commented 9 years ago

Hi Rupert,

I am in favour of (2) and, eventually, of obtaining the list of supported languages from dataTXT as an asynchronous job fired on engine startup. Feel free to add (3) but keep the issue open and I'll fix it as soon as I have time.

Thanks, Giuliano
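
The asynchronous lookup on engine startup suggested here could be sketched as follows, starting from the hard-coded list of (2) and replacing it once a fetch succeeds. The `fetcher` supplier is purely hypothetical: as noted in the issue, no dataTXT service returning the supported languages is known, so the sketch assumes one exists.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical holder for the supported-language list; not real engine code.
class AsyncLanguageListSketch {

    // Hard-coded fallback, as in option (2).
    static final Set<String> FALLBACK = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("de", "en", "fr", "it", "pt")));

    private final AtomicReference<Set<String>> supported =
            new AtomicReference<>(FALLBACK);

    /**
     * Runs the (assumed) remote lookup in the background. A failed or empty
     * lookup leaves the current list untouched.
     */
    CompletableFuture<Void> refreshAsync(Supplier<Set<String>> fetcher) {
        return CompletableFuture.supplyAsync(fetcher)
                .thenAccept(langs -> {
                    if (langs != null && !langs.isEmpty()) {
                        supported.set(Collections.unmodifiableSet(new HashSet<>(langs)));
                    }
                })
                .exceptionally(error -> null); // keep the previous list on failure
    }

    /** Check that canEnhance(..) could call without blocking on the lookup. */
    boolean isSupported(String language) {
        return language != null && supported.get().contains(language);
    }

    public static void main(String[] args) {
        AsyncLanguageListSketch list = new AsyncLanguageListSketch();
        System.out.println(list.isSupported("es")); // prints false
        list.refreshAsync(() -> new HashSet<>(Arrays.asList("en", "es"))).join();
        System.out.println(list.isSupported("es")); // prints true
    }
}
```

Because the list is held in an atomic reference, the engine can answer canEnhance(..) immediately after startup and pick up the fetched list whenever it arrives.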

(Sent as an email reply to Rupert Westenthaler's message of Wed, Jan 28, 2015 at 3:18 PM.)

westei commented 9 years ago

I provided a fix using solution (3). Even if (1) or (2) gets implemented, (3) can be kept to cover cases such as a language becoming temporarily unavailable (I do not know whether that could actually happen).