fusepoolP3 / p3-dictionary-matcher-transformer

Dictionary Matcher is a P3 transformer for SKOS-based entity extraction.
Apache License 2.0

"ERROR: Taxonomy URI is invalid!" on (presumably) valid URI #8

Open retog opened 7 years ago

retog commented 7 years ago

The URI https://web.archive.org/web/20151025232812/http://data.nytimes.com/descriptors.rdf is accepted by browsers; however, the dictionary matcher returns an error:

$ curl -X POST -d "Frauds and Swindlings cause significant concerns with regards to Ethics." "http://sandbox.fusepool.info:8301/?taxonomy=https://web.archive.org/web/20151025232812/http://data.nytimes.com/descriptors.rdf"
ERROR: Taxonomy URI is invalid! ("https://web.archive.org/web/20151025232812/http://data.nytimes.com/descriptors.rdf")

ktk commented 7 years ago

I can reproduce this problem when a taxonomy URI is on localhost. It fails that way on both Windows and Mac. I wonder if in your case the bug is related to https.

Whatever is done in resolving the URI, I don't think that the code there works properly.

ktk commented 7 years ago

It is not a generic https issue; I got it to work with another file.

semanticfire commented 7 years ago

This is related to a deprecated URLValidator in https://github.com/fusepoolP3/p3-dictionary-matcher-transformer/blob/master/src/main/java/eu/fusepool/p3/transformer/dictionarymatcher/Utils.java

It should use the 1.6 release of Commons Validator, i.e. the class org.apache.commons.validator.routines.UrlValidator, and pass the extra option ALLOW_LOCAL_URLS at initialization to allow localhost.
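
To make the expected behavior concrete, here is a minimal sketch of a permissive check that accepts both kinds of URI reported in this thread (an archive URL with an embedded "http://" in its path, and a localhost URL). It is an assumption-laden illustration that uses only java.net.URI from the JDK. It is not the Commons Validator API suggested above and not the project's code. The class name TaxonomyUriCheck and the method isAcceptable are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of p3-dictionary-matcher-transformer.
// Assumption: a taxonomy URI is acceptable if it parses as a URI,
// uses http or https, and names a host (localhost included).
final class TaxonomyUriCheck {

    private TaxonomyUriCheck() {
        // static helper only
    }

    static boolean isAcceptable(String candidate) {
        if (candidate == null || candidate.trim().isEmpty()) {
            return false;
        }
        final URI uri;
        try {
            // Throws on syntactically invalid input, e.g. unescaped spaces.
            uri = new URI(candidate.trim());
        } catch (URISyntaxException e) {
            return false;
        }
        String scheme = uri.getScheme();
        boolean httpLike = "http".equalsIgnoreCase(scheme)
                || "https".equalsIgnoreCase(scheme);
        // getHost() is null for opaque URIs (e.g. mailto:) and for
        // hierarchical URIs without a server-based authority.
        String host = uri.getHost();
        return httpLike && host != null && !host.isEmpty();
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://web.archive.org/web/20151025232812/http://data.nytimes.com/descriptors.rdf",
            "http://localhost:8080/taxonomy.rdf",
            "not a uri"
        };
        for (String s : samples) {
            System.out.println(isAcceptable(s) + "  " + s);
        }
    }
}
```

A check like this only confirms that the string is a well-formed http(s) URI with a host. Whether the taxonomy can actually be fetched and parsed still has to be handled when the resource is loaded.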

ktk commented 7 years ago

@semanticfire did you patch that? If so, could you do a pull-request?

semanticfire commented 7 years ago

I've touched more in the meantime; let's see if I can make that work.