Incorrect results when comparing strings containing non-ASCII characters

vukm commented 10 years ago

The tool gives incorrect results when it compares strings containing non-ASCII characters. For example:

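1. configuration (screenshot-mapping): <https://cloud.githubusercontent.com/assets/631599/2576966/205cea22-b978-11e3-9013-b2dba62017c7.png>
2. instance in the source graph (screenshot-source-instance): <https://cloud.githubusercontent.com/assets/631599/2576969/2502766e-b978-11e3-993a-41e42f1869ab.png>
3. instance in the target graph (screenshot-target-instance): <https://cloud.githubusercontent.com/assets/631599/2576970/296a63a6-b978-11e3-8f60-0078047a8a5c.png>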

Those two instances should have ended up in accepted-short.ttl instead of review-short.ttl. Since this does not happen when the strings contain only ASCII characters, I presume the cause is that the strings contain the character 'ž', which is a non-ASCII character encoded as multiple bytes in UTF-8.
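As an illustration only, here is a minimal Python sketch of two common ways in which strings containing 'ž' can look identical yet compare as different: a precomposed versus a decomposed Unicode form, and UTF-8 bytes decoded with the wrong charset. Whether either applies to LIMES or to the data in this report is an assumption, not something the thread confirms. The helper `normalized_equal` is hypothetical and not part of LIMES.

```python
import unicodedata


def normalized_equal(a: str, b: str) -> bool:
    """Compare two strings after bringing both to NFC form (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u017e"      # 'ž' as one precomposed code point
decomposed = "z\u030c"   # 'z' followed by COMBINING CARON; renders the same

# The same character read with the wrong charset: UTF-8 bytes decoded as Latin-1.
mojibake = composed.encode("utf-8").decode("latin-1")

print(composed == decomposed)                 # raw code point comparison
print(normalized_equal(composed, decomposed)) # comparison after normalization
print(len(composed.encode("utf-8")))          # number of UTF-8 bytes for 'ž'
print(composed == mojibake)                   # comparison against mis-decoded text
```

If the source and target datasets differ in either of these ways, a string measure that works on raw code points would score the pair below an exact match, which could push it from the accepted file into the review file. Checking the raw bytes of both literals would show whether this is what happened.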

ngonga commented 10 years ago

What happens if you set language tags in the target dataset?

vukm commented 10 years ago

Still the same.

On Tue, Apr 1, 2014 at 11:05 AM, Axel Ngonga notifications@github.com wrote:

What happens if you set language tags in the target dataset?

ngonga commented 10 years ago

Can you send me your data?

dice-group / LIMES-legacy, issue #3: Incorrect results when comparing strings containing non-ASCII characters