athensresearch / athens

Athens is no longer maintained. Athens was an open-source, collaborative knowledge graph, backed by YC W21
https://athensresearch.github.io/athens
Other
6.31k stars 397 forks

' vs ’ (same meaning, but differently recognised) #679

Closed ddauber closed 6 months ago

ddauber commented 3 years ago

Problem

Athens treats the following as separate phrases (as it probably should, given that it differs in one character):

The problem emerges when one copies text as a direct quote from another source: even though the same meaning is implied, it does not show up as a suggested link. The reason seems to be that ' is treated differently from ’. This leads to duplicate pages, or to Athens not recognising that there are two versions of the same text.

I am aware this is not a major bug, but it is certainly an annoyance. It might be worth treating both as the same character input. The workaround would be to not use it at all.
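
To illustrate what "treat both as the same character" could mean, here is a minimal sketch in Python (Athens itself is written in ClojureScript, so this is not Athens code). The helper names `fold_apostrophes` and `same_title` are hypothetical, and the set of characters to fold is an assumption:

```python
# Hypothetical helpers, not part of Athens: fold typographic apostrophes
# into the plain ASCII apostrophe before comparing page titles.
APOSTROPHE_VARIANTS = {
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK (the "fancy" apostrophe)
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u02BC": "'",  # MODIFIER LETTER APOSTROPHE
}
_TABLE = str.maketrans(APOSTROPHE_VARIANTS)


def fold_apostrophes(text: str) -> str:
    """Return text with the apostrophe variants above replaced by '."""
    return text.translate(_TABLE)


def same_title(a: str, b: str) -> bool:
    """Compare two titles while ignoring which apostrophe variant was typed."""
    return fold_apostrophes(a) == fold_apostrophes(b)
```

With this, `same_title("Don\u2019t repeat yourself", "Don't repeat yourself")` is true, while the stored text of each page stays untouched; only the comparison is loosened.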

Screenshots/Demo

Athens Version 1.0.0-BETA.40

agentydragon commented 3 years ago

Unfortunately there are a lot of cases like this (where two characters are almost or entirely indistinguishable to the human eye but are technically different Unicode characters), so I think making an exception just for "fancy apostrophe" / "straight-down apostrophe" wouldn't make sense.

However I think it would be nice in the future to have some sort of "autolinking" (think https://www.dbpedia-spotlight.org/), where e.g. Athens would suggest to you "hey, you said John asked me yesterday to buy groceries, did you mean to link John -> John Doe, yesterday -> 2021-04-27, buy groceries -> Buy groceries?". For that, it would definitely make sense to do some normalization / loose parsing.
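
A rough sketch of the "normalization / loose parsing" part, in Python for illustration only (not Athens code; `normalize` and `suggest_links` are hypothetical names). It assumes that loose matching means Unicode NFKC normalization, folding curly quotes to straight ones, case folding, and collapsing whitespace. It only covers the "buy groceries -> Buy groceries" kind of match; resolving "John" to "John Doe" or "yesterday" to a date would need real entity linking.

```python
import unicodedata

# NFKC does not fold curly quotes into straight ones, so map them explicitly.
QUOTE_TABLE = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201C": '"', "\u201D": '"',   # double curly quotes
})


def normalize(text: str) -> str:
    """Loose comparison key: NFKC, straight quotes, case-folded, single spaces."""
    text = unicodedata.normalize("NFKC", text).translate(QUOTE_TABLE)
    return " ".join(text.casefold().split())


def suggest_links(block_text: str, page_titles: list[str]) -> list[str]:
    """Return the page titles whose normalized form occurs in the block text."""
    haystack = normalize(block_text)
    suggestions = []
    for title in page_titles:
        key = normalize(title)
        if key and key in haystack:  # plain substring test, deliberately crude
            suggestions.append(title)
    return suggestions
```

For example, `suggest_links("John asked me yesterday to buy groceries", ["Buy groceries", "John Doe"])` returns `["Buy groceries"]`. The substring test ignores word boundaries, so a real implementation would probably want tokenization on top of this.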