TiddlyWiki / TiddlyWiki5

A self-contained JavaScript wiki for the browser, Node.js, AWS Lambda etc.
https://tiddlywiki.com/

Unicode CamelCase issue #3927

Open mdrda opened 5 years ago

mdrda commented 5 years ago

When using some Unicode characters (letters with diacritics) such as "ž", "č", "ř", links that should be formed by CamelCase notation are not created, and the composed CamelCase words are treated as plain text.

pmario commented 5 years ago

It's possible to switch it off; see: https://tiddlywiki.com/#How%20to%20turn%20off%20camel%20case%20linking

mdrda commented 5 years ago

Switching it off is not a solution. The issue is that the CamelCase auto-linking feature breaks when non-ASCII characters are used in the link name.

00SS commented 5 years ago

I can confirm that the following do not become links the way @mich-mike is expecting: CamelCaže, CařamelCase, $:/Cčase
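A minimal sketch of why words like these can fall through, assuming the link rule is a regular expression built from fixed letter ranges. The ASCII-only classes in `asciiCamel` are an illustrative simplification and are not taken from TiddlyWiki's source. `unicodeCamel` is a hypothetical alternative that uses the standard ES2018 Unicode property escapes (`\p{Lu}`, `\p{Ll}`, `\p{L}`).

```javascript
// Hypothetical, simplified CamelCase rule: upper+, lower+, one upper, then any letters.
// The ASCII-only ranges are an assumption made for illustration.
const asciiCamel = new RegExp("^[A-Z]+[a-z]+[A-Z][A-Za-z]*$");

// Same shape, but with Unicode property escapes (these require the "u" flag).
const unicodeCamel = new RegExp("^\\p{Lu}+\\p{Ll}+\\p{Lu}\\p{L}*$", "u");

for (const raw of ["CamelCase", "CamelCaže", "CařamelCase"]) {
  // Normalise to NFC so each accented letter is a single code point.
  const word = raw.normalize("NFC");
  console.log(word, "ascii:", asciiCamel.test(word), "unicode:", unicodeCamel.test(word));
}
```

Under these assumptions the ASCII pattern accepts only the first word, while the property-escape pattern accepts all three. Whether such a pattern would be acceptable for TiddlyWiki (performance, browser support, over-eager linking) is a separate question.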

mdrda commented 5 years ago

Correct example. Thanks for the third one, @OOSS; that one is weird indeed and should not be matching. It illustrates that things are not so obvious. It seems there is no naive solution, and an l10n library would be needed to distinguish accented letters from special characters. Or am I wrong, and simply blacklisting some letters from forming CamelCase would be enough to solve this?

Sidenote: languages that use declension are also an interesting topic. In Czech it would be correct for e.g. "MalýPes" (the base form, standing for [there is a] "SmallDog") to link to the same tiddler as "MaléhoPsa" (meaning [I can see a] "SmallDog"). Using the base form in the latter context is grammatically incorrect. To be honest, I don't know of a wiki engine that implements this. It would definitely be a wow feature (not in the US, England, France, Germany...), but there are still many of us speaking languages that feature accents and declension ;-)

In fact it is a cool language feature: grammar hints to the reader/listener about the context of the information being provided. Sometimes the explicit context does not have to be given, because the implicit context follows from the grammar. It can also act as an error check on the information provided. Since declension is obligatory in the language, omitting it would raise an information consistency error (or just a grammar error :-)

mdrda commented 5 years ago

I found the CamelCase implementation has been challenged already: https://github.com/Jermolene/TiddlyWiki5/issues/337

sukima commented 5 years ago

It isn't just Unicode; numbers are also an issue. I expected Foo123Bar to linkify, but alas it does not. TBH I found it easier not to rely on the CamelCase feature and simply surround my links with [[ ]] anyway. This has the added advantage of capturing spaces (my tiddlers look nicer when their titles have spaces anyway).
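One plausible reading of the Foo123Bar behaviour, sketched under the assumption that the rule demands a capital letter immediately after the first run of lower-case letters and only allows digits after that capital. The pattern `digitCamel` is hypothetical and is not copied from TiddlyWiki.

```javascript
// Hypothetical rule: upper+, lower+, one upper, then letters or digits.
// Digits are only permitted in the trailing part (an assumption for illustration).
const digitCamel = new RegExp("[A-Z]+[a-z]+[A-Z][A-Za-z0-9]*");

// Digits after the second capital fit the trailing class.
console.log("FooBar123", digitCamel.test("FooBar123"));

// Digits right after "Foo" sit where a capital letter is required,
// and "Bar" on its own has no second capital.
console.log("Foo123Bar", digitCamel.test("Foo123Bar"));
```

With a rule of this shape, only the first string would qualify as a link.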

00SS commented 5 years ago

@michal-mike

The 3rd example is called a syslink (System Link). It is enabled by default and can be switched off in the ControlPanel >> Info >> Advanced >> Parsing.

It turns text starting with $:/x into a link (where x is a string of characters).
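A rough sketch of how such a rule could end up linking only part of `$:/Cčase`, assuming the "string of characters" is limited to ASCII letters, digits and a few punctuation marks. The pattern `sysLink` and its character set are assumptions for illustration, not TiddlyWiki's actual rule.

```javascript
// Hypothetical system-link rule: "$:/" followed by ASCII letters, digits, "/", ".", "_" or "-".
const sysLink = new RegExp("\\$:/[A-Za-z0-9/._-]+");

const found = "$:/Cčase".normalize("NFC").match(sysLink);

// Only the part up to the first character outside the assumed set is captured.
console.log(found ? found[0] : null);
```

Under that assumption the match stops at "č", so just `$:/C` would be turned into a link and the rest would stay plain text.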

As for differently named links opening the same tiddler, Mario has made a uni-link plugin that does this using, I believe, aliases. I have not used it, but many others do.

From issue #337: "I think it makes sense for the wikilink rules to be on the conservative side, as it is easier to explicitly link a text than it is to suppress a wiki link."

I switch off CamelCase linking in my wikis, and make all my links [[manually]]. Perfect control!

pmario commented 5 years ago

.... I switch off CamelCase linking in my wikis, and make all my links [[manually]]. Perfect control!

I think this is the right approach, because we can't deal with all possible Unicode characters. There are more than 65,000 of them.

mdrda commented 5 years ago

@00ss, @sukima: manual control over wikilinks seems to be the best answer to my concern, thanks.

@pmario: the uni-link plug-in for topic name aliasing does solve my concern over declension. Very elegant imo.

Nothing left to say other than thanks for your input. One thing, maybe:

00SS commented 5 years ago

Just in case you are not aware, CamelCase is turned off in the ControlPanel >> Settings. It is the first setting at the top, and you have to save and reload your TiddlyWiki for it to take effect. If you feel your issue is solved or unsolvable, do consider closing this issue. I hope you're also aware of https://groups.google.com/forum/#!forum/tiddlywiki

mdrda commented 5 years ago

It has been mostly resolved by the available workarounds based on manual control. The remaining part is the last one mentioned:

The first CamelCase expression produces an inconsistent wikilink: it is broken after "a", with "ř" treated as a delimiter such as a space. The second expression does not produce a wikilink, which appears more consistent and is expected because of the accented characters.

Expected outcome: either neither of these CamelCase words is treated as a wikilink, or both are (the latter being unlikely, given the explanations above).

Actual outcome: CamelCase acts strangely.

The only true workaround is turning off CamelCase, which defeats its purpose.
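The two expressions referred to here are not preserved in the thread, so the sketch below uses the earlier examples as stand-ins. It assumes a hypothetical ASCII-only rule (`scanCamel`, not TiddlyWiki's actual pattern) applied to running text, and shows how one accented word can yield a truncated link while another yields none.

```javascript
// Hypothetical rule with ASCII-only letter classes, scanned over running text (no anchors).
const scanCamel = new RegExp("[A-Z]+[a-z]+[A-Z][A-Za-z]*");

// Returns the substring that would be linked, or null if nothing qualifies.
function linkedPart(text) {
  const match = text.normalize("NFC").match(scanCamel);
  return match ? match[0] : null;
}

console.log(linkedPart("CamelCaže"));   // "CamelCa": the match ends just before "ž"
console.log(linkedPart("CařamelCase")); // null: "ř" interrupts the pattern before a second capital
```

In the first case the accented letter comes after a complete upper-lower-upper sequence, so it merely ends the link early. In the second it comes before that sequence is complete, so no link is formed at all.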

00SS commented 5 years ago

I believe CamelCase is still enabled by default in TiddlyWiki as a remnant from early days. Discussion on having it on/off may lead to disabling it by default.

Your present issue and the below statement might be another reason to do so.

"I think this is the right approach, because we can't deal with all possible Unicode characters. There are more than 65,000 of them."

Also ...

Wiki link markup

Camel case is used in some wiki markup languages for terms that should be automatically linked to other wiki pages. This convention was originally used in Ward Cunningham's original wiki software, WikiWikiWeb, and can be activated in most other wikis. Some wiki engines such as TiddlyWiki, Trac and PmWiki make use of it in the default settings, but usually also provide a configuration mechanism or plugin to disable it. Wikipedia formerly used camel case linking as well, but switched to explicit link markup using square brackets and many other wiki sites have done the same.

Personally I feel that even syslinks should be off by default.