DraqueT / PolyGlot

PolyGlot is a conlang construction toolkit.
MIT License

Issues with the Unicode Supplemental Multilingual Plane #1158

Status: Closed 2 years ago. Reported by tanadrin.

tanadrin commented 2 years ago

When I enter Unicode characters from the Supplemental Multilingual Plane (SMP) in the "Conlang Word" field of a Lexicon entry, an error appears. This happens at least with the Gothic alphabet (𐌰𐌱𐌲𐌳 etc., code points U+10330–U+1034F). For example, the input 𐌳𐌰𐌲𐍃 produces the error: Word: "𐌳𐌰𐌲𐍃" with local value: "day" cannot be rendered properly using font: Ulfilas.
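For context: SMP characters lie above U+FFFF, so Java (PolyGlot's implementation language) stores each one as a surrogate pair of two `char` values. The minimal sketch below shows this for the word from the error message. The class name `SmpLength` is purely illustrative and is not part of PolyGlot.

```java
class SmpLength {
    // Gothic "dags" (day): U+10333 U+10330 U+10332 U+10343, written as UTF-16 surrogate pairs
    static final String WORD = "\uD800\uDF33\uD800\uDF30\uD800\uDF32\uD800\uDF43";

    public static void main(String[] args) {
        System.out.println(WORD.length());                          // 8 UTF-16 code units
        System.out.println(WORD.codePointCount(0, WORD.length()));  // 4 Unicode code points
        System.out.println(Character.isHighSurrogate(WORD.charAt(0))); // true: first char is half of a pair
        System.out.printf("U+%X%n", WORD.codePointAt(0));           // U+10333, the full code point
    }
}
```

Any code that treats each `char` as a whole character will see eight lone surrogates here, not four letters.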

This error appears regardless of which conlang font is used, whether that font supports these characters, and whether a fallback font exists. In every case I tested, the application displayed the characters correctly, using either the chosen conlang font or a fallback font, even inside the error message itself. No actual rendering failure seems to be occurring, and the auto-declension feature works normally. However, the error prevents adding further entries to the Lexicon.

I have reproduced this error with Old Turkic characters from the SMP, using the Segoe UI Symbol font, which supports them. However, I don't have fonts that cover most of the SMP, so I couldn't test other blocks. I was unable to find any characters in the Basic Multilingual Plane (BMP) that trigger similar behavior.
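One possible explanation, not confirmed anywhere in this thread, is a font check that tests each UTF-16 `char` separately. Such a check would reject the lone surrogate halves of SMP characters while passing every BMP character, which matches the pattern reported above. The sketch below compares a per-`char` check with a code-point-aware one. The `IntPredicate` is a hypothetical stand-in for `java.awt.Font#canDisplay(int)`, and both method names are illustrative, not PolyGlot's actual code.

```java
import java.util.function.IntPredicate;

class GlyphCheck {
    /** Naive check: tests each UTF-16 char on its own. Returns the index of the first failure, or -1. */
    static int firstFailurePerChar(String text, IntPredicate canDisplay) {
        for (int i = 0; i < text.length(); i++) {
            if (!canDisplay.test(text.charAt(i))) { // a lone surrogate is never a displayable character
                return i;
            }
        }
        return -1;
    }

    /** Code-point-aware check: decodes surrogate pairs before testing. Returns the char index of the first failure, or -1. */
    static int firstFailurePerCodePoint(String text, IntPredicate canDisplay) {
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!canDisplay.test(cp)) {
                return i;
            }
            i += Character.charCount(cp); // advance by 2 for SMP characters, 1 otherwise
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical font coverage: the Gothic block only
        IntPredicate gothicFont = cp -> cp >= 0x10330 && cp <= 0x1034F;
        String dags = "\uD800\uDF33\uD800\uDF30\uD800\uDF32\uD800\uDF43";
        System.out.println(firstFailurePerChar(dags, gothicFont));      // 0: spurious failure
        System.out.println(firstFailurePerCodePoint(dags, gothicFont)); // -1: every character is covered
    }
}
```

Iterating with `codePointAt` and `Character.charCount` keeps the index in UTF-16 units, so it can still be reported against the original string.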

Screenshot attached.

Label: possible-bug

DraqueT commented 2 years ago

Thank you for the heads-up! This is a new one, but I'll see what I can do to address it.

In the meantime, you can get around the block by clicking Override Lexical Rules. I'm not certain how many words this bug will affect, but doing that will let you keep adding vocabulary until I figure out what's causing the font check to fail.

DraqueT commented 2 years ago

Could you please send me the language file in question? I am having trouble reproducing this, and I wonder whether there's some edge case more complex than the string "𐌳𐌰𐌲𐍃" by itself. My email is draquemail@gmail.com, and you can toss it there as an attachment.

DraqueT commented 2 years ago

I am closing this ticket because I have not been able to reproduce the issue or get a copy of the affected language file. If you continue to have trouble with this or can send me the file, I will reopen it.