Open 1313ou opened 2 weeks ago
Thanks for this. I like the idea, but I wonder whether the choice of characters is the best way to implement it.
Introducing a lot of non-ASCII characters can cause issues; in particular, I am still not 100% sure how well legacy apps that use the WNDB format can cope with these characters, so I would like to test it a bit more.
Secondly, if we do use Unicode characters for punctuation, wouldn't it be more appropriate to use U+2018 and U+2019 for quotes, rather than U+00B4, which is officially called "Acute Accent"?
I don't think non-ASCII will pose a problem nowadays. All modern languages and libraries handle it seamlessly. Legacy applications that stumble on non-ASCII characters are likely to stumble already on:
`¬, °, ·, ×, ⁓, −, ∞, ̃, €, ½, á, à, ä, ā, ç, é, É, ê, ë, fi, ʰ, ʻ, í, Ḳ, ñ, ó, ò, ö, ő, ś, š, ú, ü, ű, ū, α, β, γ, ρ, ъ, Ъ, ь, Ь`,
not to mention the em-dash and ellipsis, all of which have already been imported into OEWN.
As for the choice of quoting characters, I agree with you that ‘quoted’ with ‘ (u2018) and ’ (u2019) for quotes would be more appropriate. Or “quoted” with “ (u201C) and ” (u201D) double quotation marks.
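Actually that was my first choice, but I fell back on the "Acute Accent" (u00B4) because
- it is Extended ASCII (Latin-1), coded on one byte in that encoding (two bytes in UTF-8)
- the move is less extensive: you just replace the closing mark
- it is more conservative: the backtick stays in place so that code that spots quotations with this will still work
- the backtick for opening is actually the "Grave Accent" and the acute accent and grave accent are in a mirror relation.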
If you are open to the u2018-u2019 move (yielding ‘quoted’), so am I. I can easily adjust the PR to do just this.
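For reference, the encoded size of each candidate character can be checked directly. The snippet below is only a sketch: the `candidates` table and the helper name `encoded_sizes` are illustrative and not part of the PR.

```python
# Candidate quotation characters discussed in this thread.
candidates = {
    "grave accent (backtick)": "\u0060",
    "apostrophe": "\u0027",
    "acute accent": "\u00b4",
    "left single quotation mark": "\u2018",
    "right single quotation mark": "\u2019",
    "left double quotation mark": "\u201c",
    "right double quotation mark": "\u201d",
}


def encoded_sizes(ch):
    """Return (is_ascii, latin1_bytes or None, utf8_bytes) for one character."""
    try:
        latin1 = len(ch.encode("latin-1"))
    except UnicodeEncodeError:
        latin1 = None  # not representable in Latin-1
    return ch.isascii(), latin1, len(ch.encode("utf-8"))


for name, ch in candidates.items():
    print(f"U+{ord(ch):04X} {name}: {encoded_sizes(ch)}")
```

The acute accent comes out as `(False, 1, 2)`: it is not 7-bit ASCII, takes one byte in Latin-1 and two in UTF-8. The curly quotes take three bytes each in UTF-8 and have no Latin-1 encoding.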
I would also prefer “ (u201C) and ” (u201D), as people are less likely to confuse them with apostrophes.
On Thu, 4 Jul 2024 at 09:17, Bernard Bou wrote:
I don't think non-ASCII will pose a problem nowadays. All modern languages and libraries handle it seamlessly. Legacy applications that stumble on non-ASCII characters are likely to stumble already on:
`¬, °, ·, ×, ⁓, −, ∞, ̃, €, ½, á, à, ä, ā, ç, é, É, ê, ë, fi, ʰ, ʻ, í, Ḳ, ñ, ó, ò, ö, ő, ś, š, ú, ü, ű, ū, α, β, γ, ρ, ъ, Ъ, ь, Ь`,
not to mention the em-dash and ellipsis, all of which have already been imported into OEWN.
As for the choice of quoting characters, I agree with you that ‘quoted’ with ‘ (u2018) and ’ (u2019) for quotes would be more appropriate. Or “quoted” with “ (u201C) and ” (u201D) double quotation marks.
Actually that was my first choice but I fell back on the "Acute Accent" (u00B4) because
- it is Extended ASCII (Latin-1), coded on one byte in that encoding (two bytes in UTF-8)
- the move is less extensive: you just replace the closing mark
- it is more conservative: the backtick stays in place so that code that spots quotations with this will still work
- the backtick for opening is actually the "Grave Accent" and the acute accent and grave accent are in a mirror relation.
If you are open to the u2018-u2019 move (yielding ‘quoted’), so am I. I can easily adjust the PR to do just this.
-- Francis Bond https://fcbond.github.io/
An apostrophe used as the mark that closes a quotation is ambiguous and makes quotations very difficult to parse. Take for example:
Instead of multiplexing the apostrophe character, I suggest using a dedicated character (´) to close quotations. It is not 7-bit ASCII but belongs to Latin-1, the "Extended ASCII" range (U+00B4), and it mirrors the backtick/grave accent (`).
Putting an end to this multiplexing requires sorting current uses of the apostrophe into 1) omission of character (elision, contraction, possessive ...) and 2) quotation ending. This is what is done here and thus affects only the latter use.
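To illustrate the parsing problem, here is a minimal sketch that compares a naive apostrophe-closed rule with an acute-closed rule. The sample sentence is made up for illustration and is not taken from OEWN; the pattern and function names are likewise hypothetical.

```python
import re

# Hypothetical gloss fragment, invented for this example (not from OEWN).
old_style = "he said `don't stop' and left"
new_style = "he said `don't stop´ and left"

# Naive rule: a quotation runs from a backtick to the next apostrophe.
APOSTROPHE_CLOSED = re.compile(r"`([^`']*)'")
# Dedicated closer: a quotation runs from a backtick to the next acute accent.
ACUTE_CLOSED = re.compile(r"`([^`´]*)´")


def quotations(pattern, text):
    """Return the quoted spans that the given pattern finds in text."""
    return pattern.findall(text)


print(quotations(APOSTROPHE_CLOSED, old_style))  # ['don']
print(quotations(ACUTE_CLOSED, new_style))       # ["don't stop"]
```

With the apostrophe as closer, the contraction in "don't" ends the match too early. With the dedicated closer, the apostrophe inside the quotation is left alone.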
This change is easily reversible by automatic character substitution.
It opens the way to other quotation schemes (by automatic character substitution):
‟double quoted” “double quoted” „double quoted low” ❛heavy quoted❜ ❟heavy quoted low❜ ❝heavy double quoted❞ ❠heavy double quoted low❠ «guillemet»
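The substitution mentioned above could look like the following sketch. The `SCHEMES` table and the `convert` helper are illustrative names, not code from this PR, and only a few of the schemes are listed.

```python
# Quotation schemes as (opening, closing) pairs; the keys are illustrative.
SCHEMES = {
    "acute": ("`", "\u00b4"),            # `quoted´ (the scheme proposed here)
    "single": ("\u2018", "\u2019"),      # ‘quoted’
    "double": ("\u201c", "\u201d"),      # “quoted”
    "guillemet": ("\u00ab", "\u00bb"),   # «quoted»
    "legacy": ("`", "'"),                # `quoted' (the previous convention)
}


def convert(text, source, target):
    """Swap the quotation marks of one scheme for those of another."""
    src_open, src_close = SCHEMES[source]
    dst_open, dst_close = SCHEMES[target]
    # A single translate pass avoids one replacement feeding into the next.
    table = str.maketrans({src_open: dst_open, src_close: dst_close})
    return text.translate(table)


sample = "`don't stop\u00b4"
print(convert(sample, "acute", "single"))  # ‘don't stop’
print(convert(sample, "acute", "legacy"))  # `don't stop'
```

Converting from `acute` to any other scheme is safe because the closing mark is unambiguous. Going the other way from `legacy` is not, since every apostrophe would be treated as a closing mark, which is the problem this PR addresses.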
In addition, the YAML becomes simpler: there are fewer instances where apostrophes have to be escaped.
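As a rough illustration of the escaping point: inside a YAML single-quoted scalar, each apostrophe has to be doubled. The sketch below assumes the value is written in single-quoted style; whether a given YAML emitter chooses that style is not something this PR controls. The helper names are hypothetical.

```python
def yaml_single_quoted(text):
    """Render text as a YAML single-quoted scalar, doubling each apostrophe."""
    return "'" + text.replace("'", "''") + "'"


def escapes_needed(text):
    """Count the apostrophes that must be doubled in a single-quoted scalar."""
    return text.count("'")


old = "`quoted'"       # apostrophe as closing mark
new = "`quoted\u00b4"  # acute accent as closing mark

print(yaml_single_quoted(old), escapes_needed(old))  # '`quoted''' 1
print(yaml_single_quoted(new), escapes_needed(new))  # '`quoted´' 0
```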