MichaelFenwick / WomensScriptTransliterator

A project to transliterate text from English to the Women's Script featured in the Stormlight Archive novels.

Identify and process single quote pairs in sentences. #11

Open MichaelFenwick opened 3 years ago

MichaelFenwick commented 3 years ago

Single quotes need to be identified and distinguished from apostrophes. The quotes can then be replaced with the appropriate Unicode single quote characters, while apostrophes remain the apostrophe character (or are changed to it, if they were entered as a closing single quote).
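As a rough illustration of the replacement step only, here is a minimal Python sketch. It assumes the quote pairs have already been identified and are passed in as `(open_index, close_index)` tuples; the function name `apply_quote_pairs` and that input shape are hypothetical and not part of this project.

```python
OPEN_SQ = "\u2018"   # LEFT SINGLE QUOTATION MARK
CLOSE_SQ = "\u2019"  # RIGHT SINGLE QUOTATION MARK
APOSTROPHE = "'"     # U+0027 APOSTROPHE


def apply_quote_pairs(text, pairs):
    """Rewrite already-identified quote pairs; normalise apostrophes.

    `pairs` is a hypothetical input: (open_index, close_index) tuples
    produced by whatever pair detection is eventually implemented.
    """
    opens = {start for start, _ in pairs}
    closes = {end for _, end in pairs}
    out = []
    for i, ch in enumerate(text):
        if i in opens:
            out.append(OPEN_SQ)
        elif i in closes:
            out.append(CLOSE_SQ)
        elif ch == CLOSE_SQ:
            # A closing single quote that is not part of a pair was
            # standing in for an apostrophe, so change it back.
            out.append(APOSTROPHE)
        else:
            out.append(ch)
    return "".join(out)


text = "She said 'don\u2019t go' quietly."
pairs = [(text.index("'"), text.rindex("'"))]
result = apply_quote_pairs(text, pairs)
```

Here the outer marks become U+2018 and U+2019, and the U+2019 inside "don’t" is turned back into a plain apostrophe.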

MichaelFenwick commented 2 months ago

Some thoughts on how to handle this:

Identifying apostrophes directly, and treating whatever remains as a single quote, is too hard. Beyond plural possessives (the boys' game), which look like the end of a single-quoted sequence, apostrophes are also used to indicate accents (stormin', ol') in a way that isn't readily identifiable and can't reliably be expected to match a dictionary entry. As such, the unambiguous punctuation needs to be removed from the sentence first, and then some logic can be used to predict whether each mark that remains is an apostrophe or a closing single quote (CSQ).
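One possible shape for that two-pass approach, sketched in Python. Everything here is an assumption made for illustration: the function names, the labels, and in particular the prediction rule (the last candidate before the next opening quote closes the open one) are not taken from this project and are only one of many heuristics that could be tried.

```python
QUOTE_CHARS = {"'", "\u2018", "\u2019"}


def classify_marks(sentence):
    """Pass 1: label each mark using only its immediate neighbours."""
    labels = []
    for i, ch in enumerate(sentence):
        if ch not in QUOTE_CHARS:
            continue
        prev_ch = sentence[i - 1] if i > 0 else " "
        next_ch = sentence[i + 1] if i + 1 < len(sentence) else " "
        if prev_ch.isalnum() and next_ch.isalnum():
            labels.append((i, "apostrophe"))  # don't, boy's
        elif not prev_ch.isalnum() and next_ch.isalnum():
            labels.append((i, "open"))        # assumed opening quote
        elif not prev_ch.isalnum() and not prev_ch.isspace():
            labels.append((i, "close"))       # follows . , ! ? etc.
        else:
            labels.append((i, "ambiguous"))   # boys' / stormin' / a CSQ
    return labels


def resolve_marks(sentence):
    """Pass 2: predict apostrophe vs CSQ for the ambiguous marks.

    Assumed rule: while a quote is open, the last candidate before the
    next opening quote (or the end of the sentence) is the CSQ; every
    other ambiguous mark is treated as an apostrophe.
    """
    labels = classify_marks(sentence)
    resolved = []
    quote_open = False
    for pos, (i, label) in enumerate(labels):
        if label == "open":
            quote_open = True
        elif label == "close":
            quote_open = False
        elif label == "ambiguous":
            later_candidate = False
            for _, later in labels[pos + 1:]:
                if later == "open":
                    break
                if later in ("close", "ambiguous"):
                    later_candidate = True
                    break
            if quote_open and not later_candidate:
                label = "close"
                quote_open = False
            else:
                label = "apostrophe"
        resolved.append((i, label))
    return resolved


marks = resolve_marks("He said 'the boys' game is stormin' great' to me.")
kinds = [kind for _, kind in marks]
```

For that sentence, `kinds` comes out as open, apostrophe, apostrophe, close. The rule is easy to fool: a dropped-letter apostrophe after the real closing quote, or a leading apostrophe such as 'tis, would be mislabelled, so this is a starting point for experimentation rather than a solution.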