HughP / dnj-corpus

A small corpus of a local newspaper

Random brackets #33

Closed by iandoug 6 years ago

iandoug commented 6 years ago

Hi, I guess the bracket in the last line should be a tone mark; other instances of the word mostly use word-minus or nothing. The editor does not find a matching closing bracket.

ʼYö ʼwo˗ dhɛɛˮ kpɔ ʼwo˗ pö: «ʼYö ʼsɔng˗ ˗më ʼö ʼü˗ ˮyɩɩ to ?» ʼyö˗ ˗yɔɔ bɔ ʼö˗ pö: «Dhiang ʼö sënnë ʼdhö kë˗ zë ꞊diö bha ˗yö n ʼgü ˮyɩɩ ˮyɩɩ ˗sü ˗dedewo.» (Mɛ ʼgbɛ ˗dhɛ ˗wo ꞊dhɛ ʼsɔng˗ ʼdhö.

nqthqn commented 6 years ago

Does the README.md tangentially address this?

The use of French style quote marks 〈«〉, 〈»〉 is confusing to Dan authors. That is, opening and closing quote marks appear to be used interchangeably in opening quotations. Additionally, there are quite a few cases where closing quote marks are missing. If software engineers for grammar and spelling checkers can manage, adding a function which checks for closing quote marks (of any kind), much like is done for programmers in IDEs, would benefit many new writers of minority languages.
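The kind of check the README asks for could look something like the sketch below. It is only an illustration, not anything that exists in this repository: the function name `unclosed_quotes` and the `QUOTE_PAIRS` table are made up for the example, and it assumes a quotation opens and closes on the same line.

```python
# Hypothetical helper, not part of dnj-corpus: flag quote marks without a partner.
QUOTE_PAIRS = {"«": "»", "“": "”"}  # opener -> closer (assumed set; extend as needed)


def unclosed_quotes(line):
    """Return sorted (column, char) pairs for quote marks in `line` that have no partner."""
    closers = {close: open_ for open_, close in QUOTE_PAIRS.items()}
    stack = []     # opening marks still waiting for their closer
    problems = []  # closing marks that arrived with nothing to close
    for col, ch in enumerate(line):
        if ch in QUOTE_PAIRS:
            stack.append((col, ch))
        elif ch in closers:
            if stack and stack[-1][1] == closers[ch]:
                stack.pop()  # matched pair, nothing to report
            else:
                problems.append((col, ch))
    # Anything left on the stack was opened but never closed.
    return sorted(problems + stack)
```

Because it reports a closing mark that appears with nothing open, this would also surface the cases where 〈»〉 was typed to open a quotation.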

HughP commented 6 years ago

Perhaps this should be a tone mark, as that does seem to be the logical place for one. But I am not sure which one.

iandoug commented 6 years ago

I was going to just fix that one, but I see there are actually quite a lot of unbalanced brackets, and putting in a regex to fix each one (using the notorious Scientific Wild-Assed Guess method) is probably not the best way. Will ponder it a bit. See the red bits in the attached screenshot. The single quotes seem particularly problematic.

[Attached image: screenshot_20180623_003349]
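As an alternative to guessing a fix per bracket, a scan that only reports the unmatched ones might be a safer first step. The following is a rough sketch under stated assumptions, not the tool that produced the screenshot: `unbalanced_brackets` and `summarize` are hypothetical names, brackets are assumed to balance within a single line, and straight single quotes are left out because the same character serves as opener and closer.

```python
from collections import Counter

# Assumed bracket set for illustration; opener -> closer.
BRACKETS = {"(": ")", "[": "]", "{": "}"}


def unbalanced_brackets(lines):
    """Yield (line_number, column, char) for each bracket with no partner on its line."""
    closers = {close: open_ for open_, close in BRACKETS.items()}
    for line_no, line in enumerate(lines, start=1):
        stack = []  # openers seen on this line that are still unmatched
        for col, ch in enumerate(line):
            if ch in BRACKETS:
                stack.append((col, ch))
            elif ch in closers:
                if stack and stack[-1][1] == closers[ch]:
                    stack.pop()
                else:
                    yield (line_no, col, ch)  # closer with nothing to close
        for col, ch in stack:
            yield (line_no, col, ch)  # opener that was never closed


def summarize(lines):
    """Count unmatched brackets by character, to see which kinds dominate."""
    return Counter(ch for _, _, ch in unbalanced_brackets(lines))
```

Run over the corpus text, the per-line report gives a list to review by hand, and the summary shows which bracket types account for most of the problems.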

HughP commented 6 years ago

Wow, this is my first time seeing that screenshot... great way to visualize this across the whole corpus.