Closed by dwhieb 2 years ago
@aarppe It sounds like we want to retain <ý> for the purposes of LEXC, but itwêwina will need to use the standardized <y>. If that's the case, I can store <ý> versions of headwords in the fstStem field, but store standardized <y> versions in the lemma field, which is intended to be used by itwêwina.
Following up on our Tech Team discussion, we'd probably need to have all four of 1) regular lemma (no <ý>, also used as the FST lemma); 2) "linguistic" lemma (with <ý>); 3) stem (with <ý>, whichever Arok provides); and 4) fststem (with <ý>, if Arok's stem isn't sufficient for FST purposes).
Then within itwêwina, we can still match the lemma from the FST with lemma in the dictionary entry, and allow for showing the linguistic lemma for anyone who wants to know the location of the dialectal <ý>.
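A minimal sketch of the four-field layout and the lemma match described above, written in Python purely for illustration. The class name, field names, and helper functions are assumptions rather than the actual dictionary-database schema, and the headword is only a sample; the one point it shows is that the FST lemma (no <ý>) serves as the lookup key while the linguistic lemma stays available for display.

```python
import unicodedata
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    # All field names here are hypothetical, chosen to mirror the four items above.
    lemma: str                       # 1) regular lemma, no <ý>; also the FST lemma
    linguistic_lemma: str            # 2) lemma retaining <ý>
    stem: Optional[str] = None       # 3) stem with <ý>, as provided by the source
    fst_stem: Optional[str] = None   # 4) stem with <ý>, only if (3) is insufficient


def standardize(form: str) -> str:
    """Fold <ý> to <y>. NFC first, so a decomposed y + U+0301 is also caught."""
    form = unicodedata.normalize("NFC", form)
    return form.replace("ý", "y").replace("Ý", "Y")


def make_entry(linguistic_lemma: str, stem: Optional[str] = None,
               fst_stem: Optional[str] = None) -> Entry:
    """Derive the regular lemma from the linguistic one instead of storing it by hand."""
    linguistic_lemma = unicodedata.normalize("NFC", linguistic_lemma)
    return Entry(standardize(linguistic_lemma), linguistic_lemma, stem, fst_stem)


def build_index(entries: List[Entry]) -> Dict[str, List[Entry]]:
    """Index entries by regular lemma, i.e. the form the FST would return."""
    index: Dict[str, List[Entry]] = {}
    for entry in entries:
        index.setdefault(entry.lemma, []).append(entry)
    return index


# Sample headword, for illustration only.
index = build_index([make_entry("ýôtin")])
match = index["yôtin"][0]            # look up by the lemma coming from the FST
print(match.linguistic_lemma)        # the <ý> form remains available for display
```

Deriving the regular lemma from the linguistic lemma keeps the two from drifting apart; whether the database should store both or compute one on import is left open here.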
Just to add my 2¢: we can store <ý> in itwêwina as the underlying representation, and display <y> by default. Then we can show <ý> as an orthography option. itwêwina does not need <ý> converted to
As I'm inclined not to require <ý> in the lemmas for the FST, if we keep <ý>, we need to implement in the code its regularization to
As for showing <ý>, yes, we'd want that as an option, but I wouldn't prefer it as default due to the principle of keeping orthographical distinctions to the bare minimum necessary.
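The display behaviour discussed above (store <ý> underlyingly, show <y> by default, offer <ý> as an option) could be sketched roughly as follows. This is not itwêwina's code: the function name and the `show_y_acute` flag are hypothetical, and the sample word is only there to exercise the function.

```python
import unicodedata

# Translation table folding <ý> to <y>, applied only at display time.
_FOLD_Y_ACUTE = str.maketrans({"ý": "y", "Ý": "Y"})


def display_form(underlying: str, show_y_acute: bool = False) -> str:
    """Return the form to show the user.

    `underlying` is the stored form, which may contain <ý>.
    `show_y_acute` is a hypothetical user preference; it is off by default,
    so the orthographic distinction appears only when a user asks for it.
    """
    form = unicodedata.normalize("NFC", underlying)
    return form if show_y_acute else form.translate(_FOLD_Y_ACUTE)


print(display_form("niýa"))                     # default display, <ý> folded to <y>
print(display_form("niýa", show_y_acute=True))  # opt-in display keeps <ý>
```

Folding at display time means the stored data never loses the distinction, so the default can be changed later without re-importing anything.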
@dwhieb You do not really want to regularize <ý> to <y> for Plains Cree either, when importing the data from Arok's CW into the dictionary database.
Currently, we retain the <ý> when e.g. we create the stems in LEXC code - this is useful as that allows us to convert ý -> y for Plains Cree, and ý -> {th} for Woods Cree. I don't include <ý> in the lemmas for now, as that might have complications in the use of the FSTs, as accessing <ý> on most regular keyboards is not trivial.
I could imagine us retaining <ý> in the dictionary database as well, and then having an option allowing users to select whether they want to see within itwêwina that marking or not. Arok in fact has on a few occasions been inclined to make that default behavior, but I've managed to successfully argue that it'd be better to offer it as an option - in terms of the simplicity of a regular writing system, we ought not to force users to make explicit distinctions that are primarily historical/linguistic.
For the few Swampy Cree forms with <ń>, I'd convert those to <ý>, but keep a note somewhere of their provenance. Or then we could keep <ń>, but I'd need to add that to the morphophonological rewrite rules.
__Originally posted by @aarppe in https://github.com/UAlbertaALTLab/dictionary-database/pull/30#discussion_r606746619__
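To make the conversions in the quoted comment concrete, here is a rough Python sketch of two steps: realizing a stem that retains <ý> per dialect (ý -> y for Plains Cree, ý -> th for Woods Cree), and converting Swampy Cree <ń> to <ý> on import while noting where the form came from. In the actual pipeline the dialect step lives in the LEXC/FST rewrite rules, not in Python; all names below are hypothetical, the braces in {th} are assumed to be notation and are dropped, and the input strings are illustrative only.

```python
import unicodedata
from typing import NamedTuple, Optional

# Dialect reflexes of <ý> as stated in the comment; keys are hypothetical labels.
DIALECT_REFLEX = {"plains": "y", "woods": "th"}


def realize(stem: str, dialect: str) -> str:
    """Replace <ý> in a stem with the reflex for the given dialect."""
    stem = unicodedata.normalize("NFC", stem)
    return stem.replace("ý", DIALECT_REFLEX[dialect])


class Normalized(NamedTuple):
    form: str                  # form with <ń> converted to <ý>
    provenance: Optional[str]  # note kept only when a conversion happened


def normalize_swampy(form: str) -> Normalized:
    """Convert <ń> to <ý>, recording the original marking as provenance."""
    form = unicodedata.normalize("NFC", form)
    if "ń" in form:
        return Normalized(form.replace("ń", "ý"), "converted from <ń>")
    return Normalized(form, None)


print(realize("ýôtin", "plains"))   # yôtin
print(realize("ýôtin", "woods"))    # thôtin
print(normalize_swampy("ńôtin"))    # made-up input string containing <ń>
```

Keeping <ý> in the stored stem is what makes both dialect outputs derivable from one source form; regularizing to <y> on import would make the Woods Cree mapping unrecoverable.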