UAlbertaALTLab / morphodict

Plains Cree Intelligent Dictionary
https://itwewina.altlab.app/
Apache License 2.0

Entry heads should preserve ý when present #1181

Open fbanados opened 1 week ago

fbanados commented 1 week ago

Currently, crk-db, lang-crk, and morphodict disregard ý characters in entries. We want to restore the original entries that had ý characters, while keeping the current behaviour (ý -> y) available as an option for users.

This issue depends on UAlbertaALTLab/crk-db#115. Once the FSTs are fixed, we can separately implement a presentation toggle in morphodict to show either ý or y.
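A minimal sketch of what such a presentation toggle could look like, assuming entry heads are stored with ý intact and the fold to y happens only at display time. The function name `present_entry_head`, the flag `preserve_y_acute`, and the sample string are hypothetical and not taken from morphodict.

```python
import unicodedata

# Translation table for the "current behaviour" option (ý -> y).
Y_ACUTE_TO_Y = str.maketrans({"ý": "y", "Ý": "Y"})


def present_entry_head(head: str, preserve_y_acute: bool = True) -> str:
    """Return an entry head for display (hypothetical helper).

    The head is normalized to NFC first, so that "y" followed by a
    combining acute accent is treated the same as precomposed "ý".
    """
    head = unicodedata.normalize("NFC", head)
    if preserve_y_acute:
        return head
    return head.translate(Y_ACUTE_TO_Y)
```

With this shape, `present_entry_head("niýa")` leaves the head unchanged, while `present_entry_head("niýa", preserve_y_acute=False)` folds ý to y, so the stored data never loses the distinction.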

fbanados commented 1 week ago

Implemented draft behaviour in dev: [four screenshots, taken 2024-07-05 at 10:48 AM]

fbanados commented 1 week ago

Noticed a small bug: the default selection does not appear highlighted, so it's hard to tell which orthography is being shown until you click on one. This should be changed so that the current orthography is always highlighted.

fbanados commented 1 week ago

Also, the current implementation of the unified frontend makes it impossible to have distinct orthographies in each deployment. As a result, the javascript code must be aware of every orthography in every language, even those a given deployment does not use. Adding an orthography with a new code therefore requires modifying both the settings.py for a particular sssttt deployment and the list of options in frontend.js. Only one place should need to change. The simplest fix would be to autogenerate part of the javascript file in Python by collecting all possible orthographies from every language. The introduction of a new orthography would then be isolated to one place in the code.
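A rough sketch of the autogeneration idea, under stated assumptions: the `DEPLOYMENT_ORTHOGRAPHIES` dictionary stands in for whatever each deployment's settings.py actually declares, and the deployment keys, orthography codes, and the exported constant name `AVAILABLE_ORTHOGRAPHIES` are illustrative only. The real settings layout and frontend.js structure may differ.

```python
import json

# Hypothetical stand-in for the orthographies declared in each
# deployment's settings.py (code -> display name).
DEPLOYMENT_ORTHOGRAPHIES = {
    "crkeng": {
        "Latn": "SRO (êîôâ)",
        "Latn-x-macron": "SRO (ēīōā)",
        "Cans": "Syllabics",
    },
    "srseng": {"Latn": "Latin"},
}


def collect_orthographies(deployments):
    """Merge the orthography codes of every deployment into one dict.

    When two deployments share a code, the first display name seen wins.
    """
    merged = {}
    for orthographies in deployments.values():
        for code, name in orthographies.items():
            merged.setdefault(code, name)
    return merged


def render_orthography_js(deployments):
    """Render the generated part of the javascript file as a string."""
    codes = sorted(collect_orthographies(deployments))
    return "export const AVAILABLE_ORTHOGRAPHIES = " + json.dumps(codes) + ";\n"
```

A build step could write the returned string to a generated file that the frontend imports, so a new orthography code would only need to be added in settings.py.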

This would be a stepping stone towards three refactoring options:

  1. Each sssttt deployment can have its own frontend options (e.g. css options and the corresponding javascript to implement frontend behaviour).
  2. Orthographical changes that can be easily encoded in character mappings or javascript libraries and do not require an FST (currently, all of them) are computed on the client side instead of the server side. The biggest benefit of this approach could be to reduce the size of HTTP requests.
  3. Orthographies become a linguist-guided design option that does not require any coding: a configuration file could be used to autogenerate the parts of the frontend (and backend, depending on whether refactoring 2 is deployed) that are required to present multiple orthographies.
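To make the third option concrete, here is a small sketch in which orthographies are described purely by character mappings in a configuration file, and the converters are built from it. The JSON layout, the code `Latn-y`, and the function `load_converters` are all hypothetical; nothing here reflects an existing morphodict format.

```python
import json

# Hypothetical configuration a linguist could edit without touching code.
CONFIG_TEXT = """
{
  "default": "Latn",
  "orthographies": {
    "Latn": {"name": "SRO (ý preserved)", "map": {}},
    "Latn-y": {"name": "SRO (ý shown as y)", "map": {"ý": "y", "Ý": "Y"}}
  }
}
"""


def load_converters(config_text):
    """Build one converter function per orthography from the config.

    Returns the default orthography code and a dict of code -> converter.
    Each "map" must use single-character keys, as str.maketrans requires.
    """
    config = json.loads(config_text)
    converters = {}
    for code, spec in config["orthographies"].items():
        table = str.maketrans(spec["map"])
        # Bind the table as a default argument so each converter keeps its own.
        converters[code] = lambda text, table=table: text.translate(table)
    return config["default"], converters
```

The same configuration could in principle be serialized for the client if refactoring 2 is adopted, since a plain character mapping is straightforward to apply in javascript as well.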