Lotus-King-Research / Requests

Common repository for RFCs

[RFC0002] Custom Dictionaries #2

Open mikkokotila opened 2 years ago

mikkokotila commented 2 years ago

What is it?

The ability to add a custom dictionary in an expected format, and have that dictionary included in your dictionary searches.

How would it work?

An organization or team adds a custom dictionary from settings, and that dictionary becomes part of the set of dictionaries that searches are run against.

This is slightly related to Lotus-King-Research/Requests#1.

Why is it important?

At the moment, every translator and translation team maintains some form of dictionary of their own and, in addition, uses other dictionaries. Words from their own dictionary take priority; words from other dictionaries are secondary.
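The priority rule above (own dictionary first, other dictionaries second) could be expressed as an ordered search. This is a minimal sketch, assuming dictionaries are held in memory as mappings from word to entries; `Entry`, `Dictionary` and `lookup` are hypothetical names, not part of Padma-Backend or Tibetan-Lookup.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Entry:
    word: str
    meaning: str
    source: str


# Hypothetical in-memory shape of one dictionary: word -> its entries.
Dictionary = Dict[str, List[Entry]]


def lookup(word: str,
           custom: Iterable[Dictionary],
           shared: Iterable[Dictionary]) -> List[Entry]:
    """Return every entry for `word`, with the team's own (custom)
    dictionaries ranked ahead of the shared ones."""
    results: List[Entry] = []
    # Custom dictionaries are searched first, so their hits come first.
    for dictionary in list(custom) + list(shared):
        results.extend(dictionary.get(word, []))
    return results
```

For example, with a team glossary `{"chos": [Entry("chos", "dharma", "team-glossary")]}` and a shared dictionary that also defines `chos`, the team's entry is returned first. Keeping all hits (rather than returning only the custom one) lets the secondary meanings still show up below the preferred one.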

Specification

Changes to views

blahmonkey commented 2 years ago

What is the current format used by the backend, and would this custom dictionary be easily available in said format?

mikkokotila commented 2 years ago

At the moment, the dictionaries are built from this file in the Multi-Dictionary repo. That file is then cleaned into a three-column CSV, so "out of the box" there is functionality to ingest a three-column CSV where the columns are:

['word', 'meaning', 'source'].
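Ingesting that three-column format might look roughly like the following. This is a sketch only: it assumes the CSV has a `word,meaning,source` header row (the thread does not say whether it does), and `load_dictionary` is a hypothetical helper, not the actual Tibetan-Lookup code.

```python
import csv
from typing import Dict, Iterable, List, Tuple

# Column order as described in the thread; a header row is an assumption.
EXPECTED_HEADER = ["word", "meaning", "source"]


def load_dictionary(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Index a three-column dictionary CSV by word.

    `lines` can be an open text file or any iterable of CSV lines.
    Returns word -> list of (meaning, source) pairs, in file order.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        raise ValueError(f"expected header {EXPECTED_HEADER}, got {header}")
    index: Dict[str, List[Tuple[str, str]]] = {}
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed rows
        word, meaning, source = (cell.strip() for cell in row)
        index.setdefault(word, []).append((meaning, source))
    return index
```

Because `csv.reader` accepts any iterable of strings, the same function works on `open(path, newline="", encoding="utf-8")` or on a list of lines in tests.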

blahmonkey commented 2 years ago

Okay, so I guess new dictionaries may need to be parsed into that format, since it looks a bit custom to me (correct me if I'm wrong here).
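As an illustration of that kind of parsing step, here is a hedged sketch that converts a simple tab-separated glossary (one `headword<TAB>definition` per line, a hypothetical input format) into the three-column CSV, stamping every row with a caller-supplied source name. `to_three_column_csv` is a hypothetical name; real custom dictionaries will likely need their own converters.

```python
import csv
import io
from typing import Iterable


def to_three_column_csv(lines: Iterable[str], source: str) -> str:
    """Convert 'headword<TAB>definition' lines into word,meaning,source CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["word", "meaning", "source"])
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or "\t" not in line:
            continue  # skip blank lines and lines without a tab separator
        word, meaning = line.split("\t", 1)
        # csv.writer quotes meanings that contain commas, so they survive.
        writer.writerow([word.strip(), meaning.strip(), source])
    return out.getvalue()
```

Using the `csv` module for output, rather than joining strings with commas, matters here because definitions often contain commas.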

mikkokotila commented 2 years ago

Basically, the way it works now is that when there is a commit to Multi-Dictionary, the data at https://multi-dictionary-data.padma.io/ gets updated, and that is where Tibetan-Lookup fetches it when it is restarted. Padma-Backend uses Tibetan-Lookup for the /dictionary_lookup endpoint, so changes in Multi-Dictionary become visible once Padma-Backend is reloaded.

#4 introduces a more meaningful way to handle dictionaries. At that point, Multi-Dictionary can be structured so that each dictionary is a separate CSV, kept in a folder that #4 checks frequently for updates, updating the DB accordingly.
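The folder-watching idea could be sketched as a polling pass that reloads any CSV whose modification time changed and drops dictionaries whose file was removed. This is not the design from #4; it assumes one CSV per dictionary with a `word,meaning,source` header, and uses SQLite with a hypothetical `entries` table purely for illustration. `sync_folder` and `poll_forever` are hypothetical names.

```python
import csv
import os
import sqlite3
import time
from typing import Dict, List


def sync_folder(folder: str, conn: sqlite3.Connection,
                seen: Dict[str, float]) -> List[str]:
    """One polling pass; returns the names of dictionaries that were reloaded.

    `seen` maps file name -> last loaded modification time and is updated
    in place between passes.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS entries "
                 "(word TEXT, meaning TEXT, source TEXT, dictionary TEXT)")
    names = sorted(n for n in os.listdir(folder) if n.endswith(".csv"))

    # Drop dictionaries whose CSV file has been removed from the folder.
    for name in [n for n in seen if n not in names]:
        with conn:
            conn.execute("DELETE FROM entries WHERE dictionary = ?", (name,))
        del seen[name]

    reloaded = []
    for name in names:
        path = os.path.join(folder, name)
        mtime = os.path.getmtime(path)
        if seen.get(name) == mtime:
            continue  # unchanged since the last pass
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the (assumed) header row
            rows = [(*row, name) for row in reader if len(row) == 3]
        # Replace this dictionary's rows in a single transaction.
        with conn:
            conn.execute("DELETE FROM entries WHERE dictionary = ?", (name,))
            conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", rows)
        seen[name] = mtime
        reloaded.append(name)
    return reloaded


def poll_forever(folder: str, db_path: str, interval_s: float = 60.0) -> None:
    """Run sync_folder every `interval_s` seconds (blocks forever)."""
    conn = sqlite3.connect(db_path)
    seen: Dict[str, float] = {}
    while True:
        sync_folder(folder, conn, seen)
        time.sleep(interval_s)
```

Replacing a whole dictionary's rows inside one transaction means lookups never see a half-updated dictionary. Modification-time checks are cheap but can miss an edit made within the filesystem's timestamp resolution; comparing file hashes would be more robust at the cost of reading every file on each pass.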