Shared Senses

susanodd commented 1 year ago

On the master branch, there are currently no senses yet.

On the senses branch (pull request #1002) there is functionality that "shares senses" between glosses.

This is basically done via AI in that it happens automatically.

For a gloss, if the user enters new keywords for a sense for a language, then the software looks for other already existing sense translations that match and uses that sense translation object instead of creating a new one for this gloss. The newly added sense translation thus shares its new keywords with another gloss for this language. (At the moment, this is buggy for English, because the AI can find sense translations in other datasets.)
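The reuse described above can be sketched in plain Python rather than Django models. This is only an illustration under stated assumptions: the `SenseTranslation` class, the `registry` dict and `add_sense_translation` are hypothetical names, not the code on the senses branch.

```python
from dataclasses import dataclass, field


@dataclass
class SenseTranslation:
    """Hypothetical stand-in: the keywords of one sense in one language."""
    language: str
    keywords: frozenset
    glosses: set = field(default_factory=set)  # glosses that ended up sharing this object


# Hypothetical lookup table: (language, keywords) -> existing SenseTranslation
registry = {}


def add_sense_translation(gloss, language, keywords):
    """Reuse a matching existing object instead of creating a new one."""
    key = (language, frozenset(keywords))
    sense_translation = registry.get(key)
    if sense_translation is None:
        sense_translation = SenseTranslation(language, frozenset(keywords))
        registry[key] = sense_translation
    sense_translation.glosses.add(gloss)
    return sense_translation


first = add_sense_translation("GLOSS-A", "en", ["test"])
second = add_sense_translation("GLOSS-B", "en", ["test"])
```

The second call returns the same object as the first, so the two glosses become linked without the user having asked for it.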

It seems that unintended links can arise between glosses, since the linking is done by AI. (I call it AI because the software does it, not the user.)

An alternative would be to introduce relation objects as for the other kinds of relations to relate the sense objects of different glosses. And with that user functionality to instruct Signbank to link the senses.

vanlummelhuizen commented 1 year ago

(I call it AI because the software does it, not the user.)

@susanodd I think we should not call this AI. It is just a matter of database structure and the fact that Keyword objects are reused if they exactly match the sense translation a user fills in. Nothing more, nothing less.

An alternative would be to introduce relation objects as for the other kinds of relations to relate the sense objects of different glosses. And with that user functionality to instruct Signbank to link the senses.

Now, glosses are indeed implicitly linked if they share a Keyword through a sense. Your proposal is to link glosses explicitly by the user. That could be a good idea, if users find that useful. Perhaps @ocrasborn should say something about this.

susanodd commented 1 year ago

Yes, it's not AI. But all the cool kids use AI. :)

susanodd commented 1 year ago

I already wrote this in the pull comments.

I don't think the solution for sharing as it is now can stay. The "shared senses" literally share SenseTranslation objects and the Translation objects stored in them. However, the Translation objects have a link to a gloss inside them. They can't be shared by multiple glosses.

The Translation objects allow the Keyword objects to be multilingual. Keyword objects are basically just strings, so different languages can reuse the same Keyword object: several languages may include the same keyword, but the Keyword is only stored once. The senses use the Translation objects to know what language a keyword is in. If only keywords were used, there would be no way to know whether the keywords were even all in the same language. We can know which language(s) a keyword is associated with, but that is different from which dataset(s) it is associated with. That comes from the gloss: Gloss => Lemma => Dataset
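To make the distinction between language and dataset concrete, here is a small plain-Python sketch. The dataclasses are simplified, hypothetical stand-ins for the real models (the field names are assumptions), and `languages_of` and `datasets_of` are illustrative helpers only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    acronym: str


@dataclass(frozen=True)
class Lemma:
    dataset: Dataset


@dataclass(frozen=True)
class Gloss:
    name: str
    lemma: Lemma


@dataclass(frozen=True)
class Keyword:
    text: str  # basically just a string, stored once


@dataclass(frozen=True)
class Translation:
    gloss: Gloss
    language: str
    keyword: Keyword


def languages_of(keyword, translations):
    """The language comes from the Translation itself."""
    return {t.language for t in translations if t.keyword == keyword}


def datasets_of(keyword, translations):
    """The dataset comes from the gloss: Gloss => Lemma => Dataset."""
    return {t.gloss.lemma.dataset.acronym for t in translations if t.keyword == keyword}


test_keyword = Keyword("test")
gloss_a = Gloss("GLOSS-A", Lemma(Dataset("NGT")))
gloss_b = Gloss("GLOSS-B", Lemma(Dataset("tstMH")))
translations = [
    Translation(gloss_a, "en", test_keyword),
    Translation(gloss_b, "nl", test_keyword),
]
```

Here one Keyword is reused by two languages and reaches two datasets, and each Translation still points at exactly one gloss.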

ocrasborn commented 1 year ago

@vanlummelhuizen Now, glosses are indeed implicitly linked if they share a Keyword through a sense. Your proposal is to link glosses explicitly by the user. That could be a good idea, if users find that useful. Perhaps @ocrasborn should say something about this.

Let's first start using the initial functionality (once it's available) for a month or two, then see whether we want to explicitly link glosses.

vanlummelhuizen commented 1 year ago

@susanodd I carefully read some of the pieces of code you guarded with if settings.SHARE_SENSES:.

It seems that sometimes a shared sense means: a Sense that was already linked to another Gloss in the same Dataset: https://github.com/Signbank/Global-signbank/blob/5c44873d65d4df53b32b7327b15e44e4503438f3/signbank/dictionary/update.py#L550-L562

Sometimes a shared sense means: a Translation that was already linked to another Sense in the same Dataset: https://github.com/Signbank/Global-signbank/blob/5c44873d65d4df53b32b7327b15e44e4503438f3/signbank/dictionary/update.py#L572-L584

So the fact that "shared senses" could mean multiple things seems problematic. The good thing is that it is all kept within one Dataset.

I think we should discuss with @Jetske and perhaps also @Woseseltops .

susanodd commented 1 year ago

Yes. If SHARE_SENSES is turned off and you use the following commands to map the translations to senses, then neither of those two cases is created.

python bin/develop.py delete_empty_translations tstMH
python bin/develop.py translations_to_senses tstMH

I added a consistency check on GlossDetailView when the object is first fetched so the developer can check if the senses for the gloss have become inconsistent. Before I added the SHARE_SENSES check, that was happening all the time. Partly because in testing stuff we keep using keywords like "test". :) And those empty keywords are linked to all glosses, via Translations for the gloss and language.

I fixed the code that was fetching Translations from totally different datasets. Recall your fix of adding the gloss=gloss to it.
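A plain-Python analogue of that fix, as a sketch only: the list of dicts and `find_translations` are hypothetical, and the real code uses Django queries, where the equivalent is adding `gloss=gloss` to the filter arguments.

```python
# Hypothetical in-memory rows standing in for Translation objects.
translations = [
    {"gloss": "GLOSS-A", "dataset": "NGT", "language": "en", "keyword": "test"},
    {"gloss": "GLOSS-B", "dataset": "tstMH", "language": "en", "keyword": "test"},
]


def find_translations(language, keyword, gloss=None):
    """Match on language and keyword; optionally restrict to one gloss."""
    matches = [
        t for t in translations
        if t["language"] == language and t["keyword"] == keyword
    ]
    if gloss is not None:
        # Without this restriction, rows from other datasets leak in.
        matches = [t for t in matches if t["gloss"] == gloss]
    return matches


unrestricted = find_translations("en", "test")
restricted = find_translations("en", "test", gloss="GLOSS-A")
```

The unrestricted lookup returns rows from both datasets, while the restricted one stays with the gloss at hand.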

I suspect this code is part of the bug mentioned above (relevant to #1002):

https://github.com/Signbank/Global-signbank/blob/05e85eae014814cd5490ede3e5740e1e26f15e89/signbank/dictionary/update.py#L412-L416

and farther down, the same:

https://github.com/Signbank/Global-signbank/blob/05e85eae014814cd5490ede3e5740e1e26f15e89/signbank/dictionary/update.py#L460-L464

and another:

https://github.com/Signbank/Global-signbank/blob/05e85eae014814cd5490ede3e5740e1e26f15e89/signbank/dictionary/update.py#L538-L542

This looks iffy because the Translation objects are related to glosses and hence (other) datasets.

Originally posted by @susanodd in https://github.com/Signbank/Global-signbank/issues/965#issuecomment-1612626250

susanodd commented 1 year ago

UPDATE The senses are live now. We applied the migration without sharing to start, since the users need to split keywords into different senses when applicable. @ocrasborn estimated this applies to roughly 1/3 of the NGT glosses.

Here is an example where there are numerous variants of a gloss with various keywords.

https://signbank.cls.ru.nl/dictionary/gloss_relations/2284

[Screenshots: even-denken-a-relations-top, even-denken-a-relations-bot]

Jetske commented 1 year ago

Currently, senses are not shared by multiple glosses, but I added an icon to show senses that are similar with a link to their gloss.

A button can be added to this modal to explicitly choose to share one of those senses, if a suitable match is found. But I was thinking: if you choose to share a sense, that means that its example sentences will also be shared. But then an example sentence video may instead contain the sign from another (synonym?) gloss, e.g. BRUS-C instead of BRUS-B. Is this a problem? @ocrasborn

So additionally, if we eventually do want to share senses, a ForeignKey field to gloss could be added to the examplesentence model, so that it can be made clear (by showing them in another color, for example, or by not showing them at all) when an example sentence (video) was originally added for another gloss. Or even a link to multiple glosses, as a sentence likely contains more than one sign.
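A rough sketch of how such a link could be used when displaying shared sentences. Everything here is an assumption for illustration: the `original_gloss` field and the `sentences_for_display` helper are hypothetical, and the real model is a Django model rather than a dataclass.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExampleSentence:
    text: str
    # Hypothetical field: the gloss the sentence (video) was first added for.
    original_gloss: Optional[str] = None


def sentences_for_display(sentences, current_gloss, hide_foreign=False):
    """Return (sentence, is_foreign) pairs for the gloss being viewed.

    A sentence is 'foreign' if it was originally added for another gloss;
    the template could show it in another color, or it can be hidden.
    """
    rows = []
    for sentence in sentences:
        is_foreign = (
            sentence.original_gloss is not None
            and sentence.original_gloss != current_gloss
        )
        if is_foreign and hide_foreign:
            continue
        rows.append((sentence, is_foreign))
    return rows


shared = [
    ExampleSentence("sentence one", original_gloss="BRUS-B"),
    ExampleSentence("sentence two", original_gloss="BRUS-C"),
]
flagged = sentences_for_display(shared, "BRUS-B")
visible = sentences_for_display(shared, "BRUS-B", hide_foreign=True)
```

Linking a sentence to multiple glosses would need a many-to-many relation instead of a single field; that variant is not shown here.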

susanodd commented 1 year ago

@Jetske that looks like a good idea. Can you look at the #1019 CSV for example sentences? It's a bit of an obstacle because for exporting (for a gloss row in the csv) the sentences need to refer to the sense they belong to. So I've prefaced them with the Sense Number (that is, the sense number within the gloss of that row; otherwise Sense IDs would need to be used).

If you imagine importing sentences, how to do this? They need to be attached to something. As the CSV Update is set up now, each gloss has a row. But obviously, if the user is creating senses and importing sentences in the same file, this is kind of elaborate to implement since senses would need to be created in order to attach sentences to them.
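To illustrate the import problem, here is a sketch that assumes a purely hypothetical cell format, `1: first sentence | 2: second sentence`, where the number is the sense number within the gloss row. The actual export format in #1019 may differ; `parse_sentence_cell` and `missing_senses` are made-up helpers.

```python
def parse_sentence_cell(cell, separator="|"):
    """Group sentences from one CSV cell by their sense-number prefix."""
    by_sense = {}
    for item in cell.split(separator):
        item = item.strip()
        if not item:
            continue
        # Split on the first colon only, so sentences may contain colons.
        number, _, text = item.partition(":")
        by_sense.setdefault(int(number), []).append(text.strip())
    return by_sense


def missing_senses(by_sense, existing_sense_numbers):
    """Sense numbers that would have to be created before attaching sentences."""
    return sorted(set(by_sense) - set(existing_sense_numbers))


parsed = parse_sentence_cell("1: first sentence | 2: second sentence | 1: third sentence")
to_create = missing_senses(parsed, existing_sense_numbers=[1])
```

An import would need to create the senses in `to_create` first, which is the elaborate part mentioned above. A sentence containing the separator character would break this simple format.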

susanodd commented 12 months ago

@Jetske I put the Similar Senses code live.

One thing that is a bit noticeable: now that there are numerous modals, it seems to take ages to load the Gloss Detail View page. The methods in the modals are applied at page load.

There seem to be many calls to methods to calculate data to display, rather than computing it in the context method in Python.

Jetske commented 12 months ago

@susanodd oh that's not great.. Would it be quicker to compute in the context method, or is it possible to apply methods when opening a modal?

susanodd commented 12 months ago

I had a huge problem with the minimal pairs. They are generated via a separate ajax call now. But those are very different from the senses. It previously did not work to put the modals in a separate file (to load it in order to improve readability) because the Django translations (trans) do not work on the loaded files. I have no idea if a new bootstrap Django combi would help. When I complained about this, @Woseseltops said I should try to move things to python instead.

Yes, if things can be computed in the context in python and passed as variables it's much more efficient. You can also compute very complex things that way. (The keywords mapping computes a large dictionary to look up everything in the template, including matrix dimensions.)
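As a sketch of that approach: compute a lookup dictionary once and hand it to the template, instead of calling a method per sense from inside each modal. The function name, the dict layout, and the rule that "similar" means sharing at least one keyword are assumptions for illustration; in a Django class-based view the result would be merged into the dict returned by `get_context_data`.

```python
def build_sense_context(senses):
    """Compute once in the view; the template then only does lookups."""
    similar = {}
    for sense in senses:
        similar[sense["id"]] = [
            other["id"]
            for other in senses
            # Assumed rule: similar means at least one keyword in common.
            if other["id"] != sense["id"] and other["keywords"] & sense["keywords"]
        ]
    return {"similar_senses": similar}


senses = [
    {"id": 1, "keywords": {"test", "try"}},
    {"id": 2, "keywords": {"test"}},
    {"id": 3, "keywords": {"brush"}},
]
context = build_sense_context(senses)
```

The template can then look up `similar_senses` by sense id without triggering further computation.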

If you turn on DEBUG and look in the browser, you can sometimes see that a query has not even been evaluated yet! That's how we discovered that the minimal pairs were taking an extreme amount of time. (Years ago, there was a Gloss model method for minimal pairs that was called in the detail view. That has been rewritten to an ajax call as it is now...)

Django delays the evaluation of queries as long as possible. (That's why I often flatten them out to force it to evaluate them already.) (I'm not actually sure how it evaluates the _set things.) I know the code ends up ugly without the use of elegant methods in the template. (As you have done it elegantly with methods.) But I suspect it might be choking on all the complex string operations.
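The lazy behaviour can be imitated in plain Python. `LazyQuery` below is a hypothetical stand-in, not Django's QuerySet, but it shows the same idea: nothing is fetched until the object is iterated, and wrapping it in `list()` forces the evaluation.

```python
class LazyQuery:
    """Toy model of a lazily evaluated query with a result cache."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that does the expensive work
        self._cache = None
        self.evaluations = 0     # how often the expensive work actually ran

    def __iter__(self):
        if self._cache is None:
            self.evaluations += 1
            self._cache = list(self._fetch())
        return iter(self._cache)


query = LazyQuery(lambda: ["GLOSS-A", "GLOSS-B"])
before = query.evaluations   # still 0: creating the query fetched nothing
rows = list(query)           # "flattening" forces the evaluation here
list(query)                  # a second pass reuses the cached rows
after = query.evaluations
```

Forcing evaluation in the view makes the cost show up in Python, where it can be measured, instead of in the middle of template rendering.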