Signbank / Global-signbank

An online sign dictionary and sign database management system for research purposes. Developed originally by Steve Cassidy/ This repo is a fork for the Dutch version, previously called 'NGT-Signbank'.
http://signbank.cls.ru.nl
BSD 3-Clause "New" or "Revised" License
19 stars 12 forks source link

Shared Senses #1003

Open susanodd opened 1 year ago

susanodd commented 1 year ago

On the master branch, there are currently no senses yet.

On the senses branch (pull request #1002) there is functionality that "shares senses" between glosses.

This is basically done via AI in that it happens automatically.

For a gloss, if the user enters new keywords for a sense for a language, then the software looks for other already existing sense translations that match and uses that sense translation object instead of creating a new one for this gloss. The newly added sense translation thus shares its new keywords with another gloss for this language. (At the moment, this is buggy for English, because the AI can find sense translations in other datasets.)

This seems that unintended links can arise between glosses since this is done by AI. (I call it AI because the software does it, not the user.)

An alternative would be to introduce relation objects as for the other kinds of relations to relate the sense objects of different glosses. And with that user functionality to instruct Signbank to link the senses.

vanlummelhuizen commented 1 year ago

(I call it AI because the software does it, not the user.)

@susanodd I think we should not call this AI. It is just a matter of database structure and the fact that Keyword objects are reused if they exactly match the sense translation a user fills in. Nothing more, nothing less.


An alternative would be to introduce relation objects as for the other kinds of relations to relate the sense objects of different glosses. And with that user functionality to instruct Signbank to link the senses.

Now, glosses are indeed implicitly linked if they share a Keyword through a sense. Your proposal is to link glosses explicitly by the user. That could be a good idea, if users find that useful. Perhaps @ocrasborn should say something about this.

susanodd commented 1 year ago

Yes, it's not AI. Put all the cool kids use AI. :)

susanodd commented 1 year ago

I already wrote this in the pull comments.

I don't think the solution for sharing as it is now can stay. The "shared senses" literally share SenseTranslation objects and the Translation objects stored in them. However, the Translation objects have a link to a gloss inside them. They can't be shared by multiple glosses.

The Translation objects allow the Keyword objects to be multilingual. Keyword objects are basically just strings. This also allows different languages to reuse Keyword objects. And different Languages to include the same keywords, but the Keyword is only stored once. The translations are used for the senses to make them know what language a keyword is. There would be no way to know whether the keywords were even all in the same language if only keywords were used. We can know which language(s) a keyword is associated with. But this is different than which dataset(s) it is associated with. That comes from the gloss. Gloss => Lemma => Dataset

ocrasborn commented 1 year ago

@vanlummelhuizen Now, glosses are indeed implicitly linked if they share a Keyword through a sense. Your proposal is to link glosses explicitly by the user. That could be a good idea, if users find that useful. Perhaps @ocrasborn should say something about this.

Let's first start using the initial functionality (once it's available) for a month or two, then see whether we want to explicitly link glosses.

vanlummelhuizen commented 1 year ago

@susanodd I carefully read some of the pieces of code you guarded with if settings.SHARE_SENSES:.

It seems that sometimes a shared sense means: a Sense that was already linked to another Gloss in the same Dataset: https://github.com/Signbank/Global-signbank/blob/5c44873d65d4df53b32b7327b15e44e4503438f3/signbank/dictionary/update.py#L550-L562

Sometimes a shared sense means: a Translation that was already linked to another Sense in the same Dataset: https://github.com/Signbank/Global-signbank/blob/5c44873d65d4df53b32b7327b15e44e4503438f3/signbank/dictionary/update.py#L572-L584

So, the fact shared senses could mean multiple things seems problematic. The good thing is that is all kept within one Dataset.

I think we should discuss with @Jetske and perhaps also @Woseseltops .

susanodd commented 1 year ago

Yes, the commands to map the translations to senses, if SHARE_SENSES is turned off, and you use the following commands, then neither of those two cases has been created.

  1. python bin/develop.py delete_empty_translations tstMH
  2. python bin/develop.py translations_to_senses tstMH

I added a consistency check on GlossDetailView when the object is first fetched so the developer can check if the senses for the gloss have become inconsistent. Before I added the SHARE_SENSES check, that was happening all the time. Partly because in testing stuff we keep using keywords like "test". :) And those empty keywords are linked to all glosses, via Translations for the gloss and language.

I fixed the code that was fetching Translations from totally different datasets. Recall your fix of adding the gloss=gloss to it.


I suspect this code is part of the bug mentioned above (relevant to #1002):

https://github.com/Signbank/Global-signbank/blob/05e85eae014814cd5490ede3e5740e1e26f15e89/signbank/dictionary/update.py#L412-L416

and farther down, the same:

https://github.com/Signbank/Global-signbank/blob/05e85eae014814cd5490ede3e5740e1e26f15e89/signbank/dictionary/update.py#L460-L464

and another:

https://github.com/Signbank/Global-signbank/blob/05e85eae014814cd5490ede3e5740e1e26f15e89/signbank/dictionary/update.py#L538-L542

This looks iffy because the Translation objects are related to glosses and hence (other) datasets.

Originally posted by @susanodd in https://github.com/Signbank/Global-signbank/issues/965#issuecomment-1612626250

susanodd commented 1 year ago

UPDATE The senses are live now. We applied the migration without sharing to start, since the users need to split keywords into different senses when applicable. @ocrasborn estimated this applies to roughly 1/3 of the NGT glosses.

Here is an example where there are numerous variants of a gloss with various keywords.

https://signbank.cls.ru.nl/dictionary/gloss_relations/2284

even-denken-a-relations-top even-denken-a-relations-bot

Jetske commented 1 year ago

image

Currently, senses are not shared by multiple glosses, but I added an icon to show senses that are similar with a link to their gloss.

A button can be added to this modal to explicitly choose to share one of those senses, if a suitable match is found. But I was thinking: if you choose to share a sense, that means that its example sentences will also be shared. But then an example sentence video may instead contain the sign from another (synonym?) gloss, e.g. BRUS-C instead of BRUS-B. Is this a problem? @ocrasborn

So additionally, if we eventually do want to share senses, a foreignkey field to gloss could be added into the examplesentence model, such that it can be made clear (in another color, for example, or not show them at all) in case the example sentence (video) was originally added for another gloss. Or even to multiple glosses, as a sentence likely contains more than one sign.

susanodd commented 1 year ago

@Jetske that looks like a good idea. Can you look at the #1019 CSV for example sentences? It's a bit of an obstacle because for exporting (for a gloss row in the csv) the sentences need to refer to the sense they belong to. So I've prefaced them with the Sense Number. (Of the gloss that is the row. Otherwise there would need to be Sense IDs used.)

If you imagine importing sentences, how to do this? They need to be attached to something. As the CSV Update is set up now, each gloss has a row. But obviously, if the user is creating senses and importing sentences in the same file, this is kind of elaborate to implement since senses would need to be created in order to attach sentences to them.

susanodd commented 12 months ago

@Jetske I put the Similar Senses code live.

One thing that is a bit noticeable, now that there are numerous modals, it seems to take ages to load the Gloss Detail View page. The methods in the modals are applied at page load.

There seem to be many calls to methods to calculate data to display, rather than compute it in the context method in python.

Jetske commented 12 months ago

@susanodd oh that's not great.. Would it be quicker to compute in the context method, or is it possible to apply methods when opening a modal?

susanodd commented 12 months ago

I had a huge problem with the minimal pairs. They are generated via a separate ajax call now. But those are very different from the senses. It previously did not work to put the modals in a separate file (to load it in order to improve readability) because the Django translations (trans) do not work on the loaded files. I have no idea if a new bootstrap Django combi would help. When I complained about this, @Woseseltops said I should try to move things to python instead.

Yes, if things can be computed in the context in python and passed as variables it's much more efficient. You can also compute very complex things that way. (The keywords mapping computes a large dictionary to look up everything in the template, including matrix dimensions.)

If you turn on DEBUG and look in the browser, you can sometimes see that a query has not even been evaluated yet! That's how we discovered that the minimal pairs were taking an extreme amount of time. (Years ago, there was a Gloss model method for minimal pairs that was called in the detail view. That has been rewritten to an ajax call as it is now...)

Django delays as long as possible the evaluation of the queries. (That's why I often flatten them out to force it to evaluate them already.) (I'm not actually sure about how it evaluates the _set things.) I know the code ends up ugly without the use of elegant methods in the template. (As you have done it elegantly with methods.) But I suspect it might be choking with all the complex string operations.

susanodd commented 11 months ago

@Jetske I'm gradually debugging the gloss detail view to figure out what is wrong with it. I discovered that the imported tagging module (requirements) has duplicate migrations (in the code that is imported itself). That might be what is going on because Django thinks tagging migrations have not been applied. (Not sure about this.)

To assist in debugging and to check parsing, I have simplified (locally) and just made a has_edit_permission context variable instead of all the perms checks that use guardian. (Since it looks like guardian might be glitchy. Since the glyphs on the live server are missing, maybe something else is not installed. No idea.)

But one problem with the Senses in the use of forloop counter in the template. This is generated live on rendering the template.

I tried using this construction in the Keyword Mapping many months ago only to discover it basically didn't work. As soon as you "do stuff" this information is actually lost. Originally you reloaded the gloss detail view, but @vanlummelhuizen insisted that the editbe added to the url. But that actually won't work with the forloop counter. (!!) The counter only works if the template is generated again (!!)

susanodd commented 11 months ago

@Jetske somehow the span is set to display:none.

<span class="sense-icon" style="display: none;"><a href="/dictionary/update/sortsense/44313/1/down" style="color:red;">▼</a></span>

There are in fact 4 senses for this gloss and that is the first one. None of the up down arrows show.

This is tstMH. I'm trying to debug gloss detail view there since it doesn't work for me on NGT.

susanodd commented 11 months ago

@Jetske this method called in GlossDetailView is causing problems:

sense.get_senses_with_similar_sensetranslations_dict

This is a query done in the template, and it is likely O(n2) on all other glosses and all other senses. It is being done when loading the template. At the very least, it should at least be in the context in Python, not in the template.

susanodd commented 11 months ago

@Jetske I removed numerous spaces from the template element "id" strings. These cause browser problems.

susanodd commented 11 months ago

I'll try moving that similar senses to a context variable. Since NGT gloss detail is not working at all (I suspect the server times out or something because it works on tstMH. But there's only 33 glosses or so.) I'm trying to determine what exactly the problem is.

susanodd commented 11 months ago

The revisions are on master now. I modified template ids and paths to remove spaces from them. I suspect this causes some problems on Safari. I moved the similar senses method to the context. There's a debug setting to print this now if DEBUG_SENSES is set.

susanodd commented 11 months ago

The similar senses method ought to be written as an SQL query as for minimal pairs so it’s not an operation on strings.

susanodd commented 11 months ago

@Jetske do this on your local copy:


>     def get_senses_with_similar_sensetranslations_dict(self):
>         "Return a list of senses with sensetranslations that have the same string as this sense"
>         similar_senses = []
>         for sense in Sense.objects.all().filter(glosssense__gloss__lemma__dataset = self.get_dataset()):
>             sensetranslations = {}
>             if sense != self:
>                 print(sense)

For NGT this loop iterates over (senses of) ALL glosses. It's quite terrifying.

There needs to be additional filtering before the loop starts. If you can, first identify which sense translations match, then work backwards to obtain the sense and other gloss in the dataset.

susanodd commented 11 months ago

Here are two examples of what the try command does. It is meant for developers. It is based on an old @Woseseltops url that was for testing videos of a single gloss, and subsequently for finding lemmas with multiple glosses. I modified it to take the gloss id as a parameter. One needs to be either superuser or staff. It shows a restricted form of detail view for senses and is intended to assist in determining what Django does between the request and the actual display. Ideally, useful when using the debug toolbar on a local computer.

try-1401

try-752

susanodd commented 11 months ago

try-752-modal-similar

susanodd commented 11 months ago

@ocrasborn now it's working as expected! Can you try out the Shared Senses and see if it's doing what you expect it to be doing? For some senses it finds many other senses.

ocrasborn commented 11 months ago

@susanodd This is mega useful, we should have had this years ago -- that is, the pop up window when you click on the chains for related senses in other signs. It would have helped so much in keeping the database clean and systematic. :-) Great that it'll be available to the Amsterdam team when they get started this autumn. Can't find anything that's unexpected/wrong now, so I'll close the issue.

ocrasborn commented 11 months ago

Ah, I'm not closing it because I have a small layout request: the chains icon takes up a lot of vertical space the way it is placed now, @susanodd , on it's own line. Could you move it to either before or after the sense, on the same line or at the same height? Not sure whether to the left or to the right would be more sensible, perhaps the latter.

susanodd commented 11 months ago

This command was created by @Jetske. Originally this is what was done on mapping the keywords to (shared) senses. But that functionality has been split so the user can observe what was found instead of doing it automatically.

This was causing the slowness the past week.

Some future enhancement would include the facility to find similar senses across different datasets. (@vanlummelhuizen has ideas about that.)

I'll see what I can do about the formatting.

Jetske commented 11 months ago

Ah, I'm not closing it because I have a small layout request: the chains icon takes up a lot of vertical space the way it is placed now, @susanodd , on it's own line. Could you move it to either before or after the sense, on the same line or at the same height? Not sure whether to the left or to the right would be more sensible, perhaps the latter.

I made the layout more compact in pull request #1049 including placement of the chains icon to the right of the sense

susanodd commented 10 months ago

The Similar Senses link may need to explicitly refer to (show) the Annotation (Signbank ID) of the similar sense rather than the Lemma. For some of the glosses in tstMH, it looks like they refer to themselves as a similar gloss.

gloss_tstMH_36429

gloss_tstMH_36429_refers_to_self_as_similar

(It is the same gloss it's referring to as revealed by hovering.)

susanodd commented 10 months ago

The bug shown above arises here:

https://github.com/Signbank/Global-signbank/blob/405594ee57b414f025c5826e50597b1cf40ba6c1/signbank/dictionary/models.py#L807

The Translation objects include a field gloss. But this method is inside of Sense and the gloss is unknown. However, it is known in the Gloss Detail View where the method is called. In the above example, all of the Translation objects retrieved by this query are for the gloss we are looking at, so should be excluded. The method that includes that query is: https://github.com/Signbank/Global-signbank/blob/405594ee57b414f025c5826e50597b1cf40ba6c1/signbank/dictionary/models.py#L802 and is called by the get_context_data method of the Gloss Detail View, so the gloss can be made available as a parameter. Although it actually looks more like a function than a method then (so maybe doesn't belong inside of models.py).

susanodd commented 10 months ago

Okay, I figured out what is going wrong. There was something messed up with the dictionary of similar senses and the Detail View was computing a wrong href as the link, by putting the sense id into the url as well as the gloss id. This was then being carried around by the urls. (The Gloss Detail View makes use of a shorthand url that only includes the gloss id. But that was being concatenated with the sense id with an extra slash, which is normally ignored by Django, it seems.)

susanodd commented 10 months ago

I revised the similar senses method and added a test to show that it works as expected.