Signbank / Global-signbank

An online sign dictionary and sign database management system for research purposes. Developed originally by Steve Cassidy/ This repo is a fork for the Dutch version, previously called 'NGT-Signbank'.
http://signbank.cls.ru.nl
BSD 3-Clause "New" or "Revised" License
19 stars 12 forks source link

Update Gloss ID via API #1202

Closed rem0g closed 4 months ago

rem0g commented 6 months ago

Now we have Gloss Video Update and Gloss create API, we have to yet have Gloss Update API.

It should have those fields and linked to Gloss ID.

{ "fields": [ "Lemma ID Gloss: Dutch", "Lemma ID Gloss: English", "Annotation ID Gloss: Dutch", "Annotation ID Gloss: English", "Senses: Dutch", "Senses: English", "Annotation Instructions", "Word Class", "Handedness", "Strong Hand", "Weak Hand", "Handshape Change", "Relation Between Articulators", "Location", "Relative Orientation: Movement", "Relative Orientation: Location", "Orientation Change", "Contact Type", "Movement Shape", "Movement Direction", "Virtual Object", "Phonology Other", "Mouth Gesture", "Mouthing", "Phonetic Variation", "Repeated Movement", "Alternating Movement", "Strong Hand Letter", "Strong Hand Number", "Weak Hand Letter", "Weak Hand Number", "Weak Drop", "Weak Prop", "Semantic Field", "Derivation History", "Named Entity", "Valence", "Iconic Image", "Concepticon Concept Set", "In The Web Dictionary", "Is This A Proposed New Sign?", "Exclude From Ecv" ] }

susanodd commented 6 months ago

@rem0g where are you getting the input for all of these fields?

They need to correspond to actual possible values for the Field Choices.

The basic functionality will be part of #1162 . It still needs extensive checking on the input fields that they all exist.

Are you planning on using the "output" from the other API as input to this one? (For example, a giant dictionary with lots of empty fields?)

Is it possible to modify your function to call each "changed" field separately? (So split the API into calls for each field as a separate API call instead of a huge dictionary.)

(That is basically how the normal Gloss Detail Edit works, each field update is a separate ajax call. Otherwise the system would start choking because multiple users work at the same time. Originally, the system did not use ajax, say 8 years ago.)

The code is going to be locking the database while the entire command runs because it will need to check all the fields. It will also generate errors if there is a problem with the input. That's why the CSV Import is in two steps. First an error checking step. Then the actual updates.

The CSV Import can do the individual rows sequentially. The error checking also checks that rows do not conflict with each other. With an API this may cause problems because Signbank won't have control over the orchestration of the calls.

@Woseseltops needs to have input here. The entire database is locked because of Sqlite. So we would probably need to use MySQL instead to support individual table locking instead.

Again, depending on how you employ such an API it could have side effects on other users. That's why I keep suggesting using a cron job and putting the updates in a CSV. Or perhaps a script of API calls to be done at night.

rem0g commented 6 months ago

The input comes from html elements that corresponds to the same ones as at signbank, for example morphology contain exactly same fields and are in dutch. I dont know if that will be a problem.

If for every field there will be an API call, then you should really consider to change the database platform indeed. I have never heard of such limit before, if you can switch to MySQL it would be a wise choice. SQLite is intended for small projects with limited dataset.

For now i can create an seperate issue how to integrate CSV Import API, so I only would need to send CSV file and then await for the response. Does that sound reasonable for you?

susanodd commented 6 months ago

Extra information about updating glosses.

I advise against including the Lemma fields (these are not allowed to update in Gloss Update)

Because:

The video storage structure stores the gloss videos under:

<dataset-acronym> / <first two characters of lemma for default language of dataset> / <gloss annotation of default dataset>-<gloss ID>. <extension> You can't change the Lemma because that may result in "moving around videos to different locations" that is triggered by Lemma update. The same applies to Annotation update.

susanodd commented 6 months ago

The dataset and the gloss ID will need to be supplied as arguments.

I would suggest this to be in the url itself so Django / guardian can verify existence of these objects and fetch the gloss.

Then the permissions via #1187

susanodd commented 6 months ago

@rem0g Senses are not allowed to be updated via CSV. They can only be created if none exist yet.

Unless somebody wants to write an extremely difficult parsing routine to determine which ones need to be updated. (I worked on this for a long time. But the Sense Number is not hard-coded, it is dynamic. So it is too difficult to fasten down an arbitrary sense string and compare it to what is already in the database, since users may make small modifications to the text. Moreover, the comma is used to separate keywords in a sense in all the displays. But the comma may also appear inside keyword phrases. So this cannot be parsed unambiguously. There is some incomplete code related to this that is not called because it would require too much time to implement.)

So remove the Lemma, Annotation, and Senses from the specification above.

Woseseltops commented 6 months ago

I think updating glosses directly via the API, as @rem0g proposes, would be a more elegant solution than uploading a CSV. If there are technical limitations, then we should try to overcome them under the hood; with the 'update via CSV' functionality we are also bulk updating glosses, I can't imagine doing it via an API would be so different.

@rem0g , do you want to (1) communicate only the changes, or (2) communicate the new state of the gloss (so also include the value of unchanged fields). Option (2) is similar to uploading a changed CSV, but this is error prone (we detect all changes ourselves and then show these to the user). Option (1) would be better safer I think.

susanodd commented 6 months ago

One detail about this, the "internal representation" is different than the "human readable" form used in CSV.

It corresponds to the "name" stored inside a field choice object of the respective field type. (Each field has its own collection of field choice objects. These are multilingual.)

The "string" values in the "json" input in the API must correspond to names of "existing" field choice objects.

rem0g commented 6 months ago

I think updating glosses directly via the API, as @rem0g proposes, would be a more elegant solution than uploading a CSV. If there are technical limitations, then we should try to overcome them under the hood; with the 'update via CSV' functionality we are also bulk updating glosses, I can't imagine doing it via an API would be so different.

@rem0g , do you want to (1) communicate only the changes, or (2) communicate the new state of the gloss (so also include the value of unchanged fields). Option (2) is similar to uploading a changed CSV, but this is error prone (we detect all changes ourselves and then show these to the user). Option (1) would be better safer I think.

I would agree about the option 1) as you mentioned it is less error prone. I think it would be good to know whether a change is not accepted and what is the reason for that, for example it should be an array or there is already a similair gloss.

susanodd commented 6 months ago

The safety try-except code (also if you try to use Gloss Edit commands as urls) the try-except will either clear the field or leave it as it is to avoid crashing.

You could probably add a parameter -- what to do on error -- as an option to gracefully code it. (Leave as it is or clear it on error. Could be either if the original field is totally wrong for some reason.) Then if the command just responds with the "new state of the gloss" you can see it has not been changed. (Or has been cleared.)

(Because this is inside a transaction, then sometimes there are try-except clauses for all the "to be updated" fields. Normally if the command -- inside a multi-step csv update -- the command has already been checked for errors on "values in the fields". But there still could be problems with other things. This is why I advise against updating the annotation fields this way, because video files with change their name and then a file system permission error, etc. will cause the update to fail.)

susanodd commented 6 months ago

You can create new senses, but they need to follow the syntax with the vertical bar.

This is because the "str" representation of the Sense column should match the original value if it is not being updated.

I describe this in the context of CSV. All the "cells" of the "input CSV" should match the "original CSV" if they have not been changed. That is the reason to just delete columns that are not being updated. For Field Choices and normal fields this works fine. But because Senses have an entire model of objects (there are Sense, GlossSense, SenseTranslation, Keyword, Translation, ExampleSentence, ExampleSentenceTranslation, SenseExamplesentence, ExampleVideoHistory, ExampleVideo) So "update" of senses could potentially cause a huge number of updates to other model/object tables as well as the file system. (And this could fail.)

But creating new senses if fine. I'm not sure about "adding a translation" though. (Because they don't need to line up, but they do need to line up, there cannot be empty Translation objects or empty Keyword objects. Internally the Sense model relies heavily on strings. There are separate methods for retrieving and updating "per language" versus "per sense (number)". So this is also a potential problem. Empty strings can't be stored in the object tables. And null values also screw things up.)

susanodd commented 6 months ago

Extend update to use language code translations of field choices. (#1212 )

rem0g commented 6 months ago

Bump. Any updates on this? :)

Because of more efficient workflow the dataset on signCollect is growing fast, so it would make sense to have this API function on short term as possible. I would like to know progress.

susanodd commented 6 months ago

I've been working on the ethics assessment website this week. I'll get back to this issue. @Jetske is implementing an update for Senses. (Using the json list syntax she used for creation. The creation is live.) As of yet, there are no API tokens. (@Woseseltops is working on that.)

The list of fields in the issue description are those "after" senses that only affect the gloss itself. I will implement those.

As discussed, the Lemmas cannot be updated as such. (That is an "in person" issue inside another issue. Updating lemmas may have side effects on other glosses. It is also ambiguous. Caution is needed because the gloss's dataset is stored inside its lemma. This can also affect the directory where the video files are stored, also for other glosses. The urls you previously fetched to videos and images can become wrong.)

The annotations can cause problems to update because videos will also be moved by this. (Side effects on the file system. It would be preferrable to do this when logged in, using Gloss Edit.)

All the rest of the fields, no problem.

susanodd commented 6 months ago

Any preference on what to do with errors on non-existent e.g., Handshape fields?

(The code is kind of hellish to write because it needs to be atomic for all the mini-updates. But any of them can yield problems, errors. Should it be "all or nothing"? Or should it "do what it can and report back on fields that it can't do"? Or would you be willing to use a set-up to see if there are any errors, then a "commit changes" if it's okay?)

Looking at my new functionality in the commit, it's obvious the error handling is a problem with control flow. That's the next step. But what should it do? See above bullets.

rem0g commented 6 months ago

I think for sake of easiness just return an error and dont do any update. I can handle the error and display to the user what has to be corrected before trying to submit an update again.

susanodd commented 6 months ago

I think for sake of easiness just return an error and dont do any update. I can handle the error and display to the user what has to be corrected before trying to submit an update again.

Okay!

susanodd commented 5 months ago

I'm writing the code so that it will work with Dutch input field names and Dutch field values. This makes it more complicated to generate the error messages. There are also some general error messages about not finding the gloss, or the gloss not being in the dataset.

So far, some error messages are strings. And some are a dictionary, per field. (e.g., that the field cannot be updated, or that the value is not found.)

In order to provide error messages in Dutch, the non-model translations will need to end up in the "po" files. But mixing different kinds of translations (the model translations and the PO translations together in "one" string is not possible. Django does lazy evaluation and these things can't appear as a string in the code, at least it gives errors about this at run-time sometimes, because they may not have been evaluated yet). It's possible to do this in a dictionary because the key is a different piece of code that the value. (So it can use the different translation methods on the key versus the value.) It probably needs to be a dictionary errors structure. (I'm not sure to what extent you are going to parse error messages on your side.) Eventually, it's all just josn. But it looks like it's mixing together types in the python code. (They won't end up in the same json results dictionary. It will either be errors: string, or errors: dict.)

rem0g commented 5 months ago

We actually dont need the field names to be in dutch, the JSON is only read in dutch but english field names/values can be returned via other route.

Our code is really flexible, so no need to think about that :)

susanodd commented 5 months ago

Okay, I just modified the code so it always returns a dict for the errors json returned. I made the other checks in the errors dict be: errors: { 'Dataset' : ... 'User': ... 'Gloss': ...

Then the rest are all <field> : ....

So at least it has the same type for the errors. The update is only performed if there are no errors.

The first kinds of errors are things that don't exist or don't have permission for. (Some of these probably won't occur but it needs type checking on everything to prevent run-time errors. Kind of annoying.)

I already implemented it so it uses the translations. Then when it's deployed, will need to create/make/compile the messages and fill them in in the PO files.

Next I need to make a testing template for making sure it works. :) That's for Monday. :)

susanodd commented 5 months ago

This is on signbank-dev now. Although we're still working on it.

/dictionary/api_update_gloss/<datasetid>/<glossid>/

susanodd commented 5 months ago

@rem0g there is a bit of ambiguity in the json input. You must only include the fields you want to update. Otherwise it is unclear whether an empty field means "erase" an existing value for the field, or "ignore" the field. We had this problem with the test template where we can enter values to check the code. Now it ignores the fields that are empty and does not include them in the json data. (But likewise, they could be included and mean to erase.)

rem0g commented 5 months ago

Ok, good to know. I will adapt the code for that.

Jetske commented 5 months ago

@rem0g the update senses field is in this form: 'senses': '{"en": [["s1w1", "s1w2"],["s2w1"]], "nl":[["s1w1"],["s2w1", "s2w1"]]}'

If you prefer it to match the create gloss api (where the fields are separated by language) I think it's easier to edit that one into this dictionary form. Let me know if that is necessary.

susanodd commented 5 months ago

This is deployed on Signbank now!

The API token version is on signbank-dev only at the moment.!!

rem0g commented 4 months ago

I am getting CSRF token verifcation warning.

What is required to get in cookies or POST data to pass API token or CRSF token?

rem0g commented 4 months ago

I just noticed comment of @Jetske on wikipedia, great. Now i need API token, where can i obtain that?

rem0g commented 4 months ago
data= {
   "Annotation Instructions": ,
   "Word Class": ,
   "Handedness": ,
   "Strong Hand": ,
   "Weak Hand": ,
   "Handshape Change": ,
   "Relation Between Articulators": ,
   "Location": ,
   "Contact Type": ,
   "Movement Shape": ,
   "Movement Direction": ,
   "Repeated Movement": ,
   "Alternating Movement": ,
   "Relative Orientation: Movement": ,
   "Relative Orientation: Location": ,
   "Orientation Change": ,
   "Virtual Object": ,
   "Phonology Other": ,
   "Mouth Gesture": ,
   "Mouthing": ,
   "Phonetic Variation": ,
   "Strong Hand Letter": ,
   "Strong Hand Number": ,
   "Weak Hand Letter": ,
   "Weak Hand Number": ,
   "Weak Drop": ,
   "Weak Prop": ,
   "Semantic Field": ,
   "Derivation History": ,
   "Named Entity": ,
   "Valence": ,
   "Iconic Image": ,
   "Concepticon Concept Set": ,
   "In The Web Dictionary": ,
   "Is This A Proposed New Sign?": ,
   "Exclude From Ecv": ,
   "senses": 
   '{
       "en": [["sense 1 word 1", "sense 1 word 2"],["sense 2 word 1"]], 
       "nl":[["sense 1 word 1"],["sense 2 word 1", "sense 2 word 1"]]
    }'
}

There is some fields missing we need to update, in case we want to add -A suffix to lemma/gloss:

"Lemma ID Gloss: Dutch", "Lemma ID Gloss: English", "Annotation ID Gloss: Dutch", "Annotation ID Gloss: English",

susanodd commented 4 months ago

The method that is called by the URL is crsf token exempt.

You need to create an API token in your user profile. And save it somewhere. It is only visible once. Then put that in the headers as shown in the wiki.

You can only do this on signbank-dev at the moment because we need to make sure it works.

susanodd commented 4 months ago

If you look at the WIKI you will see an example API call for CREATE GLOSS that uses the API token. (signbank-dev)

Observe that @Jetske used two different syntaxes for the Senses. The Create is per language, and the Update is only one Senses field.

susanodd commented 4 months ago

Gloss Annotations can be updated, but Lemmas cannot be updated. (There is another issue for that, it is an "in person" issue. That is because it's ambiguous. @uklomp needs to resolve what it should be. Or whether extra arguments are needed.)

rem0g commented 4 months ago

Ok, so Gloss Update API is not usable right now without API token on signbank production?

rem0g commented 4 months ago

Gloss Annotations can be updated, but Lemmas cannot be updated. (There is another issue for that, it is an "in person" issue. That is because it's ambiguous. @uklomp needs to resolve what it should be. Or whether extra arguments are needed.)

Ok, we will talk about this thursday. As long we can update on the website, it can be done via API.

susanodd commented 4 months ago

Not on signbank production. That is correct. We need to make sure it all works properly first.

rem0g commented 4 months ago

Not on signbank production. That is correct. We need to make sure it all works properly first.

Ok, can you give an estimate when we can start updating glosses via the API?

susanodd commented 4 months ago

If you try as an example modifying the lemma in Gloss Edit on signbank, you will see there are different possibilities. You can update an existing lemma, or choose a different lemma, or create a new lemma. This is because glosses share the lemmas.

susanodd commented 4 months ago

Senses were implemented so they can be shared. (Like Lemmas.) But because of some difficulties, this functionality is in deep sleep.

Jetske commented 4 months ago

Annotation id gloss fields can be added without problems, right? @susanodd

susanodd commented 4 months ago

Yes, that's correct. I'm not entirely sure about duplicates. That was originally implemented by @vanlummelhuizen. As far as I know, it won't allow the same combination. But it seems to allow to repeat with different caps. I know we changed it years ago to not allow empty fields. But there are still lots of glosses with English not filled in. (This causes problems in list view because it can't sort. There have been bugs in the past years because of this. I'm not sure what happens if you try to "fill in" English for existing glosses. We don't do the data, so a real user would need to report a bug. It won't let you create glosses with Engish missing anymore.)

susanodd commented 4 months ago

Not on signbank production. That is correct. We need to make sure it all works properly first.

Ok, can you give an estimate when we can start updating glosses via the API?

There is a pull request. @Woseseltops had some problem using senses and wants a different error if the user does not have permission. We've added stuff to the wiki to help. So at the very least Thursday, so he can approve it.

susanodd commented 4 months ago

There are two testing previews of the Gloss Update API:

  1. A Signbank template in Django.
  2. A HTML file without any Django.

The example HTML file has been added to the api_interface branch and can be downloaded and used locally. English has been used.

The Signbank template is multilingual.

Both examples show the AJAX call to the API interface.

(1) uses a CSRF token and the user must be logged in. (2) does not require a CSRF token, it requires a Signbank API token, generated in your user profile.

rem0g commented 4 months ago

In JSON payload, senses has to be written with Senses as capital at the start.

Just, FYI. I lost 1 hour over this.

susanodd commented 4 months ago

In JSON payload, senses has to be written with Senses as capital at the start.

Just, FYI. I lost 1 hour over this.

Did it not give any appropriate error messages about this? Was this for updating, or for creating senses? @Jetske please take a look at this. This is for the Field (Key), right? It looks like @Woseseltops moved the examples to a separate new wiki page with API examples.

susanodd commented 4 months ago

I can make some more examples for the wiki. FYI, the Boolean field to make a gloss public uses True as the value: "In The Web Dictionary": "True"

Here is the NL example file for updating a gloss:

https://github.com/Signbank/Global-signbank/blob/master/signbank/dictionary/templates/dictionary/virtual_machine_gloss_update_api_token_nl.html

It's not a Django file, it's just html. You need to replace the token with your token. It's set up for signbank-dev. The gloss ID and dataset ID are constants in the javascript in this example. I used this for debugging and checking that the translations of field values works.

There are console logs so you can see what it sends to the server in JSON. Then double check the json dictionary keys. This one is in Dutch and the Field Choice Values need to be in Dutch. The field names need to be in Dutch when you want to send the values in Dutch.

The "fields" it shows in the browser are the json dictionary keys to be used.

Likewise, for EN

https://github.com/Signbank/Global-signbank/blob/master/signbank/dictionary/templates/dictionary/virtual_machine_gloss_update_api_token_en.html

And the two for creating a gloss:

https://github.com/Signbank/Global-signbank/blob/master/signbank/dictionary/templates/dictionary/virtual_machine_api_test_token_en.html

https://github.com/Signbank/Global-signbank/blob/master/signbank/dictionary/templates/dictionary/virtual_machine_api_test_token_nl.html

The javascript loops on internal IDs which are not used in the json data. The code is de-Djangified from other templates to remove all Django and Bootstrap from it. That's why the javascript is so. Just rely on the visible field names. You can double check the Dutch field values here for the values used in your system that they match something. It gives error messages specific to the fields that did not match.

I will check the json-keys to do reporting on those, too. It probably just ignores the ones that don't match. (The Senses field without a cap? Did it just ignore it?)

rem0g commented 4 months ago

I think it's better to create seperate issues, so i am going to close this one.