Signbank / Global-signbank

An online sign dictionary and sign database management system for research purposes. Developed originally by Steve Cassidy/ This repo is a fork for the Dutch version, previously called 'NGT-Signbank'.
http://signbank.cls.ru.nl
BSD 3-Clause "New" or "Revised" License
19 stars 12 forks source link

Delete Gloss via API #1210

Open rem0g opened 3 months ago

rem0g commented 3 months ago

We should be able to delete Gloss via API with using Gloss ID, with confirm callback.

signCollect ask Signbank to delete Gloss. Signbank replies: Gloss is found, are you sure? signCollect will confirm the deletion. Signbank callbacks with gloss deleted.

signCollect will keep seperate backup for that gloss in event of user error and then it can re-create the gloss.

susanodd commented 3 months ago

This is extremely dangerous! It's an object-oriented database. There is no "undo". A delete also deletes all relations the gloss is part of as well as the videos. This could affect glosses in the corpus that other users have made. There are no gloss-specific permissions.

rem0g commented 3 months ago

We are able to delete glosses via Signbank Website, why wouldnt that be possible via API?

susanodd commented 3 months ago

We are able to delete glosses via Signbank Website, why wouldnt that be possible via API?

Because there are no humans involved and you want to automate it.

What's to stop your software from deleting everything? Or say, hold Signbank hostage?

rem0g commented 3 months ago

Please limit your responses to software developement, if you want to discuss about this then i would suggest to mail or make an appointment with me and Ulrika. This feature is needed unfortunately as our colleagues can make mistakes sometimes.

Woseseltops commented 3 months ago

Hey @rem0g, I (and Jetske too) do agree this feature would be a valuable addition to the API. I think Susan intended to point out that with automated systems, the risk of accidental deletion of large amounts of glosses could be higher. This is something we can fix at our side:

  1. Have an extra gloss field archivedAtDate. If this field is set, the gloss won't show up in any lists, searches, API output, etc
  2. Have a cronjob that after every x time units, deletes all glosses with an archivedAtDate that is older than y days ago.
rem0g commented 3 months ago

Ok perfect, that is something i can work with.

susanodd commented 3 months ago

See #1211

susanodd commented 3 months ago

Do you need different "corpus gloss ID"s for your glosses?

The CNGT glosses should not be deleted because all the frequency data from the EAF files depends on them. The Signbank ID is that used in the CNGT also.

Do glosses need an additional UvA ID field?

The Gloss ID (Signbank ID) is also linked to a Corpus via a table. Also to Document, Speaker, Frequency objects. (These are visible in the Gloss Frequency panel of the Gloss View.) Because these IDs are used by researchers, they should not be deleted.

Do you require "different" IDs for the glosses in your corpus? A gloss may be associated with more than one corpus. But the ID cannot be changed.

So probably we can add an additional UvA ID field. Or another table to map glosses that appear in mutliple corpora to Gloss IDs in the various corpora. Then they could be looked up via either. Then the Affiliation could be added to the API routines.

The CNGT glosses should not be deleted. The frequency routines run as a cron job that updates based on updates to the EAF files for CNGT.

susanodd commented 3 months ago

The minimal pairs are also computed dynamically. (Not stored anywhere.)

I don't know enough about Phonology itself to know whether you would be using completely different field choices in your software. If so, that would cause problems for researchers making use of the minimal pairs if you start updating the phonology of CNGT glosses. Or delete glosses that are in the minimal pairs of a CNGT gloss.

susanodd commented 3 months ago

Please check out #127.

Deleting a gloss also deletes the relations and morphology. Changing the annotation to add a suffix would be preferred.

At the moment, none of the "relations, morphology" fields of a gloss are returned in the read API functions. Probably an additional API is needed to get these. This would be the reason for needing to check with signbank prior to delete.

This data is not a yes/no/are you sure? You can see these in the CSV columns. It's not necessarily useful in the API because it needs to be human readable, also in the CSV. It's like returning spaghetti.

I can make an additional API function "get related objects" (get relations?) that returns the same as is done in #127. Then you can just check whether it's empty. (Our software can't be intelligent enough to do questions about "are you sure". It could return the "other data that will be deleted in the case of delete". That should be understood as "not wise to do this".)

We'll see what @Woseseltops comes up with for setting an "archived" field. That would need to be cascaded to other relations as well. Because it's an OO database, all the queries end up linked to each other to retrieve things out of OO tables related to the gloss. I'm not sure to what extent an "archived" field would cascade to other functionality. (This would impact also analysis functionality because the queries would all need to be modified.)

We have functionality implemented to retrieve variants (based on the patterns -A, -B, ....) Perhaps some assist function could be made to show the currently used ones so the user can change the annotation suffix.

Woseseltops commented 3 months ago

Let's do this in phases:

susanodd commented 3 months ago

Okay, nice, that's wise to do it that way. Then we can make sure everything still works with the new archived field!

I was dreaming about this last night.

There will likely be problems with the "Django" functions that make use of the "related_name" in the models to go backwards. E.g., code that looks like this: gloss.lemma.lemmaidglosstranslation_set.all()

There are zillions of things like this in the code because we strived to make it maintainable. All the code that has this pattern will need to be prefaced where needed by the control flow or context with the check whether the gloss is archived. At the moment, I'm not sure where to include the check.

OPINION The best person to implement this is @vanlummelhuizen. This ought to be thought out a priori before we modify all the code. (This is reminiscent of what happened with the Field Choices.)

rem0g commented 2 months ago

Bump. Any updates?

susanodd commented 2 months ago

This has been on the back burner because we had to get the API token working before other things.

rem0g commented 23 hours ago

And now? We should solve this issue asap, really frustrating as i see other tasks are being executed but this doesnt.

susanodd commented 22 hours ago

Okay, I can start on this. I finished the Batch Edit that I've been working on since June 20. I was going to work on the CSV issue with Excel. But I'll work on this instead.

susanodd commented 19 hours ago

I've got as far as I can tell all the queries over Gloss to exclude those with archived set to True. (HAVE BEEN DONE NOW)

I added a new post_save method that when the archived is set to True (which replaces the previous "delete()" so the archived gloss is saved to the DeletedGlossOrMedia, as before and can be seen in the Admin.

Now after the archived has been set to True, as part of this post_save method, the following needs to be done:

def cascade_archival_gloss(gloss):

    relations_source = Relation.objects.filter(source=gloss)
    relations_target = Relation.objects.filter(target=gloss)
    morphemes_parent = MorphologyDefinition.objects.filter(parent_gloss=gloss)
    morphemes_appears_in = MorphologyDefinition.objects.filter(morpheme=gloss)
    simultaneous = SimultaneousMorphologyDefinition.objects.filter(parent_gloss=gloss)
    blends = BlendMorphology.objects.filter(parent_gloss=gloss)
    partofblends = BlendMorphology.objects.filter(glosses=gloss)

    # delete all of the relation objects above when the gloss is archived

    # ALSO grab all Senses objects related to this gloss!!
    # this is to implement the Django "cascade" operation

This will effectively detach/remove the archived gloss from all the tables/other models that reference it.

Alternatively, the "retrieval" / "querying" done on the above would need to exclude any archived glosses.

Preferences? Do we need to be able to "undo" a previous "delete" ("archival") ?

Of course, it could be that the "deleted" gloss isn't embedded into the rest and was just a mistake, so it's been deleted.

susanodd commented 18 hours ago

@rem0g quick question to assist me in determining how to resolve.

Above, there is a method that obtains all the "references" to the to-be-deleted aka to-be-archived gloss.

What are the chances that you will be deleting glosses "that are attached to lots of other glosses" ?

Or will you be primarily deleting glosses that were created by mistake?

susanodd commented 18 hours ago

The alternative to deleting would be to find all the reverse relations on the "related" object that refer to the gloss and remove them from the queries.

That is probably better. There are actually many more classes that have a ForeignKey to Gloss.