Open rem0g opened 3 months ago
This is extremely dangerous! It's an object-oriented database. There is no "undo". A delete also deletes all relations the gloss is part of as well as the videos. This could affect glosses in the corpus that other users have made. There are no gloss-specific permissions.
We are able to delete glosses via Signbank Website, why wouldnt that be possible via API?
We are able to delete glosses via Signbank Website, why wouldnt that be possible via API?
Because there are no humans involved and you want to automate it.
What's to stop your software from deleting everything? Or say, hold Signbank hostage?
Please limit your responses to software developement, if you want to discuss about this then i would suggest to mail or make an appointment with me and Ulrika. This feature is needed unfortunately as our colleagues can make mistakes sometimes.
Hey @rem0g, I (and Jetske too) do agree this feature would be a valuable addition to the API. I think Susan intended to point out that with automated systems, the risk of accidental deletion of large amounts of glosses could be higher. This is something we can fix at our side:
archivedAtDate
. If this field is set, the gloss won't show up in any lists, searches, API output, etcarchivedAtDate
that is older than y days ago.Ok perfect, that is something i can work with.
See #1211
Do you need different "corpus gloss ID"s for your glosses?
The CNGT glosses should not be deleted because all the frequency data from the EAF files depends on them. The Signbank ID is that used in the CNGT also.
Do glosses need an additional UvA ID field?
The Gloss ID (Signbank ID) is also linked to a Corpus via a table. Also to Document, Speaker, Frequency objects. (These are visible in the Gloss Frequency panel of the Gloss View.) Because these IDs are used by researchers, they should not be deleted.
Do you require "different" IDs for the glosses in your corpus? A gloss may be associated with more than one corpus. But the ID cannot be changed.
So probably we can add an additional UvA ID field. Or another table to map glosses that appear in mutliple corpora to Gloss IDs in the various corpora. Then they could be looked up via either. Then the Affiliation could be added to the API routines.
The CNGT glosses should not be deleted. The frequency routines run as a cron job that updates based on updates to the EAF files for CNGT.
The minimal pairs are also computed dynamically. (Not stored anywhere.)
I don't know enough about Phonology itself to know whether you would be using completely different field choices in your software. If so, that would cause problems for researchers making use of the minimal pairs if you start updating the phonology of CNGT glosses. Or delete glosses that are in the minimal pairs of a CNGT gloss.
Please check out #127.
Deleting a gloss also deletes the relations and morphology. Changing the annotation to add a suffix would be preferred.
At the moment, none of the "relations, morphology" fields of a gloss are returned in the read API functions. Probably an additional API is needed to get these. This would be the reason for needing to check with signbank prior to delete.
This data is not a yes/no/are you sure? You can see these in the CSV columns. It's not necessarily useful in the API because it needs to be human readable, also in the CSV. It's like returning spaghetti.
I can make an additional API function "get related objects" (get relations?) that returns the same as is done in #127. Then you can just check whether it's empty. (Our software can't be intelligent enough to do questions about "are you sure". It could return the "other data that will be deleted in the case of delete". That should be understood as "not wise to do this".)
We'll see what @Woseseltops comes up with for setting an "archived" field. That would need to be cascaded to other relations as well. Because it's an OO database, all the queries end up linked to each other to retrieve things out of OO tables related to the gloss. I'm not sure to what extent an "archived" field would cascade to other functionality. (This would impact also analysis functionality because the queries would all need to be modified.)
We have functionality implemented to retrieve variants (based on the patterns -A, -B, ....) Perhaps some assist function could be made to show the currently used ones so the user can change the annotation suffix.
Let's do this in phases:
delete gloss
functionality does not actually delete things anymore, but just switches the boolean on a new archived
field. Good point that this isn't just the case for glosses, but also for the other things (relations) that are deleted in the delegate gloss
functionality.archived
field are requested, you filter out the objects with archived=False
.Okay, nice, that's wise to do it that way. Then we can make sure everything still works with the new archived field!
I was dreaming about this last night.
There will likely be problems with the "Django" functions that make use of the "related_name
" in the models to go backwards.
E.g., code that looks like this:
gloss.lemma.lemmaidglosstranslation_set.all()
There are zillions of things like this in the code because we strived to make it maintainable. All the code that has this pattern will need to be prefaced where needed by the control flow or context with the check whether the gloss is archived. At the moment, I'm not sure where to include the check.
OPINION The best person to implement this is @vanlummelhuizen. This ought to be thought out a priori before we modify all the code. (This is reminiscent of what happened with the Field Choices.)
Bump. Any updates?
This has been on the back burner because we had to get the API token working before other things.
And now? We should solve this issue asap, really frustrating as i see other tasks are being executed but this doesnt.
Okay, I can start on this. I finished the Batch Edit that I've been working on since June 20. I was going to work on the CSV issue with Excel. But I'll work on this instead.
I've got as far as I can tell all the queries over Gloss to exclude those with archived set to True. (HAVE BEEN DONE NOW)
I added a new post_save method that when the archived is set to True (which replaces the previous "delete()" so the archived gloss is saved to the DeletedGlossOrMedia, as before and can be seen in the Admin.
Now after the archived has been set to True, as part of this post_save method, the following needs to be done:
def cascade_archival_gloss(gloss):
relations_source = Relation.objects.filter(source=gloss)
relations_target = Relation.objects.filter(target=gloss)
morphemes_parent = MorphologyDefinition.objects.filter(parent_gloss=gloss)
morphemes_appears_in = MorphologyDefinition.objects.filter(morpheme=gloss)
simultaneous = SimultaneousMorphologyDefinition.objects.filter(parent_gloss=gloss)
blends = BlendMorphology.objects.filter(parent_gloss=gloss)
partofblends = BlendMorphology.objects.filter(glosses=gloss)
# delete all of the relation objects above when the gloss is archived
# ALSO grab all Senses objects related to this gloss!!
# this is to implement the Django "cascade" operation
This will effectively detach/remove the archived gloss from all the tables/other models that reference it.
Alternatively, the "retrieval" / "querying" done on the above would need to exclude any archived glosses.
Preferences? Do we need to be able to "undo" a previous "delete" ("archival") ?
Of course, it could be that the "deleted" gloss isn't embedded into the rest and was just a mistake, so it's been deleted.
@rem0g quick question to assist me in determining how to resolve.
Above, there is a method that obtains all the "references" to the to-be-deleted aka to-be-archived gloss.
What are the chances that you will be deleting glosses "that are attached to lots of other glosses" ?
Or will you be primarily deleting glosses that were created by mistake?
The alternative to deleting would be to find all the reverse relations on the "related" object that refer to the gloss and remove them from the queries.
That is probably better. There are actually many more classes that have a ForeignKey to Gloss.
We should be able to delete Gloss via API with using Gloss ID, with confirm callback.
signCollect ask Signbank to delete Gloss. Signbank replies: Gloss is found, are you sure? signCollect will confirm the deletion. Signbank callbacks with gloss deleted.
signCollect will keep seperate backup for that gloss in event of user error and then it can re-create the gloss.