BetaMasaheft / Documentation

Die Schriftkultur des christlichen Äthiopiens: Eine multimediale Forschungsumgebung

dealing with deleted records #1401

Closed thea-m closed 4 years ago

thea-m commented 4 years ago

Dear all,

I believe that our current modus operandi for deleting records would benefit from some modifications which might serve the project's long-term sustainability and trustworthiness. I am writing now about work records, where I think this is most urgent, but it is potentially true for all entities.

I see two conflicting needs. One is to have flexibility in creating, changing and deleting records, which is absolutely crucial as we are constantly working with new material and gaining new insights. The other is our aim to provide identifiers that all scholars can use to refer unambiguously to specific texts in their publications. In order for the Clavis to become widely accepted and used, there must be absolute trust in the Clavis numbers.

From my limited perspective and understanding, I would say this: we must be able to delete records even if their Clavis ID has been used in publications, when this is necessary (doublets are the most common scenario, but there are other possibilities). At the same time, Clavis numbers of deleted records must be easy to find for scholars not familiar with the project and the details of our workflow. They should ideally be able to find the deleted Clavis number and the reason for its deletion, and be referred to the updated Clavis number to use instead, or to any other necessary information.

Right now, there are already useful error messages in the app when opening a deleted record's page (Sorry! LIT4918MalkeaMaryam has been marked as deleted.) or when referring to a deleted ID in another record. Ideally, I would imagine a scenario where, in the "find by Clavis number" function (which will exist in the future), it would be possible to type in deleted Clavis numbers and find all the information specified above. This would require us to give this information in a consistent way when deleting records, and the information would have to be retrievable somehow. As you know, I am naive about the technical implications, but always (too?) optimistic about their possibilities.

Thank you for reading until here to those who are still there :)

I really feel that this is an important issue. What do you think?

@PietroLiuzzo @DenisNosnitsin1970 @eu-genia @DariaElagina

PietroLiuzzo commented 4 years ago

My first question is: why would you feel the need to delete an entity which has been cited somewhere? Secondly, if you delete a record with good reasons, where should one be pointed? Depending on the reason for the deletion, it may be nowhere or to one of many other records. I have really not understood how you envisage this happening in a better way than simply pointing to a note of removal of the record. The consistent information when deleting should in any case be given in the commit where this is done. A deleted record at a given point in its history should also be retrievable in the current setup.

thea-m commented 4 years ago

Thank you! I don't feel the need to delete an entity which has been cited :) I do, however, feel the need to delete entities that are doublets or otherwise redundant. This does not happen often, but it happens. Right now I am quite sure that the Clavis IDs have been used very rarely, but I really hope that will change in the next decades. I am therefore certain that we haven't deleted entities that have been cited. But I foresee that we will not know what has been cited by someone somewhere, and that very rarely something might be deleted that has been cited.

I simply noticed that the last records that were deleted were always doublets of other records. In these cases, a simple equivalence can be stated. This should be done by us in the commit message. It is not done consistently at present, and that is something simple and significant that we can and should easily change.

I know that deleted records are retrievable. But they are retrievable by us, certainly not by someone who has never used GitHub or someone who uses the web app and has no idea what GitHub is (and they should not need to). My issue is that we currently give repeated reassurances to others that the Clavis IDs are stable and that "nothing will be deleted" without reason and redirection. That is all true, but I fear that the deleted records might come to haunt us (maybe not now, but in a few years).

As for the possibilities of how to do this, I lack the technical background. If, say, we standardise all deletion commit messages to contain the word "deleted" and a clear reason, would there be a way of visualizing these on the app? Would there be a way of having deleted records exist as stubs in the database, with the same message? (Keeping them in the normal GitHub repositories seems to me very dangerous.)

eu-genia commented 4 years ago

While it is clear that we replace the references to the deleted IDs with those to the active ones (and will have to do that in the future) I agree that (1) there is a danger that if many people are working on the app and we cannot control each and every commit - especially if more persons are working at the same time with the same topic - situations potentially may emerge that an ID is deleted by someone and still remains cited by someone else.

(2) there is a more probable danger that someone quotes a ClavisID in a printed matter which cannot be synchronized with the app and which then becomes obsolete. I think this is the problem that bothers @thea-m most.

thea-m commented 4 years ago

(note LIT4112Salam, which was deleted two years ago but was still used this year in records, and has just cost me a day...)

PietroLiuzzo commented 4 years ago

Let me try to resummarise and keep practical in sorting this simple, solvable and useful issue.

1. Doublets

User story

A user finds out that two records refer to the same Textual Unit and she is 150% sure she is not actually overestimating. She enquires with the team, gets confirmation, merges the information in both records into one lucky one, updates references to the surviving record and deletes the other. This should avoid the LIT4112Salam case above, but it has not been the case apparently.
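The "updates references to the surviving record" step is the one that failed for LIT4112Salam, so it might be worth checking mechanically. The following is only a minimal sketch in Python, assuming records are available as plain text keyed by their ID and that work IDs look like LIT plus digits plus an optional name (as in LIT4112Salam). The function name, the record IDs and the sample snippets are hypothetical and not part of the app.

```python
import re

# Assumed shape of a work ID, based on examples such as LIT4112Salam.
WORK_ID = re.compile(r"\bLIT\d+[A-Za-z]*\b")


def find_stale_references(records, deleted_ids):
    """Return {record_id: sorted list of deleted IDs that record still cites}."""
    deleted_ids = set(deleted_ids)
    stale = {}
    for record_id, text in records.items():
        cited = set(WORK_ID.findall(text)) & deleted_ids
        if cited:
            stale[record_id] = sorted(cited)
    return stale


# Hypothetical sample data, for illustration only.
records = {
    "MS0001": '<ref corresp="LIT4112Salam"/> and <ref corresp="LIT0002Example"/>',
    "MS0002": '<ref corresp="LIT0002Example"/>',
}
print(find_stale_references(records, {"LIT4112Salam"}))
```

Run before a deletion, an empty result would mean no record still cites the ID about to be removed. Run later, it would catch records that were published after the deletion.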

App behaviour

The app will add the record to a list of deleted records. When the deleted record is requested, a redirect to an error page and a "permanently removed" message should be returned.

Relation to print references

An already printed version of a work quoting the deleted record will become wrong. Following a link, or trying to build one, will return only the above.

Possible Improvement

We could add a simple relation to the surviving file, perhaps using a simple sameAs instead of a bespoke thing. This would allow the merging to be performed without deleting the record. If the other record is removed, the relation could also be fetched upon requests for the old record. For example: LIT1 is deleted because it is a doublet of LIT2, so LIT2 gets a relation LIT2 sameAs LIT1.

2. A part of something else, not a record

User story

A user finds out a Textual Unit is not actually a textual unit at all, but rather a subpart of an existing unit. She decides to remove it.

App behaviour

The app will behave as detailed in the same section of the example above.

Relation to print references

Same as above

Possible Improvement

In this case I think the deletion action is superfluous. There will be a relation saying that this is part of something else, and there is an equation between this not-real unit and its real essence as part of another unit. On the other hand, one may make sure that no direct references are present, which may document the actual relevance of that unit as such. But if the deed is done, we could redirect to the subpart, using a relation as above in the appropriate place, which will be more like LIT1 sameAs LIT2#subpart.

3. Removed for not relevant or simply wrong

User story

A user finds out a Textual Unit is not actually a textual unit at all, and because it is unbearable and there was no effort put in it anyway, and she has checked with the team that this is the case and the record can be removed, she goes on to kill it permanently.

App behaviour

The app will behave as detailed in the same section of the example above.

Relation to print references

Same as above

Possible Improvement

In this case we do not have a record to redirect to. All we can say is that this does not exist anymore, and there will be no place where the reason for this deletion is preserved, unless it is well documented in GitHub. There is the list of deleted entries, but this is an application file, not a resource to be edited. We could, however, make it accessible so that users can record specifications. When a deletion action is requested, the user will get a link in an email from the app requesting a specification. The link will point to a page accessible only to cataloguers, where this information is entered. It will say something like "provide a reason for the removal of LIT1" and offer a field to enter text. This will be saved to the app's list of deleted files and will be fetched when LIT1 is requested, so that a user or somebody looking up a reference will be told something like "LIT4112Salam is not an available item. It was deleted for this reason |text provided|."
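To make the proposal concrete, here is a minimal sketch in Python of a deleted list that carries a reason and is used to build the message. It assumes an in-memory dictionary in place of the actual application file, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeletedEntry:
    record_id: str
    reason: Optional[str] = None  # entered later by a cataloguer


def deletion_message(record_id, deleted):
    """Build the text shown when a deleted record is requested."""
    entry = deleted.get(record_id)
    if entry is None:
        return None  # the ID is not in the deleted list
    message = f"{record_id} is not an available item."
    if entry.reason:
        message += f" It was deleted for this reason: {entry.reason}"
    return message


# The deletion registers the ID; the reason is added in a second step.
deleted = {"LIT1": DeletedEntry("LIT1")}
deleted["LIT1"].reason = "not a textual unit"
print(deletion_message("LIT1", deleted))
```

Keeping the reason optional means a deletion is never blocked by a missing explanation, while the message still improves once a cataloguer follows the emailed link.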

A page of deleted records

This is possible, and could also be linked from the error page, in addition to or as an alternative to the proposals above.

eu-genia commented 4 years ago

Thank you @PietroLiuzzo , just something maybe I don't quite get

I think it is also reasonable to say that the relations or similar actions are only introduced from now, we cannot go back systematically (unless we anyway are updating a relevant record), what was done was done; we should, if a new guideline is decided upon, apply it from now on and in the future

PietroLiuzzo commented 4 years ago

We can use sameAs and change the behavior so that there is no link, based on the presence of that ID in the list of deleted entries. So, if I have LIT1 sameAs LIT2, but LIT1 is in the list of deleted entries, the app will show LIT2 and will deal with the visualization of LIT1 accordingly. The data is one thing, the visualization another, and the second should not be a factor in deciding the first. Here we have all we need to let the app make a decision on what to print and where. If you prefer another relation, for example to avoid confusion with other usages of sameAs, I am ok with that.

Adding DELETED to the title of a "stub" that is not actually deleted, because it is there, is not a good idea and would generate a lot of unwanted chaos for machines as well as humans, besides not solving the issue here in any respect IMHO, and not being easily parsable for any other purpose.

eu-genia commented 4 years ago

I was not suggesting to add DELETED but rather to be consistent and go on and delete the superfluous record

thea-m commented 4 years ago

Thank you, excellent suggestions! And I completely agree that we should not worry about the records that have already been deleted. If these or similar mechanisms are implemented from now on, it will be absolutely sufficient. I also understand @eu-genia 's worry about keeping "deleted" records alive. If it is decided that the best solution is not to delete these records, but to keep them (and they will be visible in everyone's GitHub folders?) and add a relation in the other record, it should be absolutely visible in the XML records that they are deleted: everyone navigates and searches and finds their IDs in different ways (as LIT4112Salam has proven: I think it was added to an (EMIP) record before being deleted, and only published after the deletion, and therefore escaped the substitution action that I hope preceded the deletion. This record must then have been used extensively for copy-pasting). So, we should keep in mind those who retrieve their IDs directly in the files (hoping that they occasionally open the records they are using...), not via the app. The same is true if it is decided that records which are in fact part of larger units should not be deleted. As long as there is a good solution for this (I'm confident that will be possible), I am very happy with everything suggested by @PietroLiuzzo :)

PietroLiuzzo commented 4 years ago

I will do the following, unless objections come up

Do we want to use sameAs or formerlyAlsoListedAs or something else?

thea-m commented 4 years ago

Thank you! I think this will be a significant improvement. Maybe formerlyAlsoListedAs (or a similar, but already existing relation from an existing ontology) would be clearer than sameAs, which is also used in other cases.

PietroLiuzzo commented 4 years ago

A summary of the visible effects of the above changes which will become available with the release and which close this issue.

  1. The "find Clavis number" function will only search the existing Clavis numbers, not the deleted ones, to avoid confusion of functions. These changes affect any record deleted from the release onwards.
  2. A request for a deleted record will first try to redirect to an existing record which points to the deleted one.
  3. A request for a deleted record which does not find any way to redirect will show a message pointing to a list of all deleted records.
  4. The page of an item which has a reference to a deleted one will say so in the heading and in the relation view.
  5. The list of deleted items will tell which item has been removed and when, and where possible provide a reference to a record in which a reference to the deleted one is found. A link for all available permalinks will also be available.
  6. Places needing to print a title for a deleted ID will look for correspondences and print specific values.
  7. Old versions of deleted records are retrievable and listable in the deleted records list, which means that a reference in a publication to a record which has now been deleted will follow the options in point 1, and if a permalink was given, this will not only show the actual referenced file, but also an additional note that this file has now been deleted (and when this happened and if there is a replacement), so that the reference will not cease to be valid.
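
The lookup order in points 2 and 3 can be sketched as follows. This is only an illustration in Python, not the app's implementation: the function name, the outcome labels and the representation of relations as (subject, relation name, object) triples are assumptions made for the example.

```python
def resolve_request(record_id, existing, deleted, relations):
    """Return (outcome, target) for a requested record ID."""
    if record_id in existing:
        return ("show", record_id)
    if record_id in deleted:
        # Point 2: look for an existing record that points to the deleted one.
        for subject, _name, obj in relations:
            if obj == record_id and subject in existing:
                return ("redirect", subject)
        # Point 3: nothing to redirect to, so point to the list of deleted records.
        return ("deleted-list", None)
    return ("not-found", None)


# Hypothetical data: LIT1 was a doublet of LIT2, LIT3 was simply removed.
existing = {"LIT2"}
deleted = {"LIT1", "LIT3"}
relations = [("LIT2", "formerlyAlsoListedAs", "LIT1")]

print(resolve_request("LIT1", existing, deleted, relations))
print(resolve_request("LIT3", existing, deleted, relations))
```

The relation name is deliberately ignored here, since point 2 only requires that an existing record points to the deleted one, whichever relation is finally chosen.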

PietroLiuzzo commented 4 years ago

(btw: relation elements can be added retroactively...)