NCATSTranslator / Knowledge_Graph_Exchange_Registry

The Biomedical Data Translator Consortium site for development of Knowledge Graph Exchange Standards and Registry
MIT License

Users uploading data should have the ability to delete or modify their file sets. #22

Open RichardBruskiewich opened 3 years ago

RichardBruskiewich commented 3 years ago

What happens if the same group uploads two different versions of their graph on the same day? Various use cases:

  1. The second version is "new" or "corrected" and should overwrite the first version: should users be allowed to look up "their data" and selectively delete older (deprecated) versions from the repository?
  2. The second version is to be kept independent of the first (the default timestamp version won't be granular enough, but user-specified versioning, as proposed in KGE Issue 21, could solve this?)
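
The second case can be illustrated with a minimal sketch. It assumes, hypothetically, that the default version is derived from the upload date alone; `default_version`, `file_set_key` and the key layout are illustrative names, not the registry's actual scheme.

```python
from datetime import datetime


def default_version(uploaded_at: datetime) -> str:
    # Assumption: the default version is a day-granular date stamp.
    return uploaded_at.strftime("%Y%m%d")


def file_set_key(kg_id: str, version: str) -> str:
    # Hypothetical key layout: one file set per (graph, version) pair.
    return f"{kg_id}/{version}"


morning = datetime(2021, 3, 1, 9, 0)
evening = datetime(2021, 3, 1, 17, 30)

# Two uploads on the same day map to the same key,
# so the second upload would collide with the first.
same_day_collision = (
    file_set_key("my-graph", default_version(morning))
    == file_set_key("my-graph", default_version(evening))
)

# User-specified versions keep the two uploads apart.
distinct_with_user_versions = (
    file_set_key("my-graph", "1.0.0") != file_set_key("my-graph", "1.0.1")
)
```

Under this assumption, a finer-grained timestamp would also avoid the collision, but an explicit version says more about how the two uploads relate.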

RichardBruskiewich commented 3 years ago

It may be undesirable to allow users to tweak existing KGE File Sets. Rather, insist that every changed version be a new SemVer-versioned release.

It would be helpful to give users more flexible file set creation options on the upload.html web form: e.g. to select an existing KGE File Set and copy its files into a new File Set upload form context, then delete / overwrite / upload other new files, then click "Done Uploading" to save it as the new version.
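
A rough sketch of that copy-then-modify flow, assuming a simple in-memory model. `FileSet`, `bump_patch` and `derive_file_set` are hypothetical names for illustration and not part of the existing KGE code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileSet:
    kg_id: str
    version: str  # SemVer string, "MAJOR.MINOR.PATCH"
    files: dict = field(default_factory=dict)  # file name -> content reference


def bump_patch(version: str) -> str:
    # Handles plain MAJOR.MINOR.PATCH only (no pre-release or build metadata).
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def derive_file_set(base, new_version, uploads=None, deletions=()):
    """Copy the files of `base` into a new file set, then apply changes."""
    if new_version == base.version:
        raise ValueError("a changed file set needs a new version")
    files = dict(base.files)      # copy, so the base file set is untouched
    for name in deletions:
        files.pop(name, None)     # delete
    files.update(uploads or {})   # overwrite existing / upload new
    return FileSet(base.kg_id, new_version, files)


v1 = FileSet("my-graph", "1.0.0", {"nodes.tsv": "ref-a", "edges.tsv": "ref-b"})
v2 = derive_file_set(v1, bump_patch(v1.version), uploads={"edges.tsv": "ref-c"})
```

Here `v2` is a new 1.0.1 release that reuses `nodes.tsv` and replaces `edges.tsv`, while `v1` stays exactly as it was uploaded.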

This does raise the question of who should be allowed to copy/duplicate the data of an existing file set.

jeffhhk commented 3 years ago

From the perspective of primarily being a KG "consumer" (as opposed to a KG "producer"), I tend to view the ability for publishers to modify their data sets wholesale as more of a bug than a feature. Immutable data is far easier to consume than mutable data, and most of KGE is IMO better poised to offer a predominantly immutable model than a predominantly mutable model.

One bright line that is crossed is the "Done Uploading" button (implemented using the /archive/publish endpoint). From that point on, the main way that the user has to correct errors is to publish a new version. I don't think there is much wrong with saying it's the only way.
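
As a minimal sketch of that bright line, assuming an in-memory model: the class and method names below are hypothetical, and `publish()` only stands in for whatever the /archive/publish endpoint does.

```python
class PublishedError(Exception):
    """Raised when someone tries to change a published file set."""


class UploadSession:
    def __init__(self, kg_id, version):
        self.kg_id = kg_id
        self.version = version
        self._files = {}
        self._published = False

    def add_file(self, name, content_ref):
        # Files can be added or overwritten only before publishing.
        if self._published:
            raise PublishedError(
                f"{self.kg_id} {self.version} is published; upload a new version"
            )
        self._files[name] = content_ref

    def publish(self):
        # Stand-in for clicking "Done Uploading".
        self._published = True

    @property
    def files(self):
        # Return a copy so callers cannot mutate the stored mapping.
        return dict(self._files)


session = UploadSession("my-graph", "1.0.0")
session.add_file("nodes.tsv", "ref-a")
session.publish()
```

After `publish()`, a further `add_file` call raises `PublishedError`, so the only way to correct the data is a new version.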

I can anticipate two cases in which having a new version upload as the only way to make changes could be a problem.

1) Uploading a version with a known debilitating bug. For this, I would suggest having an operation for a publisher to mark a file set with flags such as "retracted", "deprecated" or "buggy", so that consumers can find out that it's no longer recommended for use. But it seems like it should be on consumers to decide how to respond to this communication, as opposed to preventing them from downloading the data. This seems worth implementing, but not before there are at least a few users.
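
A sketch of what such flags could look like, assuming they are stored alongside the file set metadata. `Advisory`, `Catalog` and `unflagged_versions` are hypothetical names; nothing here blocks a download.

```python
from enum import Enum


class Advisory(Enum):
    RETRACTED = "retracted"
    DEPRECATED = "deprecated"
    BUGGY = "buggy"


class Catalog:
    def __init__(self):
        self._advisories = {}  # (kg_id, version) -> set of Advisory

    def flag(self, kg_id, version, advisory):
        # The publisher marks a file set; the data itself is left untouched.
        self._advisories.setdefault((kg_id, version), set()).add(advisory)

    def advisories(self, kg_id, version):
        return frozenset(self._advisories.get((kg_id, version), ()))


def unflagged_versions(catalog, kg_id, versions):
    # One possible consumer policy: prefer versions with no advisory at all.
    return [v for v in versions if not catalog.advisories(kg_id, v)]


catalog = Catalog()
catalog.flag("my-graph", "1.0.0", Advisory.BUGGY)
usable = unflagged_versions(catalog, "my-graph", ["1.0.0", "1.0.1"])
```

A different consumer could just as well keep using a "deprecated" version and skip only "retracted" ones; the flags communicate, and the policy stays with the consumer.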

2) Uploading something under an incorrect license agreement. That is, "oops sorry didn't have permission to publish this", please retract. This is important but rare. I think it is probably rare enough that handling it by an email and a manual s3 rm is fine.