UAL-RE / LD-Cool-P

Python tool to enable data curation
MIT License

Bug: move_next, move_publish break on an updated dataset when no new version is generated #245

Open zoidy opened 2 years ago

zoidy commented 2 years ago

Describe the bug

When a dataset is updated, not all changes trigger the generation of a new version of the dataset in ReDATA (see the Figshare documentation for what does and doesn't trigger new versions).

When a dataset is updated without triggering a new version, putting it into the curation workflow generates a v02 folder; however, LD-Cool-P commands such as move_next and move_publish still treat it as v1, causing conflicts with the existing v1 folders for that deposit.

The solution approach is currently unclear. We may need to modify the folder naming convention for versions, e.g., using major and minor versions: v01, v01.1, v02, ...
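The major/minor naming idea could be handled with a small format/parse pair. This is a minimal sketch, assuming a two-digit zero-padded major (as in v01, v02) and an unpadded minor; the function names are hypothetical and not part of LD-Cool-P.

```python
from __future__ import annotations

import re

# Hypothetical convention: vNN is a major (Figshare) version; vNN.M is the
# M-th curation review of that version that did not create a new version.
_VERSION_RE = re.compile(r"^v(\d+)(?:\.(\d+))?$")


def format_version(major: int, minor: int = 0) -> str:
    """Return a folder name such as 'v01' (minor == 0) or 'v01.1'."""
    name = f"v{major:02d}"
    return name if minor == 0 else f"{name}.{minor}"


def parse_version(name: str) -> tuple[int, int]:
    """Parse 'v01' into (1, 0) and 'v01.1' into (1, 1)."""
    match = _VERSION_RE.match(name)
    if match is None:
        raise ValueError(f"not a version folder name: {name!r}")
    return int(match.group(1)), int(match.group(2) or 0)
```

Sorting folder names with `key=parse_version` compares numeric tuples, so v01.10 correctly lands after v01.9; a plain string sort would put it first.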

Note: there is a request_number field in the curation metadata JSON that can be examined. The request_number always increments by one every time the dataset is submitted for curation, whether or not a new version is generated.
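Because request_number increases on every submission, it could give each review a deterministic minor number. This is a minimal sketch, assuming the curator knows the dataset's current version and has recorded the request_number of the review that produced it (called `base_request` here, a hypothetical name). Every later request for the same version is, by definition, a review that created no new version.

```python
def folder_for_request(major: int, base_request: int, current_request: int) -> str:
    """Hypothetical folder name for a review that creates no new version.

    major:           the dataset's current Figshare version (1 for v01).
    base_request:    request_number of the review that produced that version.
    current_request: request_number of the review being curated now.
    """
    if current_request < base_request:
        raise ValueError("current_request predates base_request")
    # Number of submissions since the last review that created a new version.
    minor = current_request - base_request
    name = f"v{major:02d}"
    return name if minor == 0 else f"{name}.{minor}"
```

For example, if v01 came from request 1 and a spelling fix arrives as request 2, the review folder is v01.1. If a request is withdrawn, the minor numbers skip a value, but each name stays unique and can be recomputed by every command from the same metadata.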

Version information

Additional note on versioning

When creating the folder structure as the dataset is pulled down with get_data, LD-Cool-P generates the version by simply adding 1 to the existing version, without considering that, as stated above, not every curation review results in a new version. This behavior would need to be modified at the same time this bug is addressed. See this line of code. For the time being, simply adding 1 is beneficial because it allows the data to be pulled down, even though the other commands don't work.
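A modified get_data could choose between a major and a minor bump instead of always adding 1. This is a minimal sketch under the vNN / vNN.M convention proposed above; `next_version_folder` is a hypothetical name, and the `new_version` flag must come from the curator, since there is no automatic way to tell whether an update will create a new version.

```python
from __future__ import annotations

import re

_VERSION_RE = re.compile(r"^v(\d+)(?:\.(\d+))?$")


def next_version_folder(existing: list[str], new_version: bool) -> str:
    """Pick the folder name for an incoming curation request.

    existing:    version folder names already on disk for this deposit,
                 e.g. ["v01"] or ["v01", "v01.1"].
    new_version: curator's decision on whether the update will generate
                 a new Figshare version.
    """
    parsed = []
    for name in existing:
        match = _VERSION_RE.match(name)
        if match:  # ignore anything that is not a version folder
            parsed.append((int(match.group(1)), int(match.group(2) or 0)))
    if not parsed:
        return "v01"
    major, minor = max(parsed)
    if new_version:
        return f"v{major + 1:02d}"  # same result as the current "+1" behavior
    return f"v{major:02d}.{minor + 1}"  # review within the same version
```

With `new_version=True` this reproduces today's behavior, so only the non-version-generating case changes.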

yhan818 commented 2 years ago

See "What is a new version?" in the Figshare documentation: https://help.figshare.com/article/can-i-edit-or-delete-my-research-after-it-has-been-made-public#:~:text=Figshare%20supports%20versioning%20for%20both,different%20between%20items%20and%20collections

alias move_next="$ldcoolp_root/ldcoolp/scripts/perform_move --config $ldcoolp_config --direction next --article_id "
alias move_back="$ldcoolp_root/ldcoolp/scripts/perform_move --config $ldcoolp_config --direction back --article_id "
alias move_publish="$ldcoolp_root/ldcoolp/scripts/perform_move --config $ldcoolp_config --direction publish --article_id "

So perform_move is the script to debug, specifically how it determines the current folder. When no new version is generated, it must try to overwrite the current version folder (essentially a replace), which either succeeds or fails.
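One way to make a move fail safely, instead of overwriting an existing version folder, is to check the destination first. This is a minimal sketch with hypothetical stage directories and a hypothetical function name; it does not reproduce LD-Cool-P's actual perform_move logic.

```python
import shutil
from pathlib import Path


def move_version_folder(source_stage: Path, target_stage: Path, folder_name: str) -> Path:
    """Move one deposit/version folder between curation stages.

    folder_name may be nested, e.g. "Deposit_123/v01" (hypothetical layout).
    Raises instead of overwriting, so two reviews that resolve to the same
    version folder are reported rather than silently merged.
    """
    source = source_stage / folder_name
    target = target_stage / folder_name
    if not source.is_dir():
        raise FileNotFoundError(f"nothing to move: {source}")
    if target.exists():
        raise FileExistsError(f"destination already exists: {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(source), str(target)))
```

Raising `FileExistsError` surfaces the v1/v02 conflict described in this issue to the curator, who can then decide whether the review belongs in a minor-version folder.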

Some actions (i.e., anything other than changing the "title" or "author" fields or updating/deleting files) will not generate a new version. In the JSON response from Figshare (https://docs.figsh.com/#account_institution_curation), two fields are updated in that case: "request_number" and "modified_date".
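Those two fields could be read with the standard library alone. This is a minimal sketch; the endpoint path `/v2/account/institution/review/{curation_id}` and the `Authorization: token ...` header are assumptions based on Figshare's public API conventions and should be checked against the linked documentation. The function names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any

# Assumed base URL and path; verify against the Figshare API documentation.
API_BASE = "https://api.figshare.com/v2"


def curation_request(curation_id: int, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one curation review."""
    url = f"{API_BASE}/account/institution/review/{curation_id}"
    return urllib.request.Request(url, headers={"Authorization": f"token {token}"})


def review_fields(body: bytes) -> dict[str, Any]:
    """Extract the two fields that change even when no new version is made."""
    record = json.loads(body)
    return {key: record.get(key) for key in ("request_number", "modified_date")}
```

Usage would look like `with urllib.request.urlopen(curation_request(curation_id, token)) as resp: fields = review_fields(resp.read())`. Comparing these values with the ones stored for the previous review shows that a resubmission happened, but not whether it will create a new version.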

zoidy commented 1 year ago

Update: the current practice has been to not use LD-Cool-P at all when a dataset will not generate a new version. Instead, we add the updated information to the existing version's folder and manually copy a DepositReview template, renaming it with .1 appended. This works because updates that do not generate a new version never involve changes to the Title, Authors, or Files. Unless the change is large enough (e.g., a significant update to the description), it's not worth generating a new README, which would trigger a new version and sidestep this issue.

Note: there is no way to tell whether an item in curation will generate a new version or not without a manual inspection of the changes.

For example: assume a dataset is at v1. A user corrects a small spelling error in the description and submits the dataset for review.

  1. The item is received as an update to the dataset in the curation dashboard.
  2. The curator must inspect it carefully to see what changed and whether those changes will trigger a new version.
  3. If the changes do not trigger a new version and are small, proceed to the next step. If the changes are significant, do not continue with this process. Instead, edit the existing README file and upload it to the dataset. This will trigger a new version, and one can proceed with LD-Cool-P as normal.
  4. Go to the v01 folder on the curation server. Manually create a copy of the existing v01 review report and change the version in the file name to v01.1.
  5. Record the curation process in the report as usual.
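Step 4 could be scripted. This is a minimal sketch, assuming the report's file name stem ends in the version token (for example DepositReview_v01.docx, a hypothetical name); the function name is also hypothetical. Pass the original v01 report and raise `minor` for later reviews of the same version.

```python
import re
import shutil
from pathlib import Path


def copy_review_for_minor_update(report: Path, minor: int = 1) -> Path:
    """Copy 'Name_v01.ext' to 'Name_v01.<minor>.ext' in the same folder."""
    stem = report.stem
    if re.search(r"v\d+$", stem) is None:
        raise ValueError(f"expected a name ending in vNN, got {report.name!r}")
    target = report.with_name(f"{stem}.{minor}{report.suffix}")
    if target.exists():  # never overwrite an earlier review record
        raise FileExistsError(f"review report already exists: {target}")
    shutil.copy2(report, target)  # copies contents and file metadata
    return target
```

The original report is left untouched, so the v01 review record and the v01.1 review record sit side by side in the same version folder.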