CDLUC3 / mrt-doc

Documentation and Information regarding the Merritt repository

Dash-Dryad Current version fix #682

Closed dloy closed 3 years ago

dloy commented 3 years ago

Problem

See problem issue: UC Data Migration to Dryad not reproducing some history correctly #649

The current version in Merritt of some Dash->Dryad objects includes content that was deleted in earlier ingests. The cause of this problem is that the delete.txt files were not applied to the Dryad objects that were built from the original Dash content.

Dryad users see this content in the container (e.g. zip) files returned for the current version. Most of this content was deleted from the current version because:

Goal

Steps

Create list of objects that have this problem

For all Dash->Dryad objects, determine which contained a system/delete.txt file in the original object.

Count: 217 entries

program:

perl

format:

local id | Dash ark | Dryad ark

example:

doi:10.7272/Q6H41PB7|ark:/b7272/q6h41pb7|ark:/13030/m56741pj
doi:10.7272/Q67P8W9Z|ark:/b7272/q67p8w9z|ark:/13030/m5bc9ks1
doi:10.7272/Q6CC0XMH|ark:/b7272/q6cc0xmh|ark:/13030/m56741k6
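The list itself was produced with perl, which is not shown in this issue. As an illustration only, a minimal Java sketch of reading the three-field, pipe-delimited format above might look like the following. The class and method names (DashDryadEntry, parse, parseAll) are hypothetical and are not taken from mrt-store-admin.

```java
import java.util.ArrayList;
import java.util.List;

/** One row of the Dash->Dryad list: local id | Dash ark | Dryad ark (hypothetical helper). */
final class DashDryadEntry {
    final String localId;
    final String dashArk;
    final String dryadArk;

    DashDryadEntry(String localId, String dashArk, String dryadArk) {
        this.localId = localId;
        this.dashArk = dashArk;
        this.dryadArk = dryadArk;
    }

    /** Parses one pipe-delimited line; rejects lines that do not have exactly three fields. */
    static DashDryadEntry parse(String line) {
        String[] parts = line.trim().split("\\|", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        return new DashDryadEntry(parts[0].trim(), parts[1].trim(), parts[2].trim());
    }

    /** Parses every non-blank line of the list, preserving order. */
    static List<DashDryadEntry> parseAll(List<String> lines) {
        List<DashDryadEntry> entries = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                entries.add(parse(line));
            }
        }
        return entries;
    }
}
```

Failing on a malformed line, rather than skipping it, keeps a damaged list from silently dropping an object from the fix.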

Create directory for each object containing original and corrected manifests with diff files

A directory is created for each entry in the Dash->Dryad list. The directory uses the local id (doi) for a name.

program:

DDCurrent. Fix.java.txt

java

manifest.xml files:

Diff logs

diffcurbld.log - diff current and build
diffoldbld.log - diff old and build
diffoldcur.log - diff old and current

Example diffcurbld.log

MATCH:current - build +++
<<<:/home/loy/MRTMaven/github/admin/mrt-store-admin/tasks/210420-current/prod/manifests/Q6H41PB7/current.xml
>>>:/home/loy/MRTMaven/github/admin/mrt-store-admin/tasks/210420-current/prod/manifests/Q6H41PB7/build.xml

***VERSION:1

<*>:1=| match count=421

***VERSION:2

<*>:2=| match count=421

***VERSION:3

<--:3=1|producer/YZhang_longitudinalALS.xlsx
<*>:3=| match count=421

***VERSION:4

<--:4=1|producer/YZhang_longitudinalALS.xlsx
<*>:4=| match count=422

Validate current version of replace manifest with Dryad

Using a transaction list supplied by Scott, create a list of current content to validate against the current content in build.xml.

Because objects were handled in a variety of ways on the Dryad side, in several cases only the current version was available for confirmation.

program:

java and perl

result

In all cases, the current content matched the current version in build.xml.

example

After processing the transaction list, a simple list of current pathnames was extracted from these transactions and compared with the current version in build.xml. All producer entries matched.

007=Subject_info_7272Q67P8W9Z.xlsx
007=bxFAD008-2-B0.hdr
007=bxFAD008-2-B0.img
007=bxFAD008-2-FA.hdr
007=bxFAD008-2-FA.img
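The comparison described here amounts to checking that two sets of pathnames are equal. A minimal sketch, assuming both sides have already been reduced to plain pathname strings, could look like the following. The class and method names are hypothetical, and parsing of the transaction list and of build.xml is deliberately left out because their full formats are not given in this issue.

```java
import java.util.Set;
import java.util.TreeSet;

/** Compares the current pathnames reported by Dryad with those in build.xml (hypothetical helper). */
final class CurrentPathCheck {

    /** Returns the pathnames present in expected but absent from actual, in sorted order. */
    static Set<String> missingFrom(Set<String> expected, Set<String> actual) {
        Set<String> missing = new TreeSet<>(expected);
        missing.removeAll(actual);
        return missing;
    }

    /** True only when both sides hold exactly the same pathnames. */
    static boolean matches(Set<String> dryadCurrent, Set<String> buildCurrent) {
        return missingFrom(dryadCurrent, buildCurrent).isEmpty()
            && missingFrom(buildCurrent, dryadCurrent).isEmpty();
    }
}
```

Reporting the difference in each direction separately shows whether a mismatch is content that should have been deleted or content that went missing.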

Replace bad manifest.txt in cloud

The replacement is handled by the following steps in the cloud for that object:

program

DDCurrent.java using FixDD.java

Replace inv db entries using new manifest

Once the manifest for an object has been replaced in the cloud, the inv db entries for that object need to be replaced by:

program:

bash

example:

cat invdel.sh
curl -X DELETE "http://uc3-mrtsandbox2-stg.cdlib.org:36121/mrtinv/object/ark%3A%2F13030%2Fm50k7wsz?t=xml"

cat invadd.sh
curl -X POST -F "url=http://uc3-mrtsandbox2-stg.cdlib.org:35121/storage/manifest/2003/ark%3A%2F13030%2Fm50k7wsz"  -F "responseForm=xml" http://uc3-mrtsandbox2-stg.cdlib.org:36121/mrtinv/add
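With 217 objects to process, the two URLs used by these scripts would need to be generated per ark. The following is only a sketch of how the percent-encoded URLs in the example could be built in Java. The class InvUrls and its methods are hypothetical, and treating the 2003 path segment as a storage node number is an assumption not stated in this issue.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Builds the inventory delete URL and storage manifest URL for one ark (hypothetical helper). */
final class InvUrls {

    /** Percent-encodes an ark, e.g. ':' becomes %3A and '/' becomes %2F. */
    static String encodeArk(String ark) {
        // URLEncoder applies form encoding; that matches the example for arks, which contain no spaces.
        return URLEncoder.encode(ark, StandardCharsets.UTF_8);
    }

    /** URL for the DELETE request in invdel.sh; invBase is the address up to and including /mrtinv. */
    static String deleteUrl(String invBase, String ark) {
        return invBase + "/object/" + encodeArk(ark) + "?t=xml";
    }

    /** Value for the url= form field in invadd.sh; node is assumed to be the storage node number. */
    static String manifestUrl(String storageBase, int node, String ark) {
        return storageBase + "/manifest/" + node + "/" + encodeArk(ark);
    }
}
```

For the ark in the example, deleteUrl("http://uc3-mrtsandbox2-stg.cdlib.org:36121/mrtinv", "ark:/13030/m50k7wsz") reproduces the URL shown in invdel.sh.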

Java handling

The Java routines are executed through NetBeans. A standalone routine could have been created, but using NetBeans is more flexible and allows simpler debugging.

NetBeans needs to be run in the prod environment in order to have access to the specific cloud content being updated.

The DDCurrent routine runs based on a path and the files available at that path.

Example: /home/loy/MRTMaven/github/admin/mrt-store-admin/tasks/210420-current/prod

repository

mrt-store-admin/tools

local directories to path

routines

DDConfig.java - handles YAML configuration
DDCurrent.java - does all list handling and manifest validation
FixDD.java - handles all cloud changes for adding manifest.save and replacing manifest

dloy commented 3 years ago

Dev Testing environment

Normally this type of change can be run on the stage environment. Unfortunately the needed examples for testing did not exist in stage.

To provide a dev environment with production content, I did the following: