Problem
See issue: UC Data Migration to Dryad not reproducing some history correctly #649
The current version in Merritt of some Dash->Dryad objects includes content that was deleted in earlier ingests. The cause of this problem is that the delete.txt files were not applied to the Dryad objects that were built from the original Dash content.
Dryad users see this content because it is returned in the container (e.g. zip) files generated for the current version. Most of it had been deleted from the current version because:
wrong data was uploaded
some data was improperly named
some content may be viewed as proprietary and should not be returned
Goal
The current version of each Merritt object in the Dryad collection contains the same producer (customer) files as are seen in Dryad
Minimize the impact on both Merritt and Dryad when fixing this problem
Steps
Create list of objects that have this problem
Create directory for each object containing original and corrected manifests with diff files
Validate current version of replace manifest with Dryad
Replace bad manifest.txt in cloud
Replace inv db entries using new manifest
Create list of objects that have this problem
For all Dash->Dryad objects determine which contained a system/delete.txt file in the original object
build.xml - constructed manifest.xml with correct current version
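The corrected current version can be thought of as the result of replaying each version's additions and deletions in order. The following is a minimal sketch of that idea, not the code used for this task. It assumes (the source does not say) that each version can be reduced to a list of added paths plus the list of paths named in its delete.txt. The names `DeleteReplay`, `Version` and `currentFiles` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: replays per-version adds and deletes to get the current file set.
class DeleteReplay {

    // One ingest version: the paths it adds and the paths named in its delete list.
    static final class Version {
        final List<String> added;
        final List<String> deleted;

        Version(List<String> added, List<String> deleted) {
            this.added = added;
            this.deleted = deleted;
        }
    }

    // Carries files forward from version to version: deleted paths are dropped
    // first, then the version's own files are added.
    static Set<String> currentFiles(List<Version> versions) {
        Set<String> current = new LinkedHashSet<>();
        for (Version v : versions) {
            current.removeAll(v.deleted);
            current.addAll(v.added);
        }
        return current;
    }

    public static void main(String[] args) {
        List<Version> versions = new ArrayList<>();
        // Made-up paths, for illustration only.
        versions.add(new Version(List.of("producer/a.csv", "producer/wrong.xlsx"), List.of()));
        versions.add(new Version(List.of("producer/b.csv"), List.of("producer/wrong.xlsx")));
        System.out.println(currentFiles(versions));
    }
}
```

If the delete list is skipped, `producer/wrong.xlsx` stays in the current set, which is the symptom described in the problem statement.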
Diff logs
diffcurbld.log - diff current and build
diffoldbld.log - diff old and build
diffoldcur.log - diff old and current
Example diffcurbld.log
MATCH:current - build +++
<<<:/home/loy/MRTMaven/github/admin/mrt-store-admin/tasks/210420-current/prod/manifests/Q6H41PB7/current.xml
>>>:/home/loy/MRTMaven/github/admin/mrt-store-admin/tasks/210420-current/prod/manifests/Q6H41PB7/build.xml
***VERSION:1
<*>:1=| match count=421
***VERSION:2
<*>:2=| match count=421
***VERSION:3
<--:3=1|producer/YZhang_longitudinalALS.xlsx
<*>:3=| match count=421
***VERSION:4
<--:4=1|producer/YZhang_longitudinalALS.xlsx
<*>:4=| match count=422
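To summarize a log like the one above, the entries reported on only one side can be grouped by version. The sketch below is a rough illustration only. The log format is not specified on this page, so it assumes from the example alone that a `***VERSION:n` line opens a version block and that a `<--:` line names a path present in the left (`<<<`) manifest but not in the right (`>>>`) one. The class name `DiffLogSummary` is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups "<--:" entries of a diff log by version number.
class DiffLogSummary {

    private static final String VERSION_PREFIX = "***VERSION:";

    // Returns version number -> paths found on "<--:" lines inside that version block.
    static Map<Integer, List<String>> leftOnly(List<String> lines) {
        Map<Integer, List<String>> byVersion = new LinkedHashMap<>();
        int version = -1; // no version block seen yet
        for (String line : lines) {
            if (line.startsWith(VERSION_PREFIX)) {
                version = Integer.parseInt(line.substring(VERSION_PREFIX.length()).trim());
            } else if (line.startsWith("<--:") && version >= 0) {
                int bar = line.indexOf('|'); // assumed: the path follows the first '|'
                if (bar >= 0) {
                    byVersion.computeIfAbsent(version, k -> new ArrayList<>())
                             .add(line.substring(bar + 1));
                }
            }
        }
        return byVersion;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "***VERSION:2",
            "<*>:2=| match count=421",
            "***VERSION:3",
            "<--:3=1|producer/YZhang_longitudinalALS.xlsx",
            "<*>:3=| match count=421");
        System.out.println(leftOnly(log));
    }
}
```

Versions with no `<--:` lines do not appear in the result, so an empty map would mean the two manifests list the same entries in every version.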
Validate current version of replace manifest with Dryad
From a transaction list from Scott - create a list of current content to be validated against current content in the build.xml.
Because these objects were handled in a variety of ways on the Dryad side, in several cases only the current version was available for confirmation.
program:
java and perl
result
In all cases the current content matched the current version in build.xml
example
After processing the transaction list, a simple list of current pathnames was extracted from these transactions and compared with the current version of the object in build.xml. All producer entries matched.
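The comparison described here amounts to checking that two sets of pathnames are equal. The sketch below shows that check under the assumption that both sides have already been reduced to sets of pathname strings. How the paths are extracted from the transaction list and from build.xml is not shown on this page, and the names `CurrentCompare`, `missing` and `matches` are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: compares the Dryad pathname list with the build.xml pathname list.
class CurrentCompare {

    // Paths that are in 'expected' but absent from 'actual', sorted for stable reporting.
    static Set<String> missing(Set<String> expected, Set<String> actual) {
        Set<String> out = new TreeSet<>(expected);
        out.removeAll(actual);
        return out;
    }

    // True when both sides list exactly the same paths.
    static boolean matches(Set<String> dryadPaths, Set<String> buildPaths) {
        return missing(dryadPaths, buildPaths).isEmpty()
            && missing(buildPaths, dryadPaths).isEmpty();
    }

    public static void main(String[] args) {
        // Made-up paths, for illustration only.
        Set<String> dryad = Set.of("producer/a.csv", "producer/b.csv");
        Set<String> build = Set.of("producer/a.csv", "producer/b.csv", "producer/old.xlsx");
        System.out.println("match: " + matches(dryad, build));
        System.out.println("only in build: " + missing(build, dryad));
    }
}
```

Reporting the two differences separately, rather than a single true/false, makes it clear which side holds the unexpected entry.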
Java handling
The java routines are executed through Netbeans. A standalone routine could have been created, but using Netbeans is more flexible and allows simpler debugging.
Netbeans needs to be run in the prod environment in order to have access to the specific cloud content being updated.
The DDCurrent routine is driven by a base path and the files available under that path
Example: /home/loy/MRTMaven/github/admin/mrt-store-admin/tasks/210420-current/prod
repository
mrt-store-admin/tools
local files and directories under the path
config.yml - configuration for DDCurrent
dd-mod.txt - objects list
logs - logs from processing
manifests - top manifests directory for all object list entries - contains manifests
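The layout above can be captured in a small helper that resolves each file from the base path. This is only a sketch: the file and directory names come from the list above and the per-object subdirectory follows the example paths in the diff log (`manifests/Q6H41PB7/build.xml`), but the class `TaskLayout` and its methods are hypothetical and are not part of DDCurrent.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: resolves the working files of a task from its base path.
class TaskLayout {

    private final Path base;

    TaskLayout(String basePath) {
        this.base = Paths.get(basePath);
    }

    Path config()     { return base.resolve("config.yml"); }
    Path objectList() { return base.resolve("dd-mod.txt"); }
    Path logs()       { return base.resolve("logs"); }

    // Directory holding all manifests and diff files for one object.
    Path manifestDir(String localId) {
        return base.resolve("manifests").resolve(localId);
    }

    // A single file, e.g. "build.xml" or "diffcurbld.log", inside the object's directory.
    Path manifestFile(String localId, String fileName) {
        return manifestDir(localId).resolve(fileName);
    }

    public static void main(String[] args) {
        TaskLayout layout = new TaskLayout("/tmp/example-task"); // placeholder base path
        System.out.println(layout.config());
        System.out.println(layout.manifestFile("Q6H41PB7", "build.xml"));
    }
}
```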
routines
DDConfig.java - handles yaml configuration
DDCurrent.java - does all list handling and manifest validation
FixDD.java - handles all cloud changes for adding manifest.save and replacing manifest
Normally this type of change can be run on the stage environment. Unfortunately the needed examples for testing did not exist in stage.
To provide a dev environment with production content, I did the following:
Create a storage docker container that uses the production 2001 node for wasabi and also a 2003 node for a wasabi dev environment. Note that wasabi has replicated content for all of these Dryad objects.
Copy 19 objects from prod (2001) to dev (2003)
Create the standard manifests directory for these objects with all manifests and diff files (see above)
Run FixDD on samples of this content
Run the inv db delete and inv db add, comparing the results with the diff files
Create list of objects that have this problem
For all Dash->Dryad objects determine which contained a system/delete.txt file in the original object
Count: 217 entries
program:
perl
format:
local id | Dash ark | Dryad ark
example:
Create directory for each object containing original and corrected manifests with diff files
A directory is created for each entry in the Dash->Dryad list. The directory is named with the local id (doi).
program:
DDCurrent. Fix.java.txt
java
manifest.xml files:
Replace bad manifest.txt in cloud
The replacement is handled by the following steps in the cloud for that object:
program
DDCurrent.java using FixDD.java
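The general pattern of keeping the original manifest as manifest.save before installing the replacement can be sketched as follows. This is not FixDD.java: it uses the local filesystem as a stand-in for the cloud store, and the class `ManifestSwap` and its method are hypothetical. Only the file name manifest.save is taken from this page.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: save-then-replace on local files, standing in for cloud operations.
class ManifestSwap {

    // Copies the existing manifest to "manifest.save" next to it (only if no save
    // exists yet), then overwrites the manifest with the replacement.
    static void replace(Path manifest, Path replacement) throws IOException {
        Path save = manifest.resolveSibling("manifest.save");
        if (!Files.exists(save)) {
            Files.copy(manifest, save);
        }
        Files.copy(replacement, manifest, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("manifest-swap");
        Path manifest = dir.resolve("manifest.txt");
        Path replacement = dir.resolve("manifest.new");
        Files.write(manifest, "original".getBytes(StandardCharsets.UTF_8));
        Files.write(replacement, "corrected".getBytes(StandardCharsets.UTF_8));

        replace(manifest, replacement);

        System.out.println(new String(Files.readAllBytes(manifest), StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(dir.resolve("manifest.save")), StandardCharsets.UTF_8));
    }
}
```

Skipping the copy when a save already exists is a design choice in this sketch: a second run cannot overwrite the saved original with an already corrected manifest.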
Replace inv db entries using new manifest
Once the manifest in an object has been replaced in the cloud then the db entries for that object need to be replaced by:
program:
bash
example: