archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: Feature File needed for PREMIS object/originalName value does not contain the original directory name #1376

Open jrwdunham opened 6 years ago

jrwdunham commented 6 years ago

Consider a transfer source with this structure:

apples/
├── ABC\ DEF
│   └── file.txt
├── UVW\ XYZ
│   └── file.txt
└── marché
    └── file.txt

If Archivematica is configured to assign UUIDs to directories then, assuming the user-assigned transfer name is wed5 and the system-assigned UUID for the SIP is 3ed19510-b675-4e59-b114-ba84830841a0, the premis:originalName value assigned to the SIP as a whole will be as follows:

<premis:originalName>wed5-3ed19510-b675-4e59-b114-ba84830841a0</premis:originalName>

Contrast the above originalName to one assigned to one of the original directories in the transfer source:

<premis:originalName>%transferDirectory%objects/ABC DEF/</premis:originalName>

and to one assigned to one of the original files in the transfer source:

<premis:originalName>%transferDirectory%objects/ABC DEF/file.txt</premis:originalName>

Logically, it would seem that the above three originalName values should be as follows, respectively:

<premis:originalName>apples/</premis:originalName>
<premis:originalName>apples/ABC DEF/</premis:originalName>
<premis:originalName>apples/ABC DEF/file.txt</premis:originalName>

However, in order to implement the above, we would need to document the original name of the transfer at the beginning of processing (modifying the Transfer model), which we do not do currently.
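As a minimal sketch of the mapping proposed above, assuming the original directory name (here apples) had been recorded somewhere at the start of processing, the desired values could be derived from the recorded ones as follows. The function name and the constant are hypothetical and are not part of Archivematica; the SIP-level value (wed5-...) carries no placeholder, so this sketch leaves it untouched.

```python
# Hypothetical helper, not Archivematica code: illustrates the proposed mapping only.
TRANSFER_PLACEHOLDER = "%transferDirectory%objects/"


def expected_original_name(recorded_name: str, original_dir: str) -> str:
    """Re-root a recorded originalName at the original transfer directory name.

    Values that do not start with the placeholder are returned unchanged.
    """
    if not recorded_name.startswith(TRANSFER_PLACEHOLDER):
        return recorded_name
    relative = recorded_name[len(TRANSFER_PLACEHOLDER):]
    return f"{original_dir.rstrip('/')}/{relative}"


if __name__ == "__main__":
    for name in (
        "%transferDirectory%objects/ABC DEF/",
        "%transferDirectory%objects/ABC DEF/file.txt",
    ):
        print(expected_original_name(name, "apples"))
```

The open question in this issue is where original_dir would come from, since it is not currently stored.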

jrwdunham commented 6 years ago

@sromkey @jhsimpson Do you have any thoughts on this?

jrwdunham commented 6 years ago

Related to (discovered in resolving) https://github.com/artefactual/archivematica/issues/1051.

sromkey commented 6 years ago

Archivally speaking, you have it right. The purpose of premis:originalName would be to document the original name of the directory or file, not the transfer name as entered by the user (that becomes the AIP name).

ross-spencer commented 6 years ago

@jrwdunham I am trying to observe the behaviour above to help create a script to retrospectively correct existing METS files that look like this. As such, you might see some updates to this ticket as I go.
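A rough sketch of what such a correction script might look like, using only the Python standard library. This is an illustration under assumptions, not the script itself: the function name is hypothetical, the original directory name has to be supplied by the caller, and elements are matched on their local name originalName so the sketch does not depend on a particular PREMIS namespace.

```python
import xml.etree.ElementTree as ET

# Placeholder prefix as it appears in the originalName values quoted in this issue.
PLACEHOLDER = "%transferDirectory%objects/"


def rewrite_original_names(mets_xml: str, original_dir: str) -> str:
    """Return METS XML with placeholder-prefixed originalName values re-rooted.

    Hypothetical helper: original_dir must be provided by the caller.
    """
    root = ET.fromstring(mets_xml)
    prefix = original_dir.rstrip("/") + "/"
    for element in root.iter():
        # Compare the local name only, ignoring any "{namespace}" prefix.
        if element.tag.rsplit("}", 1)[-1] != "originalName":
            continue
        text = element.text or ""
        if text.startswith(PLACEHOLDER):
            element.text = prefix + text[len(PLACEHOLDER):]
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree rewrites namespace prefixes on serialisation (for example premis: may become ns0:), so a real script would likely want a parser that preserves them.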

My first observation is that Assign UUIDs to directories does not (I think) affect some of this behaviour; rather, that part comes from the microservice 'Rename with Transfer UUID', which calls the script archivematicaMoveTransfer.py.

I added more logging to this script to show the values coming in and out:

image

The assign UUIDs to directories microservice from the same task chain:

image

ross-spencer commented 6 years ago

I tried to recreate this again today by ingesting a single directory, apples, containing a single text file named pears.

pears persists into the AIP and can be located in the AIP's Objects directory.

apples does not appear anywhere in the SIP, including the various ingest and transfer logs created by Archivematica. As such, it seems that the name of the transfer directory cannot be recovered post-ingest.

Reviewing the Archivematica docs, it seems that the purpose of the transfer directory is to become the de facto Objects directory where one doesn't already exist. If this is the case, we should be more explicit in the docs. I propose we also write a Feature File that documents the purpose of the transfer folder. Hopefully this can form the basis for a discussion about ways forward here.

jhsimpson commented 6 years ago

:+1: to closing this issue by creating a feature file to test the behaviour.

ross-spencer commented 5 years ago

I have started some work on this, and will update this comment as it progresses:

Feature: Transfer Directories inside Archivematica

Scenario 1: A transfer is set up with a flat directory structure

Given a folder in an Archivematica transfer source location. 
And the folder contains digital files or folders that are not in the list... [].
When a user selects that folder.
And the user starts a transfer.
Then the files will be moved into an 'objects' directory.
And the SIP will be created containing this folder.

Scenario 2: A transfer is set up with items in an objects folder

Given a folder in an Archivematica transfer source location.
And the folder contains an 'objects' directory.
And the 'objects' directory contains all the digital files or folders associated
    with the transfer.
When the user selects that folder.
And the user starts a transfer.
Then the SIP that is created will preserve the structure of the files and  
     folders in the 'objects' directory.

Scenario 3: A transfer is set up with items at the top level and in an
            objects folder

Given a folder in an Archivematica transfer source location.
And the folder contains an 'objects' directory. 
And the 'objects' directory contains some of the files and directories
    associated with a transfer.
And the remainder of the digital files or folders are located in the directory
    above that one. 
Then all the digital files and folders will be moved into the objects directory.
And a SIP will be created containing this folder and all of the contents of the
    transfer. 

Scenario 4: A transfer consists of a processingMCP.xml file

Scenario 5: A transfer is set up with a metadata folder

Scenario 6: Any other styles of setup?
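
To make the expected outcome of Scenarios 1 to 3 concrete, here is a small Python sketch of the layout change they describe: top-level items are moved into an 'objects' directory, which is created if it does not exist. This is only a model of the behaviour the scenarios describe, not Archivematica's implementation. The function name is hypothetical, and the reserved parameter stands in for the list of excluded names that Scenario 1 leaves unspecified.

```python
import shutil
from pathlib import Path
from typing import Iterable


def move_into_objects(transfer_dir: Path, reserved: Iterable[str] = ()) -> Path:
    """Move top-level items of transfer_dir into transfer_dir/objects.

    Hypothetical model of Scenarios 1-3. Names listed in `reserved`
    (an assumption; the real list is not given in this issue) stay in place.
    """
    objects_dir = transfer_dir / "objects"
    objects_dir.mkdir(exist_ok=True)
    skip = set(reserved) | {"objects"}
    # Materialise the listing first so moves do not disturb the iteration.
    for entry in sorted(transfer_dir.iterdir()):
        if entry.name in skip:
            continue
        destination = objects_dir / entry.name
        if destination.exists():
            # Name clashes are out of scope for this sketch; fail loudly.
            raise FileExistsError(destination)
        shutil.move(str(entry), str(destination))
    return objects_dir
```

In all three scenarios the name of transfer_dir itself plays no part in the result, which matches the observation above that the original directory name is not preserved.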