archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0
16 stars 1 forks source link

Problem: in some instances, the nameCleanup log is empty when there has been a filename change #1146

Open sallain opened 4 years ago

sallain commented 4 years ago

Expected behaviour Filenames containing non-ASCII characters are changed during Job: Change object and directory filenames. This change results in:

  1. A stream output in the UI:

image

  1. A logfile (located in the AIP at my-aip/data/logs/transfers/my-aip-uuid/logs/filenameCleanup.log)
Sanitized name: /var/archivematica/sharedDirectory/currentlyProcessing/5.1-test-e5ec283e-652a-4021-8a18-fdf0a8ace1ae/objects/ampersand&.txt  ->  /var/archivematica/sharedDirectory/currentlyProcessing/5.1-test-e5ec283e-652a-4021-8a18-fdf0a8ace1ae/objects/ampersand_.txt
Sanitized name: /var/archivematica/sharedDirectory/currentlyProcessing/5.1-test-e5ec283e-652a-4021-8a18-fdf0a8ace1ae/objects/tilde~tilde.txt  ->  /var/archivematica/sharedDirectory/currentlyProcessing/5.1-test-e5ec283e-652a-4021-8a18-fdf0a8ace1ae/objects/tilde_tilde.txt
Sanitized name: /var/archivematica/sharedDirectory/currentlyProcessing/5.1-test-e5ec283e-652a-4021-8a18-fdf0a8ace1ae/objects/,  ->  /var/archivematica/sharedDirectory/currentlyProcessing/5.1-test-e5ec283e-652a-4021-8a18-fdf0a8ace1ae/objects/__1
Sanitized name: /var/archivematica/sharedDirectory/currentlyProcessing/5.1-test-e5ec283e-652a-4021-8a18-fdf0a8ace1ae/objects/,/@at.txt  ->  /var/archivematica/sharedDirectory/currentlyProcessing/5.1-test-e5ec283e-652a-4021-8a18-fdf0a8ace1ae/objects/__1/_at.txt
  1. A PREMIS event for each name change:
            <premis:eventType>name cleanup</premis:eventType>
            <premis:eventDateTime>2020-03-11T21:32:48.317696+00:00</premis:eventDateTime>
            <premis:eventDetailInformation>
              <premis:eventDetail>prohibited characters removed: program="sanitize_names"; version="1.10.db4f6aca278e6daf2af160ed40349baf7c6f53af"</premis:eventDetail>
            </premis:eventDetailInformation>
            <premis:eventOutcomeInformation>
              <premis:eventOutcome></premis:eventOutcome>
              <premis:eventOutcomeDetail>
                <premis:eventOutcomeDetailNote>Original name="%transferDirectory%objects/(brackets)/@at.txt"; cleaned up name="%transferDirectory%objects/(brackets)/_at.txt"</premis:eventOutcomeDetailNote>
              </premis:eventOutcomeDetail>
            </premis:eventOutcomeInformation>

Current behaviour For some reason, one of the archivematica-sampledata transfers results in an empty logfile.

The specific transfer in question is SampleTransfers/DemoTransferCSV. It has a file (artwork/Montréal.png) that undergoes a filename change. The task output is as expected and the PREMIS event appears in the METS, but the logfile is blank.

A transfer that works as expected is TestTransfers/badNames.

After some investigation, we can't figure out why one transfer works as expected and the other doesn't. We decided to file this issue for reference, in case this crops up with other examples.

We also agreed that the most important thing was #3, that the PREMIS name cleanup event shows up in the METS file.

Steps to reproduce Create a transfer using SampleTransfers/DemoTransferCSV and check the logfile (N.B. it's the logfile within the transfers/ subdirectory, my-aip/data/logs/transfers/my-aip-uuid/logs/filenameCleanup.log)

Your environment (version of Archivematica, operating system, other relevant details) qa/1.x (1.11) on CentOS, RPM, and Xenial.

The same transfer run on 1.10.1 has the logfile as expected.


For Artefactual use:

Before you close this issue, you must check off the following:

replaceafill commented 4 years ago

Probably relevant https://github.com/artefactual/archivematica/pull/1401#pullrequestreview-245608346

sallain commented 4 years ago

@replaceafill Oh hmm. It still seems odd that it would log the filename change in one instance but not another? Maybe @sevein has insight.

Regardless, I don't think @sromkey wants to investigate this too much for now.

sromkey commented 4 years ago

Yeah, this is intriguing but can be left for post-1.11 release :)

replaceafill commented 4 years ago

While investigating https://github.com/artefactual/archivematica/pull/1588 I came across this in the MCPServer logs which could help debugging this when needed:

archivematica-mcp-server_1       | INFO     ThreadPoolExecutor-0_0 2020-03-15 04:48:03 decisions:run:52:  Running Generate transfer structure report (package 4fbac528-53d4-44ec-bbd5-a311ce5064b7)
archivematica-mcp-server_1       | INFO     ThreadPoolExecutor-0_0 2020-03-15 04:48:03 decisions:decide:127:  Using user selected chain e9eaef1e-c2e0-4e3b-b942-bfb537162795 for link 56eebd45-5600-4768-a8c2-ec0114555a3d
archivematica-mcp-server_1       | DEBUG    ThreadPoolExecutor-0_0 2020-03-15 04:48:03 base:mark_complete:102:  NextChainDecisionJob 0c3862cc-a58f-44d2-9308-db49a1caa4f8 done with exit code 0
archivematica-mcp-server_1       | DEBUG    ThreadPoolExecutor-0_0 2020-03-15 04:48:03 chain:__init__:101:  Creating JobChain e9eaef1e-c2e0-4e3b-b942-bfb537162795 for package 4fbac528-53d4-44ec-bbd5-a311ce5064b7 (initial link 2584b25c-8d98-44b7-beca-2b3ea2ea2505)
archivematica-mcp-server_1       | INFO     ThreadPoolExecutor-0_1 2020-03-15 04:48:03 client:run:113:  Running Change object and directory filenames (package 4fbac528-53d4-44ec-bbd5-a311ce5064b7)
archivematica-mcp-server_1       | DEBUG    ThreadPoolExecutor-0_1 2020-03-15 04:48:03 gearman_backend:submit:221:  Submitted gearman job d6473835-078d-4263-869d-5a4e6f48e147 (sanitizeobjectnames_v0.0)
archivematica-mcp-server_1       | ERROR    ThreadPoolExecutor-0_1 2020-03-15 04:48:03 task:_write_file_to_disk:106:  Unable to write to: /var/archivematica/sharedDirectory/currentlyProcessing/mynewdemo-4fbac528-53d4-44ec-bbd5-a311ce5064b7/logs/filenameCleanup.log
archivematica-mcp-server_1       | Traceback (most recent call last):
archivematica-mcp-server_1       |   File "/src/MCPServer/lib/server/tasks/task.py", line 103, in _write_file_to_disk
archivematica-mcp-server_1       |     f.write(contents)
archivematica-mcp-server_1       | UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 729: ordinal not in range(128)
archivematica-mcp-server_1       | DEBUG    ThreadPoolExecutor-0_1 2020-03-15 04:48:03 gearman_backend:update_task_results:265:  Task 9fb77799-128b-429e-8ac0-b4d99abd25f0 finished! Result COMPLETE - 0
archivematica-mcp-server_1       | INFO     ThreadPoolExecutor-0_0 2020-03-15 04:48:03 client:run:113:  Running Change transfer name (package 4fbac528-53d4-44ec-bbd5-a311ce5064b7)
archivematica-mcp-server_1       | DEBUG    ThreadPoolExecutor-0_0 2020-03-15 04:48:03 gearman_backend:submit:221:  Submitted gearman job 98d4332f-bcbd-471d-a7ee-ec2a5a4bb9af (sanitizesipname_v0.0)
archivematica-mcp-server_1       | DEBUG    ThreadPoolExecutor-0_0 2020-03-15 04:48:03 gearman_backend:update_task_results:265:  Task 3331cb40-6ac0-4aaf-99e3-47fba75b956f finished! Result COMPLETE - 0
archivematica-mcp-server_1       | INFO     ThreadPoolExecutor-0_1 2020-03-15 04:48:03 client:run:113:  Running Move to select file ID tool (package 4fbac528-53d4-44ec-bbd5-a311ce5064b7)
ross-spencer commented 4 years ago

Might be related or have some related test data: https://github.com/artefactual/archivematica/issues/1111

joel-simpson commented 1 year ago

Tested today (Archivematica-1.13.2) and TestTransfers/badNames did NOT work as expected (as reported above). The filenameChanges.log showed no changes (despite the fact that many many names were changed).

however, as Sara noted above, the details are all saved in a PREMIS event (although the event name has changed since this defect was logged):

filename change I was looking for a potentially faster/easier way to identify name changes and hoping the log might provide it. Still not sure it would be the best way to do it.