archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: Audit the purposes and intended uses of the various Archivematica SIP/AIP log files #251

Open ross-spencer opened 6 years ago

ross-spencer commented 6 years ago

Please describe the problem you'd like to be solved.

Audit the various functions of the Archivematica logs, describe them where possible, and remove them where appropriate.

Raised as a question by @sevein in Slack: a chunk of effort goes into creating these logs in Archivematica, but it is not clear how they are used. It may not always be clear when they are created, either.

Describe the solution you'd like to see implemented.

Improved docs would be good, but any extraneous logging, e.g. where the information is already reflected in the METS files, could also be removed. That would reduce the size of database writes and the size of the AIPs at the same time.


For Artefactual use: Please make sure these steps are taken before moving this issue from Review to Verified in Waffle:

ross-spencer commented 6 years ago

Also from @sevein: if folks want to see the tasks that output log information, they can run this query in the DB:

(Although I wrote this command so it might be wrong!)

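-- List tasks that write standard output or standard error to a log file.
USE MCP;
SELECT * FROM StandardTasksConfigs
WHERE (standardOutputFile IS NOT NULL AND standardOutputFile != '')
   OR (standardErrorFile IS NOT NULL AND standardErrorFile != '');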
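
A minimal sketch of how NULL and empty-string values behave in a filter like the one above, for anyone who wants to check the logic without touching a live MCP database. It uses an in-memory SQLite database as a stand-in (an assumption; the real database is MySQL), a cut-down three-column version of the table, and two hypothetical task names to cover the NULL and empty-string cases.

```python
import sqlite3


def tasks_with_logs(rows):
    """Return names of tasks that have a non-empty stdout or stderr log path."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for StandardTasksConfigs (only three columns).
    conn.execute(
        "CREATE TABLE StandardTasksConfigs ("
        "execute TEXT, standardOutputFile TEXT, standardErrorFile TEXT)"
    )
    conn.executemany("INSERT INTO StandardTasksConfigs VALUES (?, ?, ?)", rows)
    query = (
        "SELECT execute FROM StandardTasksConfigs "
        "WHERE (standardOutputFile IS NOT NULL AND standardOutputFile != '') "
        "OR (standardErrorFile IS NOT NULL AND standardErrorFile != '') "
        "ORDER BY execute"
    )
    names = [row[0] for row in conn.execute(query)]
    conn.close()
    return names


sample_rows = [
    ("bindPIDs_v0.0", "%SIPLogsDirectory%handles.log", None),
    ("sanitizeSIPName_v0.0", None, "%SIPLogsDirectory%SIPnameCleanup.log"),
    # Hypothetical rows, not from the real table: they should be filtered out.
    ("hypotheticalNullTask_v0.0", None, None),
    ("hypotheticalEmptyTask_v0.0", "", ""),
]
print(tasks_with_logs(sample_rows))
```

Only the two tasks with a real log path should be listed; the rows where both columns are NULL or empty are excluded.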

Result

| pk | execute | arguments | filterSubDir | filterFileStart | filterFileEnd | standardOutputFile | standardErrorFile | lastModified | replaces |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 02fd0952-4c9c-4da6-9ea3-a1409c87963d | identifyFileFormat_v0.0 | "%IDCommand%" "%relativeLocation%" "%fileUUID%" | objects/attachments | NULL | NULL | %SIPLogsDirectory%fileFormatIdentification.log | %SIPLogsDirectory%fileFormatIdentification.log | 2013-11-07 22:51:43.000000 | NULL |
| 0bd2524d-573a-4c8e-86b4-8fa54a5acfad | bindPIDs_v0.0 | "%SIPUUID%" "%sharedPath%" --bind-pids "%BindPIDs%" | NULL | NULL | NULL | %SIPLogsDirectory%handles.log | NULL | 2018-10-04 21:07:41.799581 | NULL |
| 0c6990d8-ce1f-4093-803b-5ca6256119ca | sanitizeSIPName_v0.0 | "%relativeLocation%" "%SIPUUID%" "%date%" "%sharedPath%" "%unitType%" | NULL | NULL | NULL | NULL | %SIPLogsDirectory%SIPnameCleanup.log | 2012-10-02 00:25:01.000000 | NULL |
| 2ad612bc-1993-407e-9d66-a8ab9c1ebbd5 | assignFileUUIDs_v0.0 | --transferUUID "%SIPUUID%" --sipDirectory "%SIPDirectory%" --filePath "%relativeLocation%" --fileUUID "%fileUUID%" --eventIdentifierUUID "%taskUUID%" --date "%date%" | objects | NULL | NULL | %SIPLogsDirectory%FileUUIDs.log | %SIPLogsDirectory%FileUUIDsError.log | 2012-10-02 00:25:01.000000 | NULL |
| 2fdb8408-8bbb-45d1-846b-5e28bf220d5c | archivematicaClamscan_v0.0 | "%fileUUID%" "%relativeLocation%" "%date%" "%taskUUID%" | objects/submissionDocumentation | NULL | NULL | NULL | %SIPLogsDirectory%clamAVScan.txt | 2012-10-02 00:25:01.000000 | NULL |
| 34966164-9800-4ae1-91eb-0a0c608d72d5 | assignFileUUIDs_v0.0 | --sipUUID "%SIPUUID%" --sipDirectory "%SIPDirectory%" --filePath "%relativeLocation%" --fileUUID "%fileUUID%" --eventIdentifierUUID "%taskUUID%" --date "%date%" --use "metadata" --disable-update-filegrpuse | objects/metadata | NULL | NULL | %SIPLogsDirectory%FileUUIDs.log | %SIPLogsDirectory%FileUUIDsError.log | 2013-02-13 22:03:40.000000 | NULL |
| 49b803e3-8342-4098-bb3f-434e1eb5cfa8 | removeUnneededFiles_v0.0 | "%relativeLocation%" "%fileUUID%" | objects | NULL | NULL | %SIPLogsDirectory%removeUnneededFiles.log | %SIPLogsDirectory%removeUnneededFiles.log | 2012-10-02 00:25:01.000000 | NULL |
| 57d42245-79e2-4c2d-8ed3-b596cce416db | assignFileUUIDs_v0.0 | --transferUUID "%SIPUUID%" --sipDirectory "%SIPDirectory%" --filePath "%relativeLocation%" --fileUUID "%fileUUID%" --eventIdentifierUUID "%taskUUID%" --date "%date%" | objects | NULL | NULL | %SIPLogsDirectory%FileUUIDs.log | %SIPLogsDirectory%FileUUIDsError.log | 2012-10-02 00:25:01.000000 | NULL |
| 58b192eb-0507-4a83-ae5a-f5e260634c2a | sanitizeObjectNames_v0.0 | "%SIPDirectory%objects/metadata/" "%SIPUUID%" "%date%" "%taskUUID%" "SIPDirectory" "sip_id" "%SIPDirectory%" | objects/metadata | NULL | NULL | %SIPLogsDirectory%filenameCleanup.log | %SIPLogsDirectory%filenameCleanup.log | 2013-02-13 22:03:39.000000 | NULL |
| 614b1d56-9078-4cb0-80cc-1ea87b9fbbe8 | assignFileUUIDs_v0.0 | --sipUUID "%SIPUUID%" --sipDirectory "%SIPDirectory%" --filePath "%relativeLocation%" --fileUUID "%fileUUID%" --eventIdentifierUUID "%taskUUID%" --date "%date%" --use "submissionDocumentation" | objects/submissionDocumentation | NULL | NULL | %SIPLogsDirectory%FileUUIDs.log | %SIPLogsDirectory%FileUUIDsError.log | 2012-10-02 00:25:01.000000 | NULL |
| 7316e6ed-1c1a-4bf6-a570-aead6b544e41 | archivematicaClamscan_v0.0 | "%fileUUID%" "%relativeLocation%" "%date%" "%taskUUID%" | objects/metadata | NULL | NULL | NULL | %SIPLogsDirectory%clamAVScan.txt | 2013-02-13 22:03:39.000000 | NULL |
| 80759ad1-c79a-4c3b-b255-735c28a50f9e | sanitizeObjectNames_v0.0 | "%SIPObjectsDirectory%" "%SIPUUID%" "%date%" "%taskUUID%" "transferDirectory" "transfer_id" "%SIPDirectory%" | objects | NULL | NULL | %SIPLogsDirectory%filenameCleanup.log | %SIPLogsDirectory%filenameCleanup.log | 2012-10-02 00:25:01.000000 | NULL |
| 89b4d447-1cfc-4bbf-beaa-fb6477b00f70 | sanitizeObjectNames_v0.0 | "%SIPObjectsDirectory%attachments/" "%SIPUUID%" "%date%" "%taskUUID%" "transferDirectory" "transfer_id" "%SIPDirectory%" | objects/attachments | NULL | NULL | %SIPLogsDirectory%filenameCleanup.log | %SIPLogsDirectory%filenameCleanup.log | 2012-10-02 00:25:01.000000 | NULL |
| 8fad772e-7d2e-4cdd-89e6-7976152b6696 | extractContents_v0.0 | "%SIPUUID%" "%transferDirectory%" "%date%" "%taskUUID%" "%DeletePackage%" | NULL | NULL | NULL | %SIPLogsDirectory%extractContents.log | NULL | 2013-11-07 22:51:43.000000 | NULL |
| 9c3680a5-91cb-413f-af4e-d39c3346f8db | identifyFileFormat_v0.0 | "%IDCommand%" "%relativeLocation%" "%fileUUID%" --disable-reidentify | objects | NULL | NULL | %SIPLogsDirectory%fileFormatIdentification.log | %SIPLogsDirectory%fileFormatIdentification.log | 2013-11-07 22:51:42.000000 | NULL |
| a32fc538-efd1-4be0-95a9-5ee40cbc70fd | removeFilesWithoutPresmisMetadata_v0.0 | --fileUUID "%fileUUID%" --inputFile "%relativeLocation%" --sipDirectory "%SIPDirectory%" | objects/ | NULL | NULL | %SIPLogsDirectory%removedFilesWithNoPremisMetadata.log | %SIPLogsDirectory%removedFilesWithNoPremisMetadata.log | 2012-10-02 00:25:01.000000 | NULL |
| a5bb8df6-a8f0-4279-ac6d-873ec5cf37cd | verifyChecksumsInFileSecOfDspaceMETSFiles_v0.0 | "%relativeLocation%" "%date%" "%taskUUID%" | objects | NULL | mets.xml | %SIPLogsDirectory%verifyChecksumsInFileSecOfDSpaceMETSFiles.log | %SIPLogsDirectory%verifyChecksumsInFileSecOfDSpaceMETSFiles.log | 2012-10-02 00:25:01.000000 | NULL |
| ad65bf76-3491-4c3d-afb0-acc94ff28bee | sanitizeObjectNames_v0.0 | "%SIPDirectory%objects/submissionDocumentation/" "%SIPUUID%" "%date%" "%taskUUID%" "SIPDirectory" "sip_id" "%SIPDirectory%" | objects/submissionDocumentation | NULL | NULL | %SIPLogsDirectory%filenameCleanup.log | %SIPLogsDirectory%filenameCleanup.log | 2012-10-02 00:25:01.000000 | NULL |
| b055f0a4-75d7-4747-98fe-aab08d835403 | bindPID_v0.0 | "%fileUUID%" --bind-pids "%BindPIDs%" | NULL | NULL | NULL | %SIPLogsDirectory%handles.log | NULL | 2018-10-04 21:07:41.801837 | NULL |
| de58249f-9594-439d-8bea-536ce59d70a3 | archivematicaClamscan_v0.0 | "%fileUUID%" "%relativeLocation%" "%date%" "%taskUUID%" | NULL | NULL | NULL | NULL | %SIPLogsDirectory%clamAVScan.txt | 2012-10-02 00:25:01.000000 | NULL |
| f8af7e00-0ae4-47ab-9d22-92395ff053fc | sanitizeSIPName_v0.0 | "%relativeLocation%" "%SIPUUID%" "%date%" "%sharedPath%" "%unitType%" | NULL | NULL | NULL | NULL | %SIPLogsDirectory%SIPnameCleanup.log | 2012-10-02 00:25:01.000000 | NULL |
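
For the audit itself, it may help to pivot the result above so that each log file lists the scripts that write to it. The following is a rough sketch rather than an Archivematica tool: `scripts_by_log` is a hypothetical helper, and the sample rows are a hand-copied subset of the result, reduced to the `execute`, `standardOutputFile`, and `standardErrorFile` columns.

```python
from collections import defaultdict


def scripts_by_log(rows):
    """Map each log path to the sorted list of scripts that write to it."""
    grouped = defaultdict(set)
    for execute, stdout_file, stderr_file in rows:
        for path in (stdout_file, stderr_file):
            if path:  # skips None (NULL) and empty strings
                grouped[path].add(execute)
    return {path: sorted(scripts) for path, scripts in sorted(grouped.items())}


# Subset of the query result: (execute, standardOutputFile, standardErrorFile).
rows = [
    ("identifyFileFormat_v0.0",
     "%SIPLogsDirectory%fileFormatIdentification.log",
     "%SIPLogsDirectory%fileFormatIdentification.log"),
    ("bindPIDs_v0.0", "%SIPLogsDirectory%handles.log", None),
    ("bindPID_v0.0", "%SIPLogsDirectory%handles.log", None),
    ("archivematicaClamscan_v0.0", None, "%SIPLogsDirectory%clamAVScan.txt"),
    ("assignFileUUIDs_v0.0",
     "%SIPLogsDirectory%FileUUIDs.log",
     "%SIPLogsDirectory%FileUUIDsError.log"),
]

for path, scripts in scripts_by_log(rows).items():
    print(path, "<-", ", ".join(scripts))
```

Grouping this way makes shared logs easy to spot. In this subset, for instance, `handles.log` is written by both `bindPIDs_v0.0` and `bindPID_v0.0`.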