The compressAIP client script runs a compression utility and records that tool's standard output and standard error in the database. The code doing this is here: https://github.com/JiscRDSS/archivematica/blob/qa/jisc/src/MCPClient/lib/clientScripts/compressAIP.py#L83-L103
When the AIP being compressed contains thousands of files, the standard output gets very large, and the extra output is not useful. In one example, an AIP with 37,000 original files, the AIP compression PREMIS event recorded by this client script made up over 99% of the total content of the AIP's pointer file. The output is just endless lines starting with 'compressing x . ..'
The pointer file becomes unusable and can cause failures in the Storage Service when the AIP is stored.
(See rdss-archivematica#106 for an example.)
It would be better to ignore the standard output of this tool, not write it to the database at all, and allow the PREMIS event outcome detail note to be empty.
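As a rough illustration of the idea, here is a minimal Python sketch that runs a command, throws away its standard output, and keeps only standard error. This is not the actual compressAIP code: `run_discarding_stdout` is a hypothetical helper name, the noisy command is a stand-in for the real compression utility, and keeping standard error is an assumption made here so that failures can still be diagnosed.

```python
import subprocess
import sys


def run_discarding_stdout(command):
    """Run `command`, discard its standard output, and return (exit_code, stderr).

    Hypothetical helper for illustration only; not the existing compressAIP code.
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.DEVNULL,  # per-file progress lines are thrown away
        stderr=subprocess.PIPE,     # assumption: errors are still worth keeping
        universal_newlines=True,    # decode stderr to text
    )
    return completed.returncode, completed.stderr


if __name__ == "__main__":
    # Stand-in for a noisy compression tool that prints one line per file.
    noisy = [sys.executable, "-c", "for i in range(1000): print('compressing', i)"]
    exit_code, stderr_text = run_discarding_stdout(noisy)
    # Only the exit code and stderr would be available to record in the event.
    print(exit_code, repr(stderr_text))
```

With something along these lines, the size of what gets recorded no longer depends on the number of files in the AIP, and the outcome detail note would be empty for a clean run.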
It is worth pointing out that there is related work going on in the upstream project, documented here: https://github.com/artefactual-labs/archivematica-acceptance-tests/pull/37
That work is intended to be released by the end of 2017. It would be useful to change just this one compressAIP client script here in the JiscRDSS repo, test it with the large datasets available in the Jisc environment, and then consider how best to merge it with the ongoing work upstream.