JiscSD / archivematica

Free and open-source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.
http://www.archivematica.org
GNU Affero General Public License v3.0

Problem: METS files are too large when AIP contains many files #41

Open jhsimpson opened 6 years ago

jhsimpson commented 6 years ago

The Index AIP microservice fails when the METS file it is trying to index is larger than 104 MB. This is due to a configuration limit in Elasticsearch on the maximum size of a JSON document that can be indexed.
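As a rough illustration of the failure condition, a size check before indexing could look like the sketch below. The function name and the threshold constant are hypothetical; the threshold simply mirrors the 104 MB figure above, and the size of the JSON document actually sent to Elasticsearch may differ from the size of the METS file on disk.

```python
import os

# Hypothetical threshold taken from the ~104 MB figure in this report;
# the real limit depends on how Elasticsearch is configured.
ASSUMED_LIMIT_BYTES = 104 * 1000 * 1000


def exceeds_index_limit(mets_path, limit_bytes=ASSUMED_LIMIT_BYTES):
    """Return True if the METS file on disk is larger than the assumed limit."""
    return os.path.getsize(mets_path) > limit_bytes
```

A check like this would only allow the microservice to skip or report oversized files earlier; it does not make them indexable.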

Increasing this parameter would allow larger files to be indexed, but that is unlikely to be a scalable solution. A better approach would be to reduce the size of the METS file.

One potential way to get a much smaller METS file is to remove the raw tool output that is currently recorded in the PREMIS `eventOutcomeDetailNote` element of each PREMIS event. The trade-off is that this data would no longer be available in the METS. It may be possible to make stdout and stderr available through a different mechanism (for example a logging system, rather than inside the METS), but it is worth evaluating whether this is necessary.
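To make the proposal concrete, here is a minimal sketch of stripping that output from an existing METS document using Python's standard `xml.etree.ElementTree`. It is not how Archivematica builds its METS. The function names are hypothetical, elements are matched by local name so that no particular PREMIS namespace version is assumed, and the note elements are emptied rather than deleted so the surrounding structure is left in place.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # ElementTree spells namespaced tags as "{namespace-uri}name".
    return tag.rsplit("}", 1)[-1]


def clear_outcome_detail_notes(mets_xml):
    """Empty every eventOutcomeDetailNote element in a METS string.

    Returns the modified XML as a string and the number of notes cleared.
    """
    root = ET.fromstring(mets_xml)
    cleared = 0
    for element in root.iter():
        if local_name(element.tag) == "eventOutcomeDetailNote" and element.text:
            element.text = None  # drop the raw tool output, keep the element
            cleared += 1
    return ET.tostring(root, encoding="unicode"), cleared
```

Note that `ET.tostring` may rewrite namespace prefixes (for example to `ns0`), and that this approach parses the whole document into memory, which may itself be costly for very large METS files.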