artefactual-sdps / temporal-activities

Temporal activities is a library of general purpose activities
Apache License 2.0

Problem: hidden files interfere with bag validation #2

Closed djjuhasz closed 6 months ago

djjuhasz commented 6 months ago

[Copied from https://github.com/artefactual-sdps/enduro/issues/850]

Is your feature request related to a problem? Please describe.

When submitting a transfer in the bag format, the bag validation check in Archivematica/a3m will fail if there are hidden files in the bag, because the hidden files are not included in the bag manifest. This is a particular issue for Mac users, since macOS often adds dotfiles (e.g. .DS_Store). Users can work around this by manually removing hidden files from their bags before transfer; however, this is both cumbersome and limiting, since the dotfiles can be recreated every time the user interacts with a file.

A bag validation failure in Archivematica stops the transfer process altogether, so the user has to identify the bag that errored out, remove the hidden files, and restart the ingest.

Describe the solution you'd like

I'd like to prevent any hidden files from being transferred with the bag. The solution should check for and remove hidden files before the bag transfer is ingested into Archivematica/a3m.

In Legacy Enduro, this is done at the point when the bag is copied from the transfer source location to the processing location. Any file whose name begins with "." is not copied.

This feature should be configurable, so that users can keep hidden files if they choose. It would also be preferable to allow users to edit the list of files that should be removed/ignored, as is done in Archivematica/a3m's "Remove hidden files and directories" and "Remove unneeded files" jobs.

Describe alternatives you've considered

The manual method mentioned above does work, but it is susceptible to human error and may need to be repeated whenever the user looks at the files in the bag.

Additional context

Note that this is only a requirement for bagged transfers. For standard and other non-bagged transfer types, Archivematica/a3m removes hidden files as a matter of course during the early stages of processing.

The client for whom this has been an issue uses unzipped bags, which both lends itself to the problem manifesting AND provides the easy solution of simply not copying dotfiles, as is done in Legacy Enduro. I'm not sure how the issue would be dealt with in a zipped bag, where the whole bag is copied as a single entity. Perhaps focusing on the unzipped bag example is the easiest starting point.