archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: It isn't possible to protect certain files from deletion, e.g. hidden files #418

Open ross-spencer opened 5 years ago

ross-spencer commented 5 years ago

Please describe the problem you'd like to be solved.

It may be desirable to protect certain files from deletion, or not have hidden (.filename) or temporary (filename~) files deleted at all.

Describe the solution you'd like to see implemented.

Allow users to flag files that shouldn't be deleted.

Describe alternatives you've considered.

Users could rename files before transfer, but this makes it much harder to follow good archival practice.

Additional context

I recently saw a transfer that looked as follows and could not offer a configuration workaround, so there does seem to be a gap in functionality.

mbox-example$ tree -a
.
└── mbox
    ├── Drafts
    │   └── .mbox
    ├── INBOX
    │   └── .mbox
    ├── Junk
    │   └── .mbox
    ├── processingMCP.xml
    ├── Sent
    │   └── .mbox
    ├── Webmail-old
    │   ├── INBOX
    │   │   └── .mbox
    │   ├── .mbox
    │   └── sent-mail-Oct-2008
    │       └── .mbox
    └── Trash

Readers looking at this issue might want to offer up other real-life examples, or describe why such a feature might not be a good idea. It might simply be important to emphasize that files Archivematica considers to be 'hidden' cannot currently be preserved.

If files with . prefixes or ~ suffixes are important to the user, then creating a manifest for the transfer metadata directory may be an important step they'd like to take.
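As a rough illustration of the manifest idea above, here is a minimal Python sketch that lists the files at risk and writes them to a checksum manifest. The function names, the SHA-256 choice, and the `digest  path` line format are assumptions made for illustration, and the "`.` prefix or `~` suffix" rule is taken from the description in this issue rather than from Archivematica's own code.

```python
import hashlib
import os


def is_hidden_or_temp(name):
    """Assumed rule from this issue: '.' prefix or '~' suffix."""
    return name.startswith(".") or name.endswith("~")


def at_risk_files(transfer_dir):
    """Return sorted relative paths of files with a hidden/temporary path component."""
    results = []
    for root, _dirs, files in os.walk(transfer_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), transfer_dir)
            # A file is also at risk if any parent directory is hidden.
            if any(is_hidden_or_temp(part) for part in rel.split(os.sep)):
                results.append(rel)
    return sorted(results)


def write_manifest(transfer_dir, manifest_path):
    """Write 'sha256  relative/path' lines for every at-risk file."""
    with open(manifest_path, "w", encoding="utf-8") as out:
        for rel in at_risk_files(transfer_dir):
            with open(os.path.join(transfer_dir, rel), "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            out.write(f"{digest}  {rel}\n")
```

Where the resulting manifest should live, and what it should be called, would depend on local practice; the sketch deliberately leaves `manifest_path` up to the caller.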


For Artefactual use: Please make sure these steps are taken before moving this issue from Review to Verified in Waffle:

jfcarrano commented 3 years ago

In our case at MIT, we often make no attempt to remove hidden or system files prior to transfer. Some of the tooling we use to create our transfers to send to Archivematica will include them in checksum manifests. So when we transfer files in a standard transfer and hidden files are removed by Archivematica, the transfer is invalid later when the checksum manifest is checked, due to a mismatch in the number of files.
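One way to catch this kind of mismatch before transfer is to scan the checksum manifest for entries that would probably be removed. The sketch below assumes a `digest  path` manifest with forward-slash paths and the "`.` prefix or `~` suffix" rule described in this issue; both the function names and the manifest format are assumptions, not anything Archivematica prescribes.

```python
def looks_hidden_or_temp(part):
    """Assumed rule from this thread: '.' prefix or '~' suffix."""
    if part in (".", ".."):
        return False  # path navigation, not a hidden name
    return part.startswith(".") or part.endswith("~")


def entries_at_risk(manifest_lines):
    """Return manifest paths with at least one hidden or temporary component."""
    flagged = []
    for line in manifest_lines:
        fields = line.strip().split(None, 1)
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        path = fields[1]
        if any(looks_hidden_or_temp(p) for p in path.split("/")):
            flagged.append(path)
    return flagged
```

Any path this returns is one that the manifest expects but that may no longer exist after hidden files are removed.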

I could foresee cases where the hidden files would be useful to keep, especially in cases of software or software-dependent files. I don't have any real use cases for that at present though. I do know of one case where files were transferred to us with the hidden attribute set mistakenly (they were just regular video files), so if someone wasn't paying attention in such cases, there could be some accidental deletion. I think in our cases or in cases of more intentional desire to save hidden files, the step to remove them in Archivematica should be optional or there should be a way to indicate which of the hidden files (or which type) you wish to save.
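To make the "indicate which hidden files you wish to save" idea concrete, here is a minimal sketch of a removal check that honours a user-supplied list of protected patterns. The `should_remove` function and the `protected` parameter are hypothetical; nothing like them is confirmed to exist in Archivematica, and the removal rule is again the one assumed from this issue.

```python
from fnmatch import fnmatchcase


def should_remove(name, protected=()):
    """Decide whether a file or directory name would be removed.

    `protected` is a hypothetical list of shell-style patterns
    (for example ".mbox" or ".*_Files") that the user wants to keep.
    """
    if any(fnmatchcase(name, pattern) for pattern in protected):
        return False  # explicitly protected by the user
    # Assumed removal rule: '.' prefix or '~' suffix.
    return name.startswith(".") or name.endswith("~")
```

With `protected=[".mbox"]`, the `.mbox` files in the example transfer would be kept while other hidden files would still be removed; an empty list reproduces the current all-or-nothing behaviour.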

jfcarrano commented 2 years ago

Here is another use case that caused us (MIT) issues recently. We have some files in a collection where the creator downloaded a webpage for offline use. They named it starting with a period, i.e. ".nameofwebsite.org.html", which means the folder with the associated content (style, assets, etc.) also had a leading period in its name, i.e. ".nameofwebsite.org_Files".

With the current functionality, both are deleted by the remove hidden files microservice. We only noticed because we include a checksum manifest, which failed when the files could not be found. We ended up renaming the files to remove the periods, but this breaks the HTML connection to the assets folder, as the internal references point to the folder that starts with a period.

In the future we could provide access by renaming the files back so that the HTML works properly again. This is similar to how the create DIP script in automation tools gets around some of the renaming issues, but it is a little more time-consuming because the renaming is not recorded in the METS.
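Since the renaming is not recorded in the METS, one workaround is to keep a separate log of every rename so it can be reversed later. The following is a minimal sketch under that assumption; the function names and the JSON log format are hypothetical, and this is not how the automation tools script works.

```python
import json
import os


def strip_leading_dots(transfer_dir, log_path):
    """Rename '.name' to 'name' under transfer_dir and log each rename."""
    renames = []
    # Walk bottom-up so children are renamed before their parent directories.
    for root, dirs, files in os.walk(transfer_dir, topdown=False):
        for name in dirs + files:
            new_name = name.lstrip(".")
            if not name.startswith(".") or not new_name:
                continue
            old = os.path.join(root, name)
            new = os.path.join(root, new_name)
            if os.path.exists(new):
                continue  # leave name collisions for a person to resolve
            os.rename(old, new)
            renames.append([os.path.relpath(old, transfer_dir),
                            os.path.relpath(new, transfer_dir)])
    with open(log_path, "w", encoding="utf-8") as fh:
        json.dump(renames, fh, indent=2)
    return renames


def restore_names(target_dir, log_path):
    """Undo the logged renames, parents first, so the old paths resolve again."""
    with open(log_path, encoding="utf-8") as fh:
        renames = json.load(fh)
    for old, new in reversed(renames):
        os.rename(os.path.join(target_dir, new), os.path.join(target_dir, old))
```

Running `restore_names` on an extracted copy would bring back the original dotted names, so internal references such as the link from the HTML file to its assets folder should resolve again. The log has to be kept alongside the transfer, because nothing else records the mapping.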