archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: pipelines can't split transfers automatically #224

Open sevein opened 5 years ago

sevein commented 5 years ago

Back in 2014 we received a pull request from @pbrantner suggesting a simple method to split transfers automatically by taking advantage of watched directories. We've never been able to analyze it in depth.

https://github.com/artefactual/archivematica/pull/99

The pull request has been inactive for more than three years. I'm filing this issue to capture the feature request and to create a space where we can discuss other solutions, e.g. could this be achieved via automation tools, and would that be preferable?



sromkey commented 5 years ago

I'm not sure what is meant by "split transfers" in this context, or how they would be split, but it seems worth discussing and considering.

sevein commented 5 years ago

As I understand it, the workflow suggested by https://github.com/artefactual/archivematica/pull/99 is the following:

A user with physical access to the Archivematica Shared Directory (usually /var/archivematica/sharedDirectory) moves a new transfer into watchedDirectories/activeTransfers/splittedTransfers.

Archivematica has a new watcher pointing to splittedTransfers. When the user adds a new directory, the client script provided in the pull request will start a new transfer for each directory found in the root of the directory provided by the user. For example, the user adds FOOBAR with three directories inside: 1, 2 and 3.

/var/archivematica/sharedDirectory/watchedDirectories/activeTransfers/splittedTransfers/FOOBAR
├── 1
│   └── [contents]
├── 2
│   └── [contents]
└── 3
    └── [contents]

The script creates three transfers: SIP-1, SIP-2, SIP-3.
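
To make the splitting step concrete, here is a minimal Python sketch of the idea. It is not the code from the pull request. The function names, the `SIP-` prefix parameter and the `start_transfer` callback are hypothetical, and ignoring loose files in the root of the dropped directory is my assumption.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def plan_split_transfers(dropped_dir: Path, prefix: str = "SIP") -> List[Tuple[str, Path]]:
    """Return one (transfer_name, path) pair per directory in the root of dropped_dir.

    Assumption: loose files in the root are ignored; only directories
    become transfers.
    """
    subdirs = sorted(p for p in dropped_dir.iterdir() if p.is_dir())
    return [(f"{prefix}-{p.name}", p) for p in subdirs]


def split_transfers(dropped_dir: Path, start_transfer: Callable[[str, Path], None]) -> List[str]:
    """Call start_transfer once per planned transfer and return the names used.

    start_transfer is a hypothetical callback standing in for whatever
    mechanism actually starts a transfer (watched directory, API, etc.).
    """
    names = []
    for name, path in plan_split_transfers(dropped_dir):
        start_transfer(name, path)
        names.append(name)
    return names
```

With the FOOBAR example above, `plan_split_transfers` would yield the names SIP-1, SIP-2 and SIP-3, each paired with its directory. Keeping the planning step separate from the callback means the same logic could sit behind a watched directory or an API call.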

My guess is that the mechanisms to address this use case would be very different today. For example, I don't think we'd want users to work with watched directories. Wouldn't we instead look at exposing the functionality via the API, integrated with Storage Service (SS) transfer source locations?