Is your feature request related to a problem? Please describe.
We have created a number of different implementations of an activity to extract the contents of an archive file (e.g. tar, zip) in the different Enduro projects:
There are also several private client implementations that I haven't listed here.
Maintaining all the different implementations is time-consuming. The implementations also vary in how they handle archives that contain a single top-level directory, and in whether they extract to a temporary directory.
Describe the solution you'd like
Implement a general extract activity in this project that can be imported into the various Enduro projects and client pre-processing workflows. Having a single implementation will reduce maintenance costs and make the extraction results more consistent and predictable.
Describe alternatives you've considered
Implement an archive extraction package (not a Temporal activity) in a stand-alone repository.
Additional context
The https://github.com/artefactual-sdps/preprocessing-sfa/blob/main/internal/activities/extract_package.go implementation has a few nice features that I think should be included in this repo:
It checks for a single top-level directory after extraction and, if one is present, returns the path to that directory, without requiring the caller to set a "removeTopLevelDirectory" flag.
It doesn't try to remove a top-level directory from the extraction directory; it just returns the path of either the extraction directory or the top-level directory, which achieves the same goal for the caller and is a simpler solution.
It extracts the archive to a random temporary directory, which avoids possible path contamination or extraction errors if the same package is extracted more than once.