User Story
As a DevOps architect, I want to implement a file manifest collection system to organize and track files in place, where each file is represented by a URL and accessed through a symlink or the SCP protocol. This will let us manage metadata and index files in cases where we cannot move the data to a managed bucket.
Acceptance Criteria
[ ] As a user, I should be able to initiate the collection of a manifest of files indicating that the files should NOT be uploaded.
  [ ] Add a --no-upload parameter to the g3t add command
  [ ] The path added should still be a relative path from the project directory
  [ ] The add command should populate the DocumentReference's 'source_path' extension with an scp://<hostname>/<full-path> style URL
[ ] On push, the system should:
  [ ] upload all file metadata to indexd
  [ ] continue to upload files added without the --no-upload parameter to the project bucket via gen3-client
  [ ] skip uploading any files added with the --no-upload parameter
[ ] On pull or clone, the system should:
  [ ] continue to download all files from the project bucket as it does now
  [ ] when the current hostname matches the hostname in the URL's scp://<hostname>, create a symlink in the project directory. The symlink should preserve the file's original structure, maintaining the directory hierarchy specified in the manifest.
  [ ] when the current hostname does not match the hostname in the URL's scp://<hostname>, initiate a multi-threaded scp download, ideally delegating this job to a utility package on the user's machine. The system should provide a mechanism for the user to authenticate and authorize access to the files using SCP credentials.
[ ] The system should handle errors gracefully and provide informative messages for troubleshooting, especially in cases of connection issues or authentication failures.
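The add and pull criteria above could be sketched roughly as follows. This is only an illustration, not g3t's actual implementation: the names `scp_url`, `plan_pull`, and `PullAction` are hypothetical, and the sketch assumes POSIX paths and that comparing `socket.gethostname()` against the URL's host is an acceptable hostname check (short names vs. fully qualified names may need extra handling).

```python
import os
import socket
from dataclasses import dataclass
from pathlib import Path
from typing import Optional
from urllib.parse import quote, unquote, urlparse


def scp_url(path: str, hostname: Optional[str] = None) -> str:
    """Build an scp://<hostname>/<full-path> style URL for a local file.

    Hypothetical helper: the value is what `g3t add --no-upload` would
    store in the DocumentReference 'source_path' extension.
    """
    host = hostname or socket.gethostname()
    full_path = os.path.abspath(path)  # absolute path, begins with "/"
    return f"scp://{host}{quote(full_path)}"


@dataclass
class PullAction:
    kind: str          # "symlink" or "scp"
    source: str        # local path (symlink) or "<host>:<path>" (scp)
    destination: Path  # where the file should appear in the project


def plan_pull(source_url: str, relative_path: str, project_dir: str,
              current_host: Optional[str] = None) -> PullAction:
    """Decide whether a file should be symlinked or fetched via scp."""
    parsed = urlparse(source_url)
    if parsed.scheme != "scp" or not parsed.hostname:
        raise ValueError(f"not an scp:// URL with a hostname: {source_url!r}")

    host = (current_host or socket.gethostname()).lower()
    remote_path = unquote(parsed.path)
    # The manifest's relative path preserves the directory hierarchy.
    destination = Path(project_dir) / relative_path

    if parsed.hostname == host:  # urlparse lowercases the hostname
        return PullAction("symlink", remote_path, destination)
    return PullAction("scp", f"{parsed.hostname}:{remote_path}", destination)


def create_symlink(action: PullAction) -> None:
    """Create the symlink for a 'symlink' action, including parent dirs."""
    action.destination.parent.mkdir(parents=True, exist_ok=True)
    action.destination.symlink_to(action.source)
```

An `scp` action would be handed to whatever transfer utility is chosen. Authentication and the multi-threaded download are deliberately left out of this sketch.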