IUSCA / bioloop

Scientific data management portal and pipeline application template

Update tar command to remove absolute paths from the archive #264

Closed deepakduggirala closed 2 months ago

deepakduggirala commented 3 months ago

Description

The current command creates tar files that contain the full path of the source directory. Example:

Running the command tar cf /N/scratch/bioloop/foo.tar --sparse /N/project/bioloop/raw_data/test_dataset produces a foo.tar.

Extracting it with tar xf foo.tar recreates the full source directory hierarchy (N/project/bioloop/raw_data/test_dataset) in the directory where it was extracted.

tar cf /N/scratch/bioloop/foo.tar --sparse -C /N/project/bioloop/raw_data/test_dataset . removes the absolute paths from the archive. When extracted with tar xf foo.tar, it produces only the contents of test_dataset in the directory where it was extracted.
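
The difference between the two commands can be reproduced in a scratch directory. This is a minimal sketch, not the project's actual archive step: the directory and file names are hypothetical stand-ins for the /N/... paths, and --sparse is left out because it only changes how sparse files are stored, not the member names.

```shell
set -eu

# Scratch area; every path below is a hypothetical stand-in for the /N/... paths.
work="$(mktemp -d)"
src="$work/project/raw_data/test_dataset"
mkdir -p "$src/sub_dir"
echo "hello" > "$src/sub_dir/file.txt"

# Old form: the absolute source path is the member to archive.
# tar drops the leading "/" (warning on stderr) but keeps every parent directory.
tar cf "$work/full.tar" "$src" 2>/dev/null

# New form: change into the dataset directory with -C, then archive ".".
tar cf "$work/relative.tar" -C "$src" .

# Save and print the member names of both archives for comparison.
tar tf "$work/full.tar" > "$work/full.list"
tar tf "$work/relative.tar" > "$work/relative.list"
echo "full.tar members:";     cat "$work/full.list"
echo "relative.tar members:"; cat "$work/relative.list"

# Extracting the new-form archive yields only the dataset's contents.
mkdir "$work/out"
tar xf "$work/relative.tar" -C "$work/out"
ls "$work/out"
```

In the first listing every member carries the full parent path; in the second, members start at ./ relative to the dataset directory.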

deepakduggirala commented 3 months ago

Need to validate this in the dev environment.

ri-pandey commented 2 months ago

Tested in dev env.

Tested with Data Product sub-fsm54jc.

Ran the 'stage' workflow on this dataset. Results after staging:

[scadev@colo25 staged]$ cd data_products/
[scadev@colo25 data_products]$ cd 3a1e0f21c1770ef5dd6bd10f92aa35b4
[scadev@colo25 3a1e0f21c1770ef5dd6bd10f92aa35b4]$ ls
sub-fsm54jc

The extracted tar file's root path is the dataset's containing directory.
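
For reference, a layout like the one above (a single directory named after the dataset) is what tar gives when -C points at the dataset's parent directory and the dataset name is the member. This is only a sketch under that assumption; the thread does not show the command the workflow actually runs, and the dataset name and files below are hypothetical.

```shell
set -eu

# Hypothetical dataset living under a parent "raw_data" directory.
work="$(mktemp -d)"
dataset="$work/raw_data/sub-example"
mkdir -p "$dataset/anat"
echo "data" > "$dataset/anat/scan.txt"

# -C switches to the parent directory; the member is just the dataset's name,
# so the archive holds "sub-example/..." with no parent directories above it.
tar cf "$work/named.tar" -C "$(dirname "$dataset")" "$(basename "$dataset")"

# Extracting elsewhere recreates exactly one top-level directory.
mkdir "$work/staged"
tar xf "$work/named.tar" -C "$work/staged"
ls "$work/staged"
```

Compared with archiving "." from inside the dataset directory, this form keeps the dataset name as the archive's root, so extraction does not scatter the dataset's sub-directories into the target directory.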

@deepakduggirala I am assuming that you already ran the archive step on all datasets in the dev env, which fixed the paths inside the tars persisted to SDA?

deepakduggirala commented 2 months ago

@ri-pandey No, I have not.

ri-pandey commented 2 months ago

@deepakduggirala I tested the downloaded bundle, and extracting it yields the correct paths.

Fwiw, I used the 'Archive Utility' app on Mac to extract the bundle. This placed the root directory of the extracted contents in the same directory where the bundle is located.

Screenshot 2024-09-06 at 12 04 55 PM

Running tar xvf a49d5175a091376bf5785768e6386e0e, however, extracts the sub-directories inside the dataset directly into the directory where the bundle is located.

Screenshot 2024-09-06 at 12 07 24 PM

I used dataset 35 (sub-fsm84zy_copy) in the bioloop dev env to test this.
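
A check like this can also be done without extracting anything, by listing the archive and looking at its distinct top-level entries. The sketch below is illustrative only: archive_top_level is a hypothetical helper, not part of bioloop, and the dataset layout is made up.

```shell
set -eu

# Hypothetical helper: print the distinct top-level entries of a tar archive.
# A leading "./" is stripped and the bare "./" member is dropped first.
archive_top_level() {
    tar tf "$1" | sed -e 's|^\./||' -e '/^$/d' | cut -d/ -f1 | sort -u
}

# Made-up dataset with two sub-directories.
work="$(mktemp -d)"
src="$work/raw_data/test_dataset"
mkdir -p "$src/anat" "$src/func"
echo a > "$src/anat/a.txt"
echo b > "$src/func/b.txt"

# Archive once with the absolute path and once with -C.
tar cf "$work/full.tar" "$src" 2>/dev/null
tar cf "$work/relative.tar" -C "$src" .

echo "full.tar top level:";     archive_top_level "$work/full.tar"
echo "relative.tar top level:"; archive_top_level "$work/relative.tar"
```

An archive built from the absolute path shows the first component of that path as its only top-level entry, while the -C form shows the dataset's own sub-directories.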

deepakduggirala commented 2 months ago

@ri-pandey Looks good. Feel free to merge when you're ready.