deepmodeling / dpdispatcher

generate HPC scheduler systems jobs input scripts and submit these scripts to HPC systems and poke until they finish
https://docs.deepmodeling.com/projects/dpdispatcher/
GNU Lesser General Public License v3.0

[Feature Request] Split Large ‘tar’ Archive into Multiple Files of Certain Size #462

Open thangckt opened 2 months ago

thangckt commented 2 months ago

Dear developers,

Could you add an option to split a large .tar file into smaller parts before downloading?

It would be more efficient for transferring.

My problem is that when the backward files are large, they cannot be downloaded, even though dpdispatcher finishes without error.

Thanks

felix5572 commented 2 months ago

As a temporary workaround, could you just modify the _get_file method in dpdispatcher/contexts/ssh_context.py (the tar command, plus a split command)?

For example:

# Step 1: Create a tar archive
tar -cvf archive.tar largefile.txt

# Step 2: Split the tar archive into 100 MB chunks
split -b 100M archive.tar archive_part_

# Step 3: After transferring all parts, combine them back into a single tar file
cat archive_part_* > archive.tar

# Step 4: Extract the contents of the tar file
tar -xvf archive.tar
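
The four steps above do not check that the reassembled archive matches the original, which matters when a transfer can fail silently. Below is a minimal sketch of the same split/cat approach with a checksum added. The function names split_archive and join_archive, the ".part_" suffix, and the ".cksum" file are hypothetical and are not part of dpdispatcher. The sketch assumes a POSIX shell with the standard split, cat, and cksum utilities.

```shell
# Hypothetical helpers, not part of dpdispatcher.

# split_archive ARCHIVE CHUNK_SIZE
# Records a checksum of ARCHIVE in ARCHIVE.cksum, then splits ARCHIVE into
# ARCHIVE.part_aaa, ARCHIVE.part_aab, ... of at most CHUNK_SIZE bytes each.
split_archive() {
    archive=$1
    chunk_size=$2
    # Reading from stdin keeps the file name out of the checksum output.
    cksum < "$archive" > "$archive.cksum" || return 1
    # -a 3 uses three-letter suffixes, allowing more parts than the default.
    split -a 3 -b "$chunk_size" "$archive" "$archive.part_"
}

# join_archive ARCHIVE
# Rebuilds ARCHIVE from ARCHIVE.part_* in the current directory and compares
# its checksum with the one stored in ARCHIVE.cksum.
join_archive() {
    archive=$1
    # The shell expands the glob in sorted order, which matches split's naming.
    cat "$archive".part_* > "$archive" || return 1
    if [ "$(cksum < "$archive")" = "$(cat "$archive.cksum")" ]; then
        echo "checksum OK: $archive"
    else
        echo "checksum mismatch: $archive" >&2
        return 1
    fi
}
```

For instance, one might run split_archive archive.tar 100M where the archive is created, transfer the part files together with archive.tar.cksum, and then run join_archive archive.tar before extracting with tar -xvf archive.tar. cksum only computes a CRC, so it catches truncated or missing parts but is not a cryptographic check.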