deepmodeling / dpdispatcher

generate HPC scheduler systems jobs input scripts and submit these scripts to HPC systems and poke until they finish
https://docs.deepmodeling.com/projects/dpdispatcher/
GNU Lesser General Public License v3.0

ERROR when number of backward files > 100 in SSH context #471

Closed: thangckt closed this issue 1 month ago

thangckt commented 1 month ago

I get this error when the number of backward files is greater than 100 in the SSH context:

\site-packages\dpdispatcher\contexts\ssh_context.py", line 766, in block_checkcall
    raise RuntimeError(
RuntimeError: Get error code 2 in calling tar czfh 8a65bd44bf1e358f1d4f794fdeacb1c34ffd86c2.tar.gz -T '/home/thang/work/w24_dpjob/8a65bd44bf1e358f1d4f794fdeacb1c34ffd86c2\.tmp.tar.5abbc244-6887-4cd4-8691-3e6fa45d1f74' through ssh with job: 8a65bd44bf1e358f1d4f794fdeacb1c34ffd86c2 . message: tar: /home/thang/work/w24_dpjob/8a65bd44bf1e358f1d4f794fdeacb1c34ffd86c2\\.tmp.tar.5abbc244-6887-4cd4-8691-3e6fa45d1f74: Cannot stat: No such file or directory

When I try with fewer than 100 files, everything works well.

It seems that the file file_list_file was not created successfully.

njzjz commented 1 month ago

Do you use Windows?

njzjz commented 1 month ago

Please tell me whether #472 fixes the issue. It's difficult to test it in the CI, which needs both Linux and Windows environments.
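
The backslash before .tmp.tar in the traceback above suggests that the remote path of the file list was built with Windows path rules on the client and then handed to tar on a Linux host. The sketch below only illustrates that suspected mechanism with the standard library. It is not dpdispatcher's code and not necessarily what #472 changes; the variable names and the uuid-based temporary file name are assumptions modeled on the pattern in the error message.

```python
import ntpath      # the module os.path resolves to on Windows
import posixpath   # the module os.path resolves to on Linux
import uuid

# Remote working directory on the Linux host, copied from the traceback.
remote_root = "/home/thang/work/w24_dpjob/8a65bd44bf1e358f1d4f794fdeacb1c34ffd86c2"

# Hypothetical temporary file name, following the pattern seen in the error message.
tmp_name = f".tmp.tar.{uuid.uuid4()}"

# Joining with Windows rules inserts a backslash, which a Linux shell and tar
# do not treat as a directory separator.
windows_style = ntpath.join(remote_root, tmp_name)

# Joining with POSIX rules yields a path that is valid on the remote host,
# regardless of the operating system the client runs on.
posix_style = posixpath.join(remote_root, tmp_name)

print(windows_style)
print(posix_style)
```

If this is indeed the cause, building every remote path with posixpath (or pathlib.PurePosixPath) instead of os.path would keep the result independent of the client's operating system.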