facebookresearch / cc_net

Tools to download and cleanup Common Crawl data
MIT License
Batch job submission failed: Invalid job array specification #31

Open swgu98 opened 2 years ago

swgu98 commented 2 years ago

Hi, when I run "python -m cc_net", I get this error:

Submitting _hashes_shard in a job array (1600 jobs)
sbatch: error: Batch job submission failed: Invalid job array specification
subprocess.CalledProcessError: Command '['sbatch', '/data/gsw/test/cc_net/data/logs/submission_file_479eba35e148432da4432891c1191887.sh']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/gsw/anaconda3/envs/test_p/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/gsw/anaconda3/envs/test_p/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/gsw/test/cc_net/cc_net/__main__.py", line 18, in <module>
    main()
  File "/data/gsw/test/cc_net/cc_net/__main__.py", line 14, in main
    func_argparse.parse_and_call(cc_net.mine.get_main_parser())
  File "/home/gsw/anaconda3/envs/test_p/lib/python3.9/site-packages/func_argparse/__init__.py", line 72, in parse_and_call
    return command(*parsed_args)
  File "/data/gsw/test/cc_net/cc_net/mine.py", line 632, in main
    all_files = mine(conf)
  File "/data/gsw/test/cc_net/cc_net/mine.py", line 335, in mine
    hashes_groups = list(jsonql.grouper(hashes(conf), conf.hash_in_mem))
  File "/data/gsw/test/cc_net/cc_net/mine.py", line 263, in hashes
    ex(_hashes_shard, repeat(conf), _transpose(missing_outputs))
  File "/data/gsw/test/cc_net/cc_net/execution.py", line 89, in map_array_and_wait
    jobs = ex.map_array(function, *args)
  File "/home/gsw/anaconda3/envs/test_p/lib/python3.9/site-packages/submitit/core/core.py", line 701, in map_array
    return self._internal_process_submissions(submissions)
  File "/home/gsw/anaconda3/envs/test_p/lib/python3.9/site-packages/submitit/auto/auto.py", line 218, in _internal_process_submissions
    return self._executor._internal_process_submissions(delayed_submissions)
  File "/home/gsw/anaconda3/envs/test_p/lib/python3.9/site-packages/submitit/slurm/slurm.py", line 332, in _internal_process_submissions
    first_job: core.Job[tp.Any] = array_ex._submit_command(self._submitit_command_str)
  File "/home/gsw/anaconda3/envs/test_p/lib/python3.9/site-packages/submitit/core/core.py", line 864, in _submit_command
    output = utils.CommandFunction(command_list, verbose=False)()  # explicit errors
  File "/home/gsw/anaconda3/envs/test_p/lib/python3.9/site-packages/submitit/core/utils.py", line 350, in __call__
    raise FailedJobError(stderr) from subprocess_error
submitit.core.utils.FailedJobError: sbatch: error: Batch job submission failed: Invalid job array specification

gwenzek commented 2 years ago

This seems to be an issue with your SLURM cluster. Can you share the "submission_file.sh" created by submitit? Does your SLURM cluster support job arrays?
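One way to answer the job-array question is to compare the array size from the log (1600 jobs) with the cluster's MaxArraySize setting, which `scontrol show config` prints. The sketch below assumes that limit is the cause, which nothing in this thread confirms. According to the Slurm documentation, the largest allowed array index is MaxArraySize minus one, and the default MaxArraySize is 1001. The helper names (`parse_max_array_size`, `array_fits`, `read_max_array_size`) are hypothetical and are not part of cc_net or submitit.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_max_array_size(scontrol_output: str) -> Optional[int]:
    """Extract MaxArraySize from the text printed by `scontrol show config`."""
    match = re.search(r"^\s*MaxArraySize\s*=\s*(\d+)", scontrol_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def array_fits(n_jobs: int, max_array_size: int) -> bool:
    """An array with indices 0..n_jobs-1 needs its largest index below MaxArraySize."""
    return n_jobs - 1 < max_array_size


def read_max_array_size() -> Optional[int]:
    """Query the local Slurm installation; returns None if scontrol is not on PATH."""
    if shutil.which("scontrol") is None:
        return None
    result = subprocess.run(
        ["scontrol", "show", "config"], capture_output=True, text=True, check=False
    )
    return parse_max_array_size(result.stdout)


if __name__ == "__main__":
    limit = read_max_array_size()
    if limit is None:
        print("Could not read MaxArraySize (is scontrol on PATH?)")
    else:
        # 1600 is the array size reported in the log above.
        print(f"MaxArraySize={limit}, 1600-job array fits: {array_fits(1600, limit)}")
```

If the reported limit is below 1600, the array submitted here would be rejected by sbatch. Raising MaxArraySize in slurm.conf is a decision for the cluster administrator.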

swgu98 commented 2 years ago

> This seems to be an issue with your SLURM cluster. Can you share the "submission_file.sh" created by submitit? Does your SLURM cluster support job arrays?

Sorry, Slurm is not installed on my computer. I think this may be the reason.
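A quick way to check whether the Slurm client tools are available is to look for them on PATH. This is a minimal sketch using only the Python standard library. The function name `missing_slurm_commands` and the default list of commands are assumptions made for illustration, not something cc_net or submitit provides.

```python
import shutil
from typing import List, Sequence


def missing_slurm_commands(
    commands: Sequence[str] = ("sbatch", "squeue", "scontrol"),
) -> List[str]:
    """Return the commands from `commands` that cannot be found on PATH."""
    return [name for name in commands if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_slurm_commands()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All checked Slurm commands are on PATH")
```

Finding the commands only shows that the client tools are installed. It does not show that a controller is running or that job arrays are accepted.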

peter-ch commented 1 year ago

I installed and configured Slurm, but I still get this error:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/mnt/md0/cc_net/cc_net/__main__.py", line 18, in <module>
    main()
  File "/mnt/md0/cc_net/cc_net/__main__.py", line 14, in main
    func_argparse.parse_and_call(cc_net.mine.get_main_parser())
  File "/usr/local/lib/python3.8/dist-packages/func_argparse/__init__.py", line 72, in parse_and_call
    return command(*parsed_args)
  File "/mnt/md0/cc_net/cc_net/mine.py", line 631, in main
    all_files = mine(conf)
  File "/mnt/md0/cc_net/cc_net/mine.py", line 334, in mine
    hashes_groups = list(jsonql.grouper(hashes(conf), conf.hash_in_mem))
  File "/mnt/md0/cc_net/cc_net/mine.py", line 263, in hashes
    ex(_hashes_shard, repeat(conf), _transpose(missing_outputs))
  File "/mnt/md0/cc_net/cc_net/execution.py", line 89, in map_array_and_wait
    jobs = ex.map_array(function, *args)
  File "/usr/local/lib/python3.8/dist-packages/submitit/core/core.py", line 771, in map_array
    return self._internal_process_submissions(submissions)
  File "/usr/local/lib/python3.8/dist-packages/submitit/auto/auto.py", line 218, in _internal_process_submissions
    return self._executor._internal_process_submissions(delayed_submissions)
  File "/usr/local/lib/python3.8/dist-packages/submitit/slurm/slurm.py", line 332, in _internal_process_submissions
    first_job: core.Job[tp.Any] = array_ex._submit_command(self._submitit_command_str)
  File "/usr/local/lib/python3.8/dist-packages/submitit/core/core.py", line 934, in _submit_command
    output = utils.CommandFunction(command_list, verbose=False)()  # explicit errors
  File "/usr/local/lib/python3.8/dist-packages/submitit/core/utils.py", line 352, in __call__
    raise FailedJobError(stderr) from subprocess_error
submitit.core.utils.FailedJobError: sbatch: error: Batch job submission failed: Invalid job array specification