Closed: mirzaees closed this issue 3 years ago
The `step_io_load_list` assignment then would be in `submit_jobs.bash`, I suppose.
Please also implement reading of the io_load from `job_defaults.cfg`. For now let's have both, reading from `job_defaults.cfg` and hardwired. Once everything works I will remove the hardwired lines.
```
name                     c_walltime  s_walltime  c_memory  s_memory  num_threads  io_load
------------------------------------------------------------------------------------------
default                  02:00:00    0           3000      0         2            1
# topsStack
unpack_topo_reference    0           00:01:00    4000      0         8            0.2
unpack_secondary_slc     0           00:00:10    4000      0         2            1
average_baseline         0           00:00:10    1000      0         2            1
extract_burst_overlaps   0           00:00:10    4000      0         2            1
```
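The io_load lookup could be sketched roughly as below. This is a hypothetical helper, not the committed code: the column position (7 = io_load) follows the table above, the default cfg path is an assumption, and the hardwired fallback of `1` stays in place until the cfg reading is verified.

```shell
# Hypothetical sketch: look up the io_load column for a given step name in
# job_defaults.cfg, falling back to the hardwired value when the step is absent.
get_io_load() {
    step_name=$1
    cfg=${2:-minsar/defaults/job_defaults.cfg}   # assumed location
    # skip comment/separator lines, match the first column exactly, print column 7
    io_load=$(awk -v step="$step_name" \
        '$1 !~ /^#/ && $1 == step { print $7; exit }' "$cfg")
    # hardwired fallback (to be removed once cfg reading works everywhere)
    echo "${io_load:-1}"
}
```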
With this, in the final minsar package all parameters will be specified in 3 files: `platforms_defaults.cfg`, `queues.cfg`, and `job_defaults.cfg`. I would suggest creating a reader `read_config.bash` (the code is already in `utils/read_platform_defaults.bash`), and using in all scripts (bash and python) caps for the variables that get assigned in *.cfg files (e.g. MAX_JOBS_PER_QUEUE and TOTAL_MAX_TASKS).
`read_config.bash` should skip the assignment if the variable already exists as an environment variable. That allows trying different values without modifying a *.cfg file.
(In job_submission.py we use `get_config_defaults`; any suggestion for a common name?)
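A minimal sketch of the skip-if-already-set behavior, assuming a simple KEY=VALUE cfg format (the function name and file format are illustrative, not the actual minsar layout):

```shell
# Hypothetical sketch of read_config.bash: assign uppercase variables from a
# KEY=VALUE style *.cfg file, but skip any variable that is already set in the
# environment, so values can be overridden without editing the cfg file.
read_config() {
    cfg=$1
    while IFS='=' read -r key value; do
        # ignore comments and blank lines
        case $key in ''|\#*) continue ;; esac
        # skip the assignment if the variable already exists in the environment
        if eval "[ -z \"\${${key}+x}\" ]"; then
            export "$key=$value"
        fi
    done < "$cfg"
}
```

With this, `MAX_JOBS_PER_QUEUE=5 ./some_script.bash` would win over the cfg value, which is exactly the "try different values without modifying a *.cfg file" behavior requested above.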
`cat ${HOME}/accounts/suggestion_platforms_defaults.cfg`

```
###################################################################################################
echo "exporting environment variables using ~/accounts/platforms_defaults.cfg ..."
###################################################################################################
# set environment variables. For example for PLATFORM_NAME stampede2 do `export JOBSCHEDULER=SLURM`
###################################################################################################
PLATFORM_NAME  JOBSCHEDULER  QUEUENAME   JOB_SUBMISSION_SCHEME          JOBSHEDULER_PROJECTNAME  SCRATCHDIR                               WORKDIR
stampede2      SLURM         skx-normal  launcher_multiTask_singleNode  TG-EAR200012             ${SCRATCH}                               ~/insarlab
frontera       SLURM         normal      launcher_multiTask_singleNode  EAR20013                 ${SCRATCH}                               ~/insarlab
comet          SLURM         compute     singleTask                     EAR20013                 /oasis/scratch/comet/$USER/temp_project  ~/insarlab
deqing_server  PBS           batch       singleTask                     TG-EAR180012             ${SCRATCH}                               ~/insarlab
eos            PBS           batch       singleTask                     NONE                     /scratch/insarlab/${USER_PREFERRED}      ~/insarlab
jetstream      NONE          NONE        NONE                           NONE                     /data/HDF5EOS                            ~/insarlab
mac            NONE          NONE        NONE                           NONE                     ~/insarlab/scratch                       ~/insarlab
```
`cat minsar/defaults/queues.cfg`

```
PLATFORM_NAME  QUEUENAME    CPUS_PER_NODE  THREADS_PER_CORE  MEM_PER_NODE  MAX_JOBS_PER_WORKFLOW  MAX_JOBS_PER_QUEUE  WALLTIME_FACTOR
stampede2      skx-normal   48             2                 192000        12                     25                  1
stampede2      skx-dev      48             2                 192000        1                      25                  1
stampede2      normal       48             4                 96000         50                     25                  1
stampede2      development  48             4                 96000         1                      25                  1
frontera       normal       56             1                 192000        12                     100                 1
frontera       development  56             1                 192000        1                      100                 1
frontera       flex         56             1                 192000        12                     100                 1
frontera       nvdimm       48             1                 2100000       8                      100                 1
```
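A reader for this table could look roughly like the sketch below. It is a hypothetical helper (`load_queue_config` is an illustrative name): it pulls the row for the current platform/queue out of `queues.cfg` and exports its columns under the uppercase names from the header, matching the caps convention suggested above.

```shell
# Hypothetical sketch: read one platform/queue row from queues.cfg and export
# its columns under the uppercase header names.
load_queue_config() {
    platform=$1 queue=$2
    cfg=${3:-minsar/defaults/queues.cfg}   # assumed location
    row=$(awk -v p="$platform" -v q="$queue" \
        '$1 == p && $2 == q { print; exit }' "$cfg")
    if [ -z "$row" ]; then
        echo "no queues.cfg entry for $platform/$queue" >&2
        return 1
    fi
    # split the row into positional parameters and export the named columns
    set -- $row
    export CPUS_PER_NODE=$3 THREADS_PER_CORE=$4 MEM_PER_NODE=$5 \
        MAX_JOBS_PER_WORKFLOW=$6 MAX_JOBS_PER_QUEUE=$7 WALLTIME_FACTOR=$8
}
```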
Make a new issue please. Stop adding unrelated tasks to current issues.
@mirzaees I wrote a generalized function that I just committed. Try it out. It's just a submission script, as I can't really generalize the "wait until finished" functionality. If that is important for your purposes, you can write a wrapper around the new `sbatch_conditional` function that does what you want.
Hi @Ovec8hkin, that works nicely, thank you!
I just don't know why you kept 'run_01' as an argument; we could use the step name itself to find patterns.
Then if a job name starts with 'run_01', the command would be:
`sbatch_conditional.bash --step_name run_01_unpack_topo_reference`
It's because of how Falk has defined the step names for submit_jobs. Because they are independent of the "run*_" notation at the beginning, you need to pass the step name separately to properly look up tasks using the same step name (since Falk claims that step names don't always run in the same order depending on the workflow). In general, if you don't have a separate naming convention for step names, you won't need to use the step_name option, and can just run: `sbatch_conditional.bash run_01_unpack_topo_reference`.
That is right, thank you!
I have not understood yet either. I would have expected to submit one job as `sbatch_conditional.bash run_01_unpack_topo_reference_0.job` or as `sbatch_conditional.bash run_02_*_1.job`. I suspect there is a reason for that?
An alternative would be to submit multiple jobs as `sbatch_conditional.bash run_02_*.job`. I don't remember whether we decided anything about that.
Josh, will you be available tomorrow afternoon? We should meet. Otherwise I will try to get on the same page with Sara (in the afternoon, Sara).
@falkamelung The syntax you're using above is incorrect. You don't pass a glob of files to sbatch_conditional, just a run file pattern, i.e. "run_02" (distinct from "run_02*"). The script handles finding the proper files. The extra --step_name parameter is for manually defining the name of the processing step being used. In most cases, --step_name is not necessary and the script will default back to using the initially provided pattern. But for submit_jobs.bash we have to pass manual --step_names due to the naming conventions.
done
Hi @Ovec8hkin
I am opening this issue to discuss what we need from `sbatch_conditional.bash`.
You have seen our regular run files in the run_files folder, each having several tasks. We make jobs for these run files (batch files) and then submit them with your submit_jobs.bash. Currently it works great and takes care of everything, including resubmission after failure, the number of active jobs, and so on.
But the problem is that several things are hard-coded, which makes it work only for this setup of folders and jobs; even the run file names are hard-coded. I see, for example, 'step_io_load_list' in submit_jobs.bash.
What I need is some added capability to work with a single new batch file containing several tasks. The name of the batch file could be arbitrary. We need to be able to submit the jobs corresponding to a batch file. Those jobs are written beforehand (meaning there is no need to think about memory, walltime, ...) and are ready to be submitted.
For example: I have a run file named:
run_arbitrary_name
There are multiple jobs created for this as:
We need a script like sbatch_conditional.bash, or generally submit_jobs.bash itself, to be able to run these jobs. The input options would be as follows:
sbatch_conditional.bash --pattern run_arbitrary_name --step_max_tasks 1000 --total_max_tasks 3000
step_max_tasks and total_max_tasks should have defaults and be optional. The script looks for the jobs with the given pattern, submits them, and waits for them to finish. All the checking for the number of tasks, failures, etc. would be the same as before.
You have written a similar function in job_submission.py: if I call job_submission.py with a batch file, it can submit it as one or several jobs depending on the number of tasks. The only difference here would be that I don't want you to write the jobs, only find them and submit them.
Also, the working directory should be where the `run_arbitrary_name` (pattern) exists, not depending on $SCRATCH or $PROJECTNAME.
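The requested behavior could be sketched roughly like this. It is a hypothetical outline, not the actual sbatch_conditional.bash: `find_job_files` and `submit_and_wait` are illustrative names, the job files are taken from the current working directory as requested, and the bookkeeping against step_max_tasks/total_max_tasks is accepted but elided.

```shell
# list *.job files matching the pattern in the CURRENT directory
# (not under $SCRATCH or $PROJECTNAME)
find_job_files() {
    pattern=$1
    ls "${pattern}"*.job 2>/dev/null
}

# submit every matching job file and poll until all of them leave the queue
submit_and_wait() {
    pattern=$1
    step_max_tasks=${2:-1000}     # optional, with a default
    total_max_tasks=${3:-3000}    # optional, with a default
    job_ids=""
    for job_file in $(find_job_files "$pattern"); do
        # sbatch --parsable prints only the job id
        id=$(sbatch --parsable "$job_file") || return 1
        job_ids="${job_ids:+$job_ids,}$id"
    done
    if [ -z "$job_ids" ]; then
        echo "no jobs match ${pattern}*.job" >&2
        return 1
    fi
    # wait until none of the submitted ids remain in the queue
    while squeue -h -j "$job_ids" 2>/dev/null | grep -q .; do
        sleep 60
    done
}
```

Usage would then be e.g. `submit_and_wait run_arbitrary_name 1000 3000`, run from the directory containing the job files.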