LSSTDESC / desc-gen3-prod

Desc-prod wrapper for pipeline production using gen3_workflow.
BSD 3-Clause "New" or "Revised" License

Version 0.0.41 and beyond #5

Open dladams opened 11 months ago

dladams commented 11 months ago

Continuing development. For earlier changes see #2.

dladams commented 11 months ago

For version 0.0.41, add an option to rebuild an existing cvmfs SW cache.

dladams commented 11 months ago

Version 0.0.41 also adds a config option to change the output directory so we can switch output file systems. The procedure is to create a directory or a link to a directory:

/global/cfs/cdirs/lsst/production/gen3/DC2/Run2.2i/repo/u/USER-XXX

where USER is the user name, and then include the field usuf:XXX in the config to direct data output there. Also, the field pmsYYY is equivalent to usuf:pmsYYY and cfsYYY to usuf:cfsYYY. Note that XXX and YYY cannot include the config delimiters dash ('-') or colon (':'). YYY can be blank. If XXX is blank or all these fields are omitted, then the base directory is just USER (no dash).
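
For illustration, here is a minimal Python sketch of the naming rule (the function name and the example values are hypothetical):

# Sketch of the base-directory naming rule described above.
def base_dirname(user: str, xxx: str = "") -> str:
    """Return USER-XXX, or just USER when XXX is blank."""
    return f"{user}-{xxx}" if xxx else user

assert base_dirname("dladams", "test1") == "dladams-test1"
assert base_dirname("dladams") == "dladams"  # no dash when XXX is blank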

dladams commented 11 months ago

For version 0.0.42:

dladams commented 11 months ago

For 0.0.42 continued:

dladams commented 10 months ago

For 0.0.43, I am adding the script g3wf-run-task, which provides a wrapper for running tasks. Here are the current capabilities (dev1):

 login11>./g3wf-run-task --help
Usage: g3wf-run-task [OPT1 OPT2 ...] COM
  COM is the run command
  OPTi is any of
    --verbose: Enable verbose logging
      --norun: Do not run user or monitor commands
      -t TSAM: Set sampling period to TSAM seconds (0 to disable)
       --date: Monitor date and time.
         --ps: Monitor user process.
     --pstree: Monitor user process tree.
      --wtree: Monitor process tree for this script.
   --procstat: Monitor status for the run process (/proc/PID/stat).
     --cpunum: Monitor CPU number and affinity for the run process.
   --perfstat: Run task inside "perf stat -d".
All monitoring commands are run every TSAM seconds with that period
in seconds set to 10 or the preceding -t value.
All output lines for monitor flag --FFF begin with "FFF[ TTT]: "
where TTT is the time in seconds since the monitor was started.
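
The monitor flags all follow the same pattern. Here is a minimal Python sketch of that loop (an illustration only, not the actual script):

# Run a monitor command every tsam seconds and prefix each output
# line with "FFF[ TTT]: ", TTT being seconds since the monitor started.
import subprocess
import time

def monitor_loop(fff, com, tsam=10):
    t0 = time.time()
    while True:  # runs until the wrapper kills it
        out = subprocess.run(com, capture_output=True, text=True).stdout
        ttt = int(time.time() - t0)
        for line in out.splitlines():
            print(f"{fff}[ {ttt}]: {line}", flush=True)
        time.sleep(tsam)

# e.g. monitor_loop("date", ["date"]) corresponds to the --date monitor.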

Now using task prepend:

PREPEND="g3wf-run-task --perfstat"

to replicate old behavior.

With this, I see the messages from the new task wrapper go to the job log instead of the individual task logs. I modify the script to send its progress and debug messages to stderr (dev2) and also its monloop messages to stderr (dev3). For the latter, I also add cpunum monitoring.

I leave the script in runapp with cpunum and proctxt monitored at 5 sec. But these do not work in shifter:15 because bc is missing. I fix that in shifter:16 and move to w_2023_44.

dladams commented 9 months ago

For 0.0.44, working toward running jobs with multiple single-frame task types. First, some little fixes to the setup.sh script produced by runapp-g3wfpipe:

It appears I can run successfully using the pipeline "ccd", which runapp-g3wfpipe maps to "isr,characterizeImage,calibrate", except if I use WorkQueue with shifter. That problem is the subject of issue #7.

As part of trying to address that problem, I added something else?

dladams commented 9 months ago

For 0.0.45, I am trying to fix status reports for multi-type jobs that only report progress for the last type.

I made a change to fix this in g3wfpipe-run.py, removing the qualifier when selecting futures. I am not sure why it was done that way in the first place.

But then test job 1286 failed when a failed install of desc-gen3-prod left the job in the wrong directory. I changed runapp-g3wfpipe to use subprocess so that such a failure does not leave us in the wrong directory. Test job 1287 has the desired behavior.
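
A minimal sketch of the idea (command and directory names hypothetical): the install runs in a child process, so a failure there cannot change the parent script's working directory the way an in-process chdir can.

import subprocess

def run_install(command, workdir):
    # cwd applies only to the child; the caller's directory is untouched
    # even if the command fails partway through.
    result = subprocess.run(command, cwd=workdir)
    return result.returncode == 0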

dladams commented 9 months ago

For 0.0.46, investigate limiting the number of concurrent task sequences (e.g. isr-characterizeImage-calibrate) to force sequences to run to completion before starting new sequences.

Note that process groups are described in Gower et al. and documented, along with the rest of BPS, in the BPS quickstart.

The job with grouping has the expected change in config.yaml:

 login26> diff job001295/config.yaml job001296
25a26,32
> 
> clusterAlgorithm: lsst.ctrl.bps.quantum_clustering_funcs.dimension_clustering
> cluster:
>   ccd:
>     pipetasks: isr, characterizeImage, calibrate
>     dimensions: detector, visit
>     equalDimensions: visit:exposure
 

I tried to run test jobs 1297/1298 w/o clustering, but they failed:

2023-12-04 08:28:42: Finished 50 of 2985 tasks.
./local/desc-gen3-prod/bin/g3wfpipe-run: line 53: 1235164 Segmentation fault      $THISPTH.py $ARGS
g3wfpipe-run: Terminating sysmon.
g3wfpipe-run: Exiting with status 139.

Sysmon shows memory usage fairly flat and under 200 GB. I add stderr to the log in 0.0.46.dev7 to see if I get more info if this happens again.

dladams commented 9 months ago

I changed the processing in g3wfpipe-run.py to directly retrieve endpoint futures and monitor progress based on copies of ParslJob objects. But my busy jobs time out, apparently because the ParslJob objects do not correctly reflect the state of the corresponding task. This is discussed in LSSTDESC/gen3_workflow#59.

Following the discussion in that issue, I monitor progress using job names and check these using the dataframe in ParslGraph. When exec_done is reached, I then use ParslJob to check the log and see whether the job succeeded or failed. This is triggered by a flag which is hardwired on for now. If the check fails, a bad-log counter is incremented and the nonzero value is reported on the status line.
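
Schematically, the new monitoring logic looks something like this (the dataframe columns and the check_log helper are illustrative assumptions, not the actual gen3_workflow API):

def count_progress(df, check_log):
    # One row per task: job name and parsl status from the ParslGraph dataframe.
    ndone = nbadlog = 0
    for name, status in zip(df["job_name"], df["status"]):
        if status != "exec_done":
            continue
        if check_log(name):  # e.g. wraps the ParslJob log check
            ndone += 1
        else:
            nbadlog += 1     # reported on the status line when nonzero
    return ndone, nbadlog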

dladams commented 8 months ago

For 0.0.47, explore adding iotrace as an option. The iotrace I am using is avner-levy/iotrace on GitHub. It produces a summary file with one line for each file:

filename, bytes read, bytes written, number of opens, # reads, # writes
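
A minimal sketch of reading one such summary line (the exact delimiting and whitespace are assumptions):

def parse_iotrace_line(line):
    # Split from the right so a comma in the file name is tolerated.
    fname, bread, bwrit, nopen, nread, nwrit = line.rsplit(",", 5)
    return {"filename": fname.strip(),
            "bytes_read": int(bread), "bytes_written": int(bwrit),
            "opens": int(nopen), "reads": int(nread), "writes": int(nwrit)}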

I also added a section to run perf stat with the process ID. It is still being run as the prefix in the run command. Here is a comparison of the two for one task:

Here perf stat is called with the ID of the process that is forked
to execute the run command.

# started on Thu Dec 28 12:26:17 2023

 Performance counter stats for process id '151470':

         38,398.98 msec task-clock:u              #    0.834 CPUs utilized          
                 0      context-switches:u        #    0.000 /sec                   
                 0      cpu-migrations:u          #    0.000 /sec                   
         1,639,834      page-faults:u             #   42.705 K/sec                  
    91,768,680,967      cycles:u                  #    2.390 GHz                      (36.07%)
       185,319,256      stalled-cycles-frontend:u #    0.20% frontend cycles idle     (35.96%)
       684,704,315      stalled-cycles-backend:u  #    0.75% backend cycles idle      (36.09%)
   302,884,802,959      instructions:u            #    3.30  insn per cycle         
                                                  #    0.00  stalled cycles per insn  (36.08%)
    63,498,038,601      branches:u                #    1.654 G/sec                    (35.71%)
       358,515,054      branch-misses:u           #    0.56% of all branches          (35.80%)

      46.050090565 seconds time elapsed

Here perf stat is the prefix in the run command.

         38,324.33 msec task-clock:u              #    0.853 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
         1,639,544      page-faults:u             #   42.781 K/sec
    92,152,328,018      cycles:u                  #    2.405 GHz                      (36.00%)
       193,446,804      stalled-cycles-frontend:u #    0.21% frontend cycles idle     (35.87%)
       673,915,003      stalled-cycles-backend:u  #    0.73% backend cycles idle      (35.79%)
   306,898,306,137      instructions:u            #    3.33  insn per cycle
                                                  #    0.00  stalled cycles per insn  (35.69%)
    64,264,430,874      branches:u                #    1.677 G/sec                    (35.43%)
       362,851,539      branch-misses:u           #    0.56% of all branches          (35.39%)
    73,691,713,364      L1-dcache-loads:u         #    1.923 G/sec                    (35.48%)
     1,810,377,586      L1-dcache-load-misses:u   #    2.46% of all L1-dcache accesses  (35.50%)
         LLC-loads:u
         LLC-load-misses:u

      44.942931902 seconds time elapsed

      29.389307000 seconds user
       9.352064000 seconds sys

dladams commented 8 months ago

0.0.47 continued.

Changes in g3wf-run-task:

dladams commented 8 months ago

0.0.47 continued:

I created gen3workflow dockerfile version 17 that starts from 16 and adds iotrace. I built an image for w_2023_44 and installed it on Perlmutter. I also added a script build-iotrace that does as the name suggests and is used in the dockerfile.

I modified _g3wf_runshifter to improve the help message and change the default version to 17.

I noticed the perfstat info is very wrong (20k instead of 300G instructions, etc.) for most of my tasks in batch, e.g. runs 1497 and 1500. This was not the case before the perf stat changes in this development series. I modified the task wrapper to use perfstat1 (perf stat command) instead of perfstat2 (perf stat -p PID) and started another job (1502) to see if that fixes the problem. I also put in a NERSC ticket: INC0214404.

Running perf stat as a command wrapper (run 1502) did resolve the preceding problem.
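
For reference, the two modes sketched in Python (illustration only; com stands in for the task command):

import signal
import subprocess

com = ["sleep", "5"]  # stand-in for the pipetask command

# perfstat1: perf stat as a prefix of the run command.
subprocess.run(["perf", "stat", "-d", *com])

# perfstat2: fork the command, then attach perf stat to its PID.
proc = subprocess.Popen(com)
perf = subprocess.Popen(["perf", "stat", "-d", "-p", str(proc.pid)])
proc.wait()
perf.send_signal(signal.SIGINT)  # perf prints its counters on SIGINT
perf.wait()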

I see most of the iotrace logs are empty except for the header. The output has this message:

strace: Process 2263952 attached
2263952 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2264454, si_uid=96594, si_status=0, si_utime=4641, si_stime=305} ---
new pid 2263952 process bash -c time /opt/lsst/software/stack/stack/miniconda3-py38_4.9.2-7.0.1/Linux64/ctrl_mpexec/gef1ac55fff+9dabee4918/bin/pipetask --long-log --log-level=VERBOSE run-qbb /global/cfs/cdirs/lsst/production/gen3/DC2/Run2.2i/repo /pscratch/sd/d/dladams/descprod-out/jobs/job001503/submit/u/dladams/ccd/20240108T152445Z/u_dladams_ccd_20240108T152445Z.qgraph --qgraph-node-id fd9294f6-f9ba-4e14-a9d4-b119cca55103 Scanning process 2263952 table for files and sockets
Error: couldn't find file write details, file num  2
Error: couldn't find file write details, file num  2
Error: couldn't find file write details, file num  2
Error: couldn't find file write details, file num  2
2263952 +++ exited with 0 +++
Finished monitoring process

Maybe prepending time causes problems??

Modify runapp so that no monitoring or tracing is run by default. Also, the options will set only one trace.

Add a log message of the task output space used, the rate, and the free space. Also add a monitor in the main directory that writes the task space and free space to task-output-size.log. Both run for each iteration in the job loop, i.e. every 10 sec.
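
A minimal sketch of such a monitor (names illustrative; sizes from a directory walk, free space from shutil):

import os
import shutil
import time

def output_size_gib(path):
    # Total size in GiB of all files under path.
    nbyte = sum(os.path.getsize(os.path.join(dp, fn))
                for dp, _, fns in os.walk(path) for fn in fns)
    return nbyte / 2**30

def report_output(path, size0, time0):
    # Emit one line like the "Task output size" messages in the job log.
    size = output_size_gib(path)
    rate = (size - size0) / max(time.time() - time0, 1.0)
    avail = shutil.disk_usage(path).free / 2**30
    print(f"Task output size: {size:10.3f} GiB, rate: {rate:7.3f} GiB/sec,"
          f" available: {avail:.0f} GiB on {path}")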

dladams commented 8 months ago

By default the wrapper runs time only: no strace, iotrace, perfstat, or any of the monitors. All but the last can be added as config options. Adding that is a todo.

Version is set to 0.0.47.

dladams commented 8 months ago

For 0.0.48, modify runapp to create plot-title.txt. Version 17.16 of monexp.py takes its plot titles from that file if present.

dladams commented 8 months ago

For 0.0.49, fix tracking of failed tasks.

Job 1596 (stage 2 of g3wfpipe | w2344-visit:250:260-pipe:ccd-init-proc | shifter:17-wq:270-bproc:90-tmax:10m:89m) had final status "Done. Finished 0 of 2985 tasks.". There are many (55k) lines in the job log like these:

2024-01-16 11:19:06: WARNING: Unexpected task status: failed
2024-01-16 11:19:06: WARNING: Unexpected task status: dep_fail

and near the end of the log, I see:

2024-01-16 11:26:54: WARNING: Unexpected task status: dep_fail
2024-01-16 11:26:54: Finished 0 of 2985 tasks.
2024-01-16 11:26:55: Task output size:      0.001 GiB, rate:   0.000 GiB/sec, available: 8292476 GiB on /pscratch'
2024-01-16 11:26:55: Aborting job because state is not changing and no tasks are active.
2024-01-16 11:26:55: All steps completed.

I added 'failed' and 'dep_fail' as known task states and increment nfail for those.
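
Schematically, the counting now looks something like this (the full set of known states is an assumption beyond the two named above):

KNOWN_STATES = {"pending", "launched", "running", "exec_done",
                "failed", "dep_fail"}

def tally_states(statuses):
    ndone = nfail = 0
    for state in statuses:
        if state not in KNOWN_STATES:
            print(f"WARNING: Unexpected task status: {state}")
        elif state in ("failed", "dep_fail"):
            nfail += 1   # failed and dep_fail now count as failures
        elif state == "exec_done":
            ndone += 1
    return ndone, nfail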

Here is the log for one of the isr jobs:

--> executable follows <--
g3wf-run-task ${CTRL_MPEXEC_DIR}/bin/pipetask --long-log --log-level=VERBOSE run-qbb /global/cfs/cdirs/lsst/production/gen3/DC2/Run2.2i/repo /pscratch/sd/d/dladams/descprod-out/jobs/job001595/submit/u/dladams/ccd/20240116T174856Z/u_dladams_ccd_20240116T174856Z.qgraph --qgraph-node-id 069a16e2-fb01-4dd3-9635-eb270e879a11 && >&2 echo success || (>&2 echo failure; false)
--> end executable <--
/bin/bash: g3wf-run-task: command not found
failure
--> executable follows <--
g3wf-run-task ${CTRL_MPEXEC_DIR}/bin/pipetask --long-log --log-level=VERBOSE run-qbb /global/cfs/cdirs/lsst/production/gen3/DC2/Run2.2i/repo /pscratch/sd/d/dladams/descprod-out/jobs/job001595/submit/u/dladams/ccd/20240116T174856Z/u_dladams_ccd_20240116T174856Z.qgraph --qgraph-node-id 069a16e2-fb01-4dd3-9635-eb270e879a11 && >&2 echo success || (>&2 echo failure; false)
--> end executable <--
/bin/bash: g3wf-run-task: command not found
failure

I suspect the "command not found" happens because the WorkQueue worker does not have this on its path:

g3wfpipe-run: Location for g3wf-run-task: /pscratch/sd/d/dladams/descprod-out/jobs/job001596/local/desc-gen3-prod/bin/g3wf-run-task

dladams commented 8 months ago

To address the last, I added a task setup script task-setup.sh with the intention of sourcing it as part of the task prepend. That didn't work and turned out to be unnecessary: the problem was that the job setup script setup.sh used relative paths for the products installed in ./local, but the WorkQueue worker uses a dedicated directory for each task instead of running tasks from the job directory. I changed the setup to use full paths and can now run with WorkQueue.

The task wrapper will source task-setup.sh if it is found in the run directory, but no such file is currently generated.

dladams commented 8 months ago

Version 0.0.50 fixes notebooks.

The last version broke the copying of notebooks to the job dir.

I also created a tag for this version.

dladams commented 7 months ago

Version 0.0.51 brings in some PRs from Ben:

#9: updating ndcctools

#10: adding the option to use WorkQueue in coprocess mode

#11: adding the option to disable parsl monitoring

A couple small test jobs ran successfully with these mods.

dladams commented 7 months ago

For version 0.0.52, add filesystem monitor.

dladams commented 7 months ago

For 0.0.53

dladams commented 7 months ago

For 0.0.54, disable parsl resource monitoring by default, i.e. default is pmon:0.

dladams commented 7 months ago

For 0.0.55,

The number of concurrent task chains is limited in g3wfpipe-run.py by calling get_future on a maximum of maxact final uncompleted tasks.

The number of concurrently running starting tasks is limited by adding a dependency on a parsl task which completes when a counter in g3wfpipe-run.py exceeds a threshold.
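
A schematic sketch of the first mechanism (get_future is named above; the done attribute and the job list are illustrative assumptions):

def select_chains(final_jobs, maxact):
    # Request futures for at most maxact uncompleted endpoint tasks,
    # so no more than maxact task chains can be active at once.
    futures = []
    for job in final_jobs:
        if len(futures) >= maxact:
            break
        if not job.done:
            futures.append(job.get_future())
    return futures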

dladams commented 6 months ago

Now at 0.0.55.dev93.

There is now a TP howfig tp:NTHR:MAXCST:MAXACT where

Run 1903 shows this is not working as intended. With MAXACT=50, g3wfpipe reports

Finished 0 of 2985 tasks. 2785 pending. 117 launched. 83 running.

in its first few monitoring lines, and the recently updated monexp notebook shows 117 running isr tasks at the start of the job. The time stamps in the isr stderr logs agree with the latter value, i.e. 117 tasks start in the first few seconds.

So two problems: more chains start than MAXACT allows, and the reported running count disagrees with what is actually running.

I resolved the first problem by starting only the required number of chains and dropped the artificial prerequisite task, which was more complicated and not working.

I suspect the second problem is stale data; perhaps the monitoring DB is not being flushed or the ParslGraph cache is not up to date. I will not chase this down now. I created issue #12 to address this.

dladams commented 6 months ago

As shown in section 10 of the performance summary, the change in 0.0.55 did not speed up the processing. For 0.0.56, we try adding a delay between chain starts to desynchronize them. For more discussion and results, see the performance summary.