LSSTDESC / desc-gen3-prod

Desc-prod wrapper for pipeline production using gen3_workflow.
BSD 3-Clause "New" or "Revised" License

Version 0.0 #1

dladams closed 11 months ago

dladams commented 1 year ago

Version 0.0 of this package should enable DESCprod users to query the DC2 data and find how many raw datasets (CCD images) match a supplied query.

dladams commented 1 year ago

For 0.0.2, scripts were copied from desc-prod and modified. It is now possible to run queries from the desc-prod server. E.g. "g3wfquery | w2318-visit:0:10000 | shift" reports "Done. 14.7: Found 36532 matching datasets.".

This package is not used by the server, but the client who starts the job must install it so that runapp-g3wfquery (and supporting scripts) can be found. The runapp script creates a run script that is run in either a cvmfs or a shifter env. That run script installs the head of this package in the run directory, so those envs don't need to be constructed with this frequently changing package.

The server, client and job all run in different environments. The following table summarizes their package dependencies and when those packages are installed. The head at that time is typically used.

            desc-prod                     desc-gen3-prod
    server  Installed at server startup.  Not used.
    client  Installed by user.            Installed by user.
    job     Not used.                     Installed automatically at run time.

The script g3wf-convert-lsst-version allows wYYWW to be used in place of w_20YY_WW in configuration strings.
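As an illustration of the conversion, here is a minimal Python sketch. It is not the actual g3wf-convert-lsst-version script; the function name convert_lsst_version is hypothetical, and it assumes a short tag is exactly "w" followed by four digits and is not part of a longer word.

```python
import re

# Hypothetical sketch; the real g3wf-convert-lsst-version script may differ.
# Matches a short weekly tag such as "w2318" that is not part of a longer word.
_SHORT_WEEKLY = re.compile(r"\bw(\d{2})(\d{2})\b")


def convert_lsst_version(config: str) -> str:
    """Replace each short tag wYYWW in a config string with w_20YY_WW."""
    return _SHORT_WEEKLY.sub(r"w_20\1_\2", config)


if __name__ == "__main__":
    print(convert_lsst_version("w2318-visit:0:10000"))
```

Tags already in the long form (w_2023_18) are left unchanged because an underscore follows the "w".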

dladams commented 1 year ago

For 0.0.3, I fixed the code source configuration. In 0.0.2, all jobs were run with shifter. Now cvmfs or shifter is used if that value is included in the howfig string, shifter is used by default, and an error is raised if any other value (like shift in the preceding example) is used. The change is pushed.
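The selection rule can be sketched in Python as below. This is only an illustration: the function name code_source is hypothetical, and the parsing convention (howfig fields separated by "-", with a field that contains ":" such as tp:8 treated as an option rather than a code source) is an assumption inferred from the example strings in this thread.

```python
VALID_CODE_SOURCES = ("cvmfs", "shifter")
DEFAULT_CODE_SOURCE = "shifter"


def code_source(howfig: str) -> str:
    """Return the code source named in a howfig string (hypothetical sketch).

    Assumes fields are separated by '-' and that a field containing ':'
    is an option (e.g. tp:8), not a code source.
    """
    source = DEFAULT_CODE_SOURCE  # shifter unless cvmfs/shifter is given
    for field in howfig.split("-"):
        if not field or ":" in field:
            continue  # empty field or an option such as tp:8
        if field not in VALID_CODE_SOURCES:
            # Any other bare value (e.g. the misspelled "shift") is an error.
            raise ValueError(f"Invalid code source in howfig: {field!r}")
        source = field
    return source
```

For example, code_source("cvmfs") returns "cvmfs", code_source("tp:8") falls back to "shifter", and code_source("shift") raises ValueError.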

dladams commented 1 year ago

I have added a document describing the g3wfquery application and linked to it from this package's README.

dladams commented 1 year ago

For 0.0.4, I copied the gen3workflow dockerfiles and supporting scripts in from desc-prod.

dladams commented 1 year ago

For 0.0.5, add missing docker build script and fix build ref in .gitignore.

Local .gitignore was also added.

dladams commented 1 year ago

For version 0.0.6, I bring over some of my old gen3 wrapper code and begin to create the application g3wfpipe to run pipeline jobs.

After many hours and fixes, I am apparently able to start a pipeline job. But it is out of control: 199 processes with 1-2% of the total CPU. I have to kill many processes to quiet this.

dladams commented 1 year ago

I tried to fix the many-process problem by setting a default wq memory limit of 2 GB in runapp-g3wfpipe.

dladams commented 1 year ago

Still 0.0.6.

dladams commented 1 year ago

For 0.0.7, I updated the g3wfpipe scripts so the user can now specify init, proc or finalize to run those parts of the processing. Running a job with init or init-proc works. There is still much to do, including:

dladams commented 1 year ago

For 0.0.8:

One simple test passes: job 232 was used to initialize (make the QG) and job 244 to carry out the processing.

Likely future development is to create the second job from the first and submit it to batch.

dladams commented 1 year ago

For 0.0.9:

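Here is the change in the ingredients directory: for LSSTCam-imSim, DRP.yaml moved from ingredients/ in w_2023_18 to pipelines/_ingredients/ in w_2023_21.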

    login32> find /cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2023_21/conda/envs/lsst-scipipe-6.0.0-exact-ext/share/ -name DRP.yaml | grep imSim
    /cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2023_21/conda/envs/lsst-scipipe-6.0.0-exact-ext/share/eups/Linux64/drp_pipe/gefb12affe0+f7c139c185/pipelines/_ingredients/LSSTCam-imSim/DRP.yaml
    login32> find /cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2023_18/conda/envs/lsst-scipipe-6.0.0-exact-ext/share/ -name DRP.yaml | grep imSim
    /cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2023_18/conda/envs/lsst-scipipe-6.0.0-exact-ext/share/eups/Linux64/drp_pipe/g10915be422+36748ffaaa/ingredients/LSSTCam-imSim/DRP.yaml
    login32> vim job000252/job000252.log

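Code that has to work with both releases could look for the file in either location. The Python sketch below is hypothetical (find_drp_yaml is not part of this package); the two relative paths are taken from the listings above, and it assumes the caller passes the drp_pipe installation directory (for example, the value of DRP_PIPE_DIR in a set-up stack).

```python
from pathlib import Path
from typing import Optional

# Relative locations of the imSim DRP ingredients file, from the listings:
# the newer layout (w_2023_21) is tried first, then the older one (w_2023_18).
INGREDIENT_CANDIDATES = (
    Path("pipelines/_ingredients/LSSTCam-imSim/DRP.yaml"),
    Path("ingredients/LSSTCam-imSim/DRP.yaml"),
)


def find_drp_yaml(drp_pipe_dir) -> Optional[Path]:
    """Return the first existing DRP.yaml under drp_pipe_dir, or None."""
    base = Path(drp_pipe_dir)
    for rel in INGREDIENT_CANDIDATES:
        path = base / rel
        if path.is_file():
            return path
    return None
```

Checking the newer layout first means a release that happened to contain both directories would use the current one.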
dladams commented 1 year ago

For 0.0.10:

dladams commented 1 year ago

For 0.0.11:

    ID   App       config                    howfig
    321  g3wfpipe  w2321-visit:277-pipe:isr  cvmfs
    328  g3wfpipe  job:321-init              tp:8
    329  g3wfpipe  job:328-proc

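The config strings in this table (and in the earlier w2318-visit:0:10000 query) appear to be "-"-separated fields, each a name optionally followed by ":"-separated arguments. The Python sketch below parses that assumed convention and picks out the init, proc and finalize steps; parse_config and requested_steps are hypothetical names, not functions from g3wfpipe.

```python
from typing import List, Tuple

PIPELINE_STEPS = ("init", "proc", "finalize")


def parse_config(config: str) -> List[Tuple[str, List[str]]]:
    """Split a config string into (name, args) fields (assumed convention).

    Note: a value that itself contains '-' would be split incorrectly.
    """
    fields = []
    for field in config.split("-"):
        if not field:
            continue
        name, *args = field.split(":")
        fields.append((name, args))
    return fields


def requested_steps(config: str) -> List[str]:
    """Return the processing steps named in config, in pipeline order."""
    names = {name for name, _ in parse_config(config)}
    return [step for step in PIPELINE_STEPS if step in names]
```

With this convention, job:321-init yields [("job", ["321"]), ("init", [])], that is, job 321 is the reference job and init is the step to run.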
dladams commented 1 year ago

For 0.0.12, trying to get g3wfpipe working with shifter.

I am now able to do g3wfpipe processing in a shifter image.

dladams commented 1 year ago

For 0.0.13, get the timeout working.

dladams commented 1 year ago

The last status message from a successful g3wfpipe job is "All steps completed.". It would be more useful to return the last message from the last step, e.g. "Finished 87 of 87 tasks." This is done in 0.0.14.

I also modified the status step to return a summary of all jobs in the job status. This required some effort because ParslJob does not do this. I used the monitoring dataframe from ParslJob and filtered out junk rows.
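The filter-then-summarize idea can be sketched in plain Python. This is not the code in this package: plain dicts stand in for rows of the ParslJob monitoring dataframe, the column names "task" and "status" are hypothetical, and "junk" is assumed to mean a row missing either value.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def summarize_status(rows: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count rows per status, skipping junk rows (hypothetical columns)."""
    counts: Counter = Counter()
    for row in rows:
        task = row.get("task")
        status = row.get("status")
        if not task or not status:
            continue  # junk row: no task name or no status
        counts[status] += 1
    return dict(counts)


def status_line(rows: Iterable[Mapping[str, str]]) -> str:
    """Format a one-line summary such as '3 tasks: 1 pending, 2 succeeded'."""
    counts = summarize_status(rows)
    total = sum(counts.values())
    parts = ", ".join(f"{n} {status}" for status, n in sorted(counts.items()))
    return f"{total} tasks: {parts}"
```

With a real pandas dataframe the same summary could be written as dropna on the relevant columns followed by value_counts on the status column.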

dladams commented 1 year ago

For 0.0.15, I played with catching interrupts in run-g3wfpipe.py so the report there would be up to date, but ended up removing all that code and removing the timeout, which is now better handled in the calling script.
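Handling the timeout in the caller can be as simple as the Python sketch below. It is only an illustration of the approach, not the actual calling script; run_with_timeout and its status strings are hypothetical. It relies on the standard subprocess.run timeout argument, which kills the child and raises TimeoutExpired when the limit is reached.

```python
import subprocess
from typing import List


def run_with_timeout(cmd: List[str], timeout_s: float) -> str:
    """Run cmd and return a one-line status (hypothetical sketch)."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child before re-raising.
        return f"Timed out after {timeout_s} s."
    if result.returncode != 0:
        return f"Failed with return code {result.returncode}."
    return "Completed."
```

Note that kill() only signals the direct child; a job that spawns many worker processes (as seen in 0.0.6) would also need process-group handling to clean up on timeout.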

dladams commented 1 year ago

Next I would like to get the WorkQueue executor working. I have created a separate issue (#2) for that.

dladams commented 11 months ago

Later development is described in #2.