Closed dladams closed 11 months ago
For 0.0.2, scripts were copied from desc-prod and modified. It is now possible to run queries from the desc-prod server. E.g. "g3wfquery | w2318-visit:0:10000 | shift" reports "Done. 14.7: Found 36532 matching datasets.".
This package is not used by the server, but the client that starts the job must install it so runapp-g3wfquery (and supporting scripts) can be found. The runapp script creates a run script which is run in either a cvmfs or shifter env. That run script installs the head of this package in the run directory so those envs don't need to be constructed with this frequently-changing package.
The server, client and job all run in different environments. The following table summarizes their package dependencies and when those packages are installed; the head of the package at that time is typically used.
| | desc-prod | desc-gen3-prod |
|---|---|---|
| server | Installed at server startup. | Not used. |
| client | Installed by user. | Installed by user. |
| job | Not used. | Installed automatically at run time. |
The script g3wf-convert-lsst-version allows wYYWW to be used in place of w_20YY_WW in configuration strings.
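The conversion amounts to expanding a compact weekly tag into the full LSST form. A minimal sketch (the function name is mine, not from the script, and the actual implementation may differ):

```python
import re

def expand_lsst_version(text):
    """Replace compact weekly tags like w2318 with the full
    LSST form w_2023_18 anywhere in a configuration string."""
    return re.sub(r"\bw(\d{2})(\d{2})\b", r"w_20\1_\2", text)
```

For example, `expand_lsst_version("w2318-visit:0:10000")` yields `"w_2023_18-visit:0:10000"`.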
For 0.0.3, fix the code source configuration. In 0.0.2 all jobs are run with shifter. Change this so that cvmfs or shifter is used when either value is included in the howfig string, shifter is used by default, and an error is raised if any other value (like shift in the preceding example) is supplied. The change is pushed.
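The selection logic is roughly the following (a sketch: the helper name and the `-`-separated howfig fields are my assumptions, and the real code's handling of misspelled values is simplified here):

```python
def select_env(howfig):
    """Choose the runtime environment from a howfig string.
    Fields are assumed to be '-'-separated; shifter is the
    default when no environment field is present."""
    fields = howfig.split("-") if howfig else []
    envs = [f for f in fields if f in ("cvmfs", "shifter")]
    if len(envs) > 1:
        raise ValueError(f"Multiple environments in howfig: {howfig}")
    return envs[0] if envs else "shifter"
```

So `select_env("cvmfs")` returns `"cvmfs"`, while a howfig like `"tp:8"` falls back to `"shifter"`.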
I have added a document describing the g3wfquery application and linked to it from this package's README.
For 0.0.4, I copied the gen3workflow dockerfiles and supporting scripts in from desc-prod.
For 0.0.5, add missing docker build script and fix build ref in .gitignore.
Local .gitignore was also added.
For version 0.0.6, I bring over some of my old gen3 wrapper code and begin to create the application g3wfpipe to run pipeline jobs.
After many hours and fixes, I am apparently able to start a pipeline job. But it is out of control: 199 processes, each using 1-2% of the total CPU. I have to kill many processes to quiet things down.
I try to fix the many process problem by setting a default wq memory limit of 2 GB in runapp-g3wfpipe.
Still 0.0.6.
For 0.0.7, I updated the g3wfpipe scripts so the user can now specify init, proc or finalize to run those parts of the processing. Running a job with init or init-proc works. There is still much to do.
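Parsing a phase specification like `init-proc` into an ordered list of processing phases could look like this (a sketch; the function name and the `-` separator are my assumptions):

```python
VALID_PHASES = ("init", "proc", "finalize")

def parse_phases(spec):
    """Split a phase spec like 'init-proc' into the ordered
    list of processing phases to run, validating each name."""
    phases = spec.split("-")
    for phase in phases:
        if phase not in VALID_PHASES:
            raise ValueError(f"Unknown phase: {phase}")
    return phases
```

E.g. `parse_phases("init-proc")` gives `["init", "proc"]`, matching the init-then-proc usage described above.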
For 0.0.8:
One simple test passes: job 232 was used to initialize (make the QG) and job 244 to carry out the processing.
Likely future development is to create the second job from the first and submit it to batch.
For 0.0.9:
Here is the change in ingredients dir:
```
login32> find /cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2023_21/conda/envs/lsst-scipipe-6.0.0-exact-ext/share/ -name DRP.yaml | grep imSim
/cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2023_21/conda/envs/lsst-scipipe-6.0.0-exact-ext/share/eups/Linux64/drp_pipe/gefb12affe0+f7c139c185/pipelines/_ingredients/LSSTCam-imSim/DRP.yaml
login32> find /cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2023_18/conda/envs/lsst-scipipe-6.0.0-exact-ext/share/ -name DRP.yaml | grep imSim
/cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2023_18/conda/envs/lsst-scipipe-6.0.0-exact-ext/share/eups/Linux64/drp_pipe/g10915be422+36748ffaaa/ingredients/LSSTCam-imSim/DRP.yaml
login32> vim job000252/job000252.log
```
For 0.0.10:
For 0.0.11:
| ID | App | config | howfig |
|---|---|---|---|
| 321 | g3wfpipe | w2321-visit:277-pipe:isr | cvmfs |
| 328 | g3wfpipe | job:321-init | tp:8 |
| 329 | g3wfpipe | job:328-proc | |
For 0.0.12, trying to get g3wfpipe working with shifter.
I am now able to do g3wfpipe processing in a shifter image.
For 0.0.13, get the timeout working.
The last status message from a successful g3wfpipe job is "All steps completed.". It would be more useful to return the last message from the last step, e.g. "Finished 87 of 87 tasks." This is done in 0.0.14.
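Extracting that final progress message amounts to scanning the job's output for the last line matching the task-count pattern. A sketch (the function name is mine; the actual code may read the log differently):

```python
import re

def last_task_message(log_lines):
    """Return the most recent task-progress message, e.g.
    'Finished 87 of 87 tasks.', or None if none is found."""
    pattern = re.compile(r"Finished \d+ of \d+ tasks\.")
    last = None
    for line in log_lines:
        match = pattern.search(line)
        if match:
            last = match.group(0)
    return last
```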
I also modified the status step to return a summary of all jobs in the job status. This required some effort because ParslJob does not do this. I used the monitoring dataframe from ParslJob and filtered out junk rows.
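The filtering idea can be sketched with plain dictionaries standing in for the monitoring dataframe rows (the column names `task_func_name` and `task_status_name` are my assumptions about the monitoring schema, not taken from the actual code):

```python
def summarize_tasks(rows):
    """Filter monitoring rows down to real tasks (those with a
    task name) and count them by status."""
    counts = {}
    for row in rows:
        if not row.get("task_func_name"):  # junk rows lack a task name
            continue
        status = row.get("task_status_name", "unknown")
        counts[status] = counts.get(status, 0) + 1
    return counts
```

With a real dataframe the same filter would be a boolean mask followed by a group-by on the status column.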
For 0.0.15, I played with catching interrupts in run-g3wfpipe.py so the report there would be up to date, but ended up removing all that code and removing the timeout, which is now better handled in the calling script.
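The removed interrupt handling was along these lines (a reconstruction for illustration, not the actual code; the helper name and return value are mine):

```python
import signal
import sys

def install_report_on_interrupt(report):
    """Install a SIGINT handler that emits a final status report
    (report is a no-argument callable) before exiting.
    Returns the handler so callers can inspect or restore it."""
    def handler(signum, frame):
        report()
        sys.exit(1)
    signal.signal(signal.SIGINT, handler)
    return handler
```

Handling the timeout in the calling script avoids this kind of in-process signal juggling entirely.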
Next I would like to get the WorkQueue executor working. I have created a separate issue (#2) for that.
Later development is described in #2.
Version 0.0 of this package should enable DESCprod users to query the DC2 data and find how many raw datasets (CCD images) match a supplied query.