Closed: @doutriaux1 closed this issue 3 years ago
@doutriaux1 -- What version of Maestro are you using?
maestro 1.1.7dev1
@FrankD412 I have the same version of the yaml, but with more jobs to be run before, and that one works great. One thing: now I do not have any slurm jobs anymore.
C.
If you pulled recently, there was a shift in where expansion happens. Expansion no longer happens before Maestro exits, instead happening in the conductor. How many nodes does your graph have?
Also, what does the Maestro generated directory look like?
ll -rt generate_hohlraum_20200429-080926
total 60K
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:09 logs
-rw------- 1 cdoutrix aml_cs 3.4K Apr 29 08:09 maestro_custom_generator.py
drwx--S--- 3 cdoutrix aml_cs 4.0K Apr 29 08:09 meta
-rw------- 1 cdoutrix aml_cs 1.8K Apr 29 08:09 maestro_bug.yaml
-rw------- 1 cdoutrix aml_cs 25K Apr 29 08:09 generate_hohlraum.pkl
-rw------- 1 cdoutrix aml_cs 22 Apr 29 08:09 generate_hohlraum.txt
drwx--S--- 14 cdoutrix aml_cs 8.0K Apr 29 08:10 directory_permissions
drwx--S--- 3 cdoutrix aml_cs 4.0K Apr 29 08:10 kosh
My git log says:
git log -n1
commit 7022c4370cae8070e4632a423b78298782f3cabb
Author: Francesco Di Natale <dinatale3@llnl.gov>
Date: Tue Apr 14 11:28:38 2020 -0700
Bugfix for logging that didn't appear in submodules (#247)
* Improved logging setup.
* Transition to a LoggerUtil class.
* Addition of docstring to LoggerUtility + cleanup.
Ok, you're using a more recent version, which means my previous comments apply. I did notice that you're missing some keys if you intend to schedule steps; you'll need extra keys in your steps. Based on your parameter generator, I think this is what you're after:
- name: kosh
  description: add simulation to kosh
  run:
    cmd: |
      echo $(PROC).$(NODES).$(DOMAIN).$(RESOLUTION).$(RHO_FOAM2).$(LASPOWERMULT).$(MINIMALALE).$(PROCS_XENA).$(NODES_XENA)
      export RUNDIR=HWH_$( basename $(WORKSPACE))
      python $(SCRIPTS_DIR)/add_to_kosh.py --store=/usr/workspace/aml_cs/kosh/kosh_store.sql --root $(RUN_PATH) -n $RUNDIR
    nodes: $(NODE)
    procs: $(PROC)
    walltime: <walltime for this step>
    depends: []
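(For scheduled steps, a top-level batch block is also typically needed in the spec so Maestro knows where to submit. A sketch with hypothetical placeholder values -- the host, bank, and queue names here are not from this study:

```yaml
batch:
  type: slurm
  host: quartz   # hypothetical machine name
  bank: mybank   # hypothetical allocation/bank
  queue: pbatch  # hypothetical queue/partition
```

Without a batch block, steps run through the local adapter even when nodes/procs are set.)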
Are there any exceptions at the end of the log file in logs?
@FrankD412 the keys should be generated by the custom generator, but you're right, something is wrong; the log says:
ed?: False
2020-04-29 08:10:24,970 - maestrowf.interfaces.script.localscriptadapter:submit:153 - WARNING - Execution returned an error: /usr/WS1/aml_cs/ALE/LAGER/data-generation/Hohlraum/generate_hohlraum_20200429-080926/directory_permissions/domain_288.laserPowerMult_1.0.minimalAle_1.NODES_32.NODES_XENA_2.PROC_1152.PROCS_XENA_72.meshResolution_0.5.foamDensity_2e-06/directory_permissions_domain_288.laserPowerMult_1.0.minimalAle_1.
NODES_32.NODES_XENA_2.PROC_1152.PROCS_XENA_72.meshResolution_0.5.foamDensity_2e-06.slurm.sh: line 7: run_PATH: command not found
chgrp: cannot access '/HWH_domain_288.laserPowerMult_1.0.minimalAle_1.NODES_32.NODES_XENA_2.PROC_1152.PROCS_XENA_72.meshResolution_0.5.foamDensity_2e-06': No such file or directory
Actually it's in the directories, which are all generated, but only one kosh directory is generated:
ll -rt generate_hohlraum_20200429-080926/kosh
total 4.0K
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:10 domain_18.laserPowerMult_1.0.minimalAle_0.NODES_2.NODES_XENA_1.PROC_72.PROCS_XENA_18.meshResolution_0.15.foamDensity_2e-06
(kosh) [cdoutrix@rztopaz188:Hohlraum]$ ll -rt generate_hohlraum_20200429-080926/directory_permissions
total 48K
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:09 domain_18.laserPowerMult_1.0.minimalAle_0.NODES_2.NODES_XENA_1.PROC_72.PROCS_XENA_18.meshResolution_0.15.foamDensity_2e-06
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:09 domain_18.laserPowerMult_1.0.minimalAle_0.NODES_2.NODES_XENA_1.PROC_72.PROCS_XENA_18.meshResolution_0.15.foamDensity_0.35
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:09 domain_72.laserPowerMult_1.0.minimalAle_0.NODES_8.NODES_XENA_1.PROC_288.PROCS_XENA_36.meshResolution_0.25.foamDensity_2e-06
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:09 domain_72.laserPowerMult_1.0.minimalAle_0.NODES_8.NODES_XENA_1.PROC_288.PROCS_XENA_36.meshResolution_0.25.foamDensity_0.35
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:09 domain_288.laserPowerMult_1.0.minimalAle_0.NODES_32.NODES_XENA_2.PROC_1152.PROCS_XENA_72.meshResolution_0.5.foamDensity_2e-06
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:10 domain_288.laserPowerMult_1.0.minimalAle_0.NODES_32.NODES_XENA_2.PROC_1152.PROCS_XENA_72.meshResolution_0.5.foamDensity_0.35
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:10 domain_18.laserPowerMult_1.0.minimalAle_1.NODES_2.NODES_XENA_1.PROC_72.PROCS_XENA_18.meshResolution_0.15.foamDensity_2e-06
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:10 domain_18.laserPowerMult_1.0.minimalAle_1.NODES_2.NODES_XENA_1.PROC_72.PROCS_XENA_18.meshResolution_0.15.foamDensity_0.35
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:10 domain_72.laserPowerMult_1.0.minimalAle_1.NODES_8.NODES_XENA_1.PROC_288.PROCS_XENA_36.meshResolution_0.25.foamDensity_2e-06
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:10 domain_72.laserPowerMult_1.0.minimalAle_1.NODES_8.NODES_XENA_1.PROC_288.PROCS_XENA_36.meshResolution_0.25.foamDensity_0.35
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:10 domain_288.laserPowerMult_1.0.minimalAle_1.NODES_32.NODES_XENA_2.PROC_1152.PROCS_XENA_72.meshResolution_0.5.foamDensity_2e-06
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:10 domain_288.laserPowerMult_1.0.minimalAle_1.NODES_32.NODES_XENA_2.PROC_1152.PROCS_XENA_72.meshResolution_0.5.foamDensity_0.35
forget the error in the log, I have a typo: $(run_PATH) instead of $(RUN_PATH), but that doesn't explain why only one kosh is generated. I'll keep looking in the log.
@doutriaux1 -- You defined the variable as RUN_PATH, but it seems that you're using it as run_PATH.
In your directory_permissions step, change chgrp -R aml_cs $(run_PATH)/$RUNDIR to chgrp -R aml_cs $(RUN_PATH)/$RUNDIR.
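(This also explains the exact error text in the log above. A misspelled Maestro token such as $(run_PATH) is left verbatim in the generated .slurm.sh script, and bash then interprets it as command substitution. A minimal demonstration of that failure mode:

```shell
# $(run_PATH) is not a defined Maestro variable, so it survives into the
# script; bash treats $(run_PATH) as command substitution and tries to run
# a program named "run_PATH", which does not exist.
msg=$(bash -c 'echo "path is $(run_PATH)"' 2>&1)
echo "$msg"   # contains a "run_PATH: command not found" error from bash
```

The substitution then expands to nothing, which is why the subsequent chgrp sees a path starting at "/HWH_...".)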
the log seems to indicate it's expanding ok:
==================================================
Expanding step 'kosh'
==================================================
-------- Used Parameters --------
{'LASPOWERMULT', 'NODES', 'RESOLUTION', 'DOMAIN', 'NODES_XENA', 'PROCS_XENA', 'PROC', 'RHO_FOAM2', 'MINIMALALE'}
---------------------------------
2020-04-29 08:09:28,064 - maestrowf.datastructures.core.study:_stage:616 - INFO -
**********************************
Combo [laserPowerMult_1.0.minimalAle_0.meshResolution_0.15.domain_18.foamDensity_2e-06.PROC_72.NODES_2.PROCS_XENA_18.NODES_XENA_1]
**********************************
2020-04-29 08:09:28,064 - maestrowf.datastructures.core.study:_stage:645 - INFO - Searching for workspaces...
cmd = echo 72.2.18.0.15.2e-06.1.0.0.18.1
export RUNDIR=HWH_$( basename $(WORKSPACE))
python /usr/workspace/aml_cs/ALE/LAGER/data-generation/Hohlraum/add_to_kosh.py --store=/usr/workspace/aml_cs/kosh/kosh_store.sql --root /usr/workspace/aml_cs/ALE/Hohlraum -n $RUNDIR
2020-04-29 08:09:28,064 - maestrowf.datastructures.core.study:_stage:676 - INFO - New cmd = echo 72.2.18.0.15.2e-06.1.0.0.18.1
export RUNDIR=HWH_$( basename $(WORKSPACE))
python /usr/workspace/aml_cs/ALE/LAGER/data-generation/Hohlraum/add_to_kosh.py --store=/usr/workspace/aml_cs/kosh/kosh_store.sql --root /usr/workspace/aml_cs/ALE/Hohlraum -n $RUNDIR
2020-04-29 08:09:28,064 - maestrowf.datastructures.core.study:_stage:616 - INFO -
**********************************
Combo [laserPowerMult_1.0.minimalAle_0.meshResolution_0.15.domain_18.foamDensity_0.35.PROC_72.NODES_2.PROCS_XENA_18.NODES_XENA_1]
**********************************
2020-04-29 08:09:28,064 - maestrowf.datastructures.core.study:_stage:645 - INFO - Searching for workspaces...
cmd = echo 72.2.18.0.15.0.35.1.0.0.18.1
export RUNDIR=HWH_$( basename $(WORKSPACE))
python /usr/workspace/aml_cs/ALE/LAGER/data-generation/Hohlraum/add_to_kosh.py --store=/usr/workspace/aml_cs/kosh/kosh_store.sql --root /usr/workspace
@FrankD412 changing to RUN_PATH does not seem to make a difference
@FrankD412 even if it fails, the status should at least indicate what has run/initialized/failed, etc. No?
@doutriaux1 -- It should. I'll have to sit down with the sample or schedule a meeting with you to dive deeper. There isn't anything blatant that I'm seeing that's wrong here, let me mess with it on my end and I'll see what I can find.
Ok thanks. It's really odd since the full one (with slurm jobs) works fine.
@FrankD412 if that "helps", things get worse with the repo's head:
maestro run -p maestro_custom_generator.py maestro_bug.yaml
[2020-04-29 08:46:30: INFO] INFO Logging Level -- Enabled
[2020-04-29 08:46:30: WARNING] WARNING Logging Level -- Enabled
[2020-04-29 08:46:30: CRITICAL] CRITICAL Logging Level -- Enabled
[2020-04-29 08:46:30: INFO] Loading specification -- path = maestro_bug.yaml
[2020-04-29 08:46:30: ERROR] ('variables',)
Traceback (most recent call last):
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 112, in load_specification
specification = cls.load_specification_from_stream(data)
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 155, in load_specification_from_stream
specification.verify()
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 162, in verify
self.verify_environment()
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 280, in verify_environment
keys_seen = self._verify_variables()
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 200, in _verify_variables
for key, value in self.environment["variables"].items():
KeyError: 'variables'
Traceback (most recent call last):
File "/g/g19/cdoutrix/miniconda3/envs/kosh/bin/maestro", line 11, in <module>
load_entry_point('maestrowf==1.1.7.dev1', 'console_scripts', 'maestro')()
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/maestro.py", line 424, in main
rc = args.func(args)
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/maestro.py", line 130, in run_study
spec = YAMLSpecification.load_specification(args.specification)
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 116, in load_specification
raise e
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 112, in load_specification
specification = cls.load_specification_from_stream(data)
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 155, in load_specification_from_stream
specification.verify()
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 162, in verify
self.verify_environment()
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 280, in verify_environment
keys_seen = self._verify_variables()
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 200, in _verify_variables
for key, value in self.environment["variables"].items():
KeyError: 'variables'
(kosh) [cdoutrix@rztopaz188:Hohlraum]$
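(The traceback shows _verify_variables indexing self.environment["variables"] directly, which raises KeyError for a spec whose environment has no variables block. A minimal sketch of a defensive variant -- this is an illustration, not Maestro's actual code:

```python
# Hypothetical stand-in for YAMLSpecification._verify_variables:
# using dict.get() with a default lets a spec that defines only
# labels (no variables block) pass verification instead of raising.
def verify_variables(environment):
    keys_seen = set()
    for key in environment.get("variables", {}):
        keys_seen.add(key)
    return keys_seen

# An environment with only labels no longer raises KeyError:
verify_variables({"labels": {"outfile": "run.log"}})  # returns an empty set
```
)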
Oh, that looks like it may be a bug with the last release. Can you file that in a separate issue? That's a new feature that was added about a week ago.
I do have a related curiosity question: it looks like you're putting statically defined items in labels -- those should probably go in variables. Is there a reason why you prefer the labels section?
@FrankD412 not really, I probably just copied/pasted from another example. And maybe because these do not "vary".
Got it -- was just curious if there was a use case for it that I should be supporting. Thanks for the info.
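(For reference, a rough sketch of the distinction -- the values here are hypothetical, reusing names from this study: variables hold static definitions substituted as-is, while labels are formatted strings that can depend on parameters of each combination.

```yaml
env:
  variables:
    RUN_PATH: /usr/workspace/aml_cs/ALE/Hohlraum     # static value -> variables
    SCRIPTS_DIR: /usr/workspace/aml_cs/ALE/LAGER/data-generation/Hohlraum
  labels:
    outfile: $(DOMAIN).$(RESOLUTION).log             # parameter-dependent name -> labels
```
)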
I just created the new issue, didn't realize I could just create it off the comment. Just an FYI not to worry about making a new one.
@doutriaux1 -- @ben-bay just fixed the variables section bug. I've got the example expanding on my own machine. Will be looking at this shortly.
Here is the yaml I'm trying to run. It expands correctly, and when I type "y" to launch the study everything looks correct. But it appears nothing is launched, and maestro status does not even show anything.
Launch output:
status (nothing):
custom generator: