Open gcapes opened 3 months ago
Hi @YuchenZZheng
I've just run the example_problem.yaml workflow that you sent me, and haven't found any errors. Should I get errors, or would I need to inspect the output files to realise the simulation has aborted? I might just not be looking in the right place.
I suppose you can use the 'matflow-dev show -f' command to see if there's an error, or look at "execute/task_2_simulate_VE_loading_damask/e_0/r_0/stderr.log" in the output folder.
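A minimal sketch of the second suggestion, for scanning every run directory at once instead of opening each log by hand. It assumes the `execute/<task>/e_<n>/r_<n>/stderr.log` layout seen in the path above holds for all tasks. The function name `find_stderr_errors` and the case-insensitive search for the word "error" are illustrative choices, not part of MatFlow, and a match is only a hint that a run needs a closer look.

```python
from pathlib import Path


def find_stderr_errors(workflow_dir, needle="error"):
    """Return {stderr.log path: [matching lines]} for logs that mention `needle`."""
    hits = {}
    # Assumed layout: execute/<task>/e_<n>/r_<n>/stderr.log
    pattern = "execute/*/e_*/r_*/stderr.log"
    for log in sorted(Path(workflow_dir).glob(pattern)):
        text = log.read_text(encoding="utf-8", errors="replace")
        lines = [ln for ln in text.splitlines() if needle.lower() in ln.lower()]
        if lines:
            hits[str(log)] = lines
    return hits


if __name__ == "__main__":
    import sys

    # Pass the workflow output folder as the first argument (defaults to cwd).
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, lines in find_stderr_errors(root).items():
        print(f"{path}: {len(lines)} line(s) mentioning 'error'")
```

Run it from (or point it at) the workflow output folder; logs that only contain informational messages are skipped.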
Perfect, thanks! Not sure why I didn't see that earlier 🤷‍♂️
With the caveat that I've really no idea what I'm looking at 😄, it seems that the damask_post_processing step modifies an hdf5 file but by default doesn't save it to the artifacts directory (only to the execute directory). The modified hdf5 file is still used for the subsequent post-processing and plotting because they're in the same task.
The only hdf5 files I've found are
./execute/task_2_simulate_VE_loading_damask/e_0/r_0/geom_load.hdf5
./execute/task_4_simulate_VE_loading_damask_2/e_0/r_0/geom_load.hdf5
and I think they're created by <<script:damask/write_geom.py>> in the simulate_VE_loading_damask task schema.
I'm not sure, but it might be that in order to access this in the next task, it needs to be saved. I think your second loading task is using the geom.vti file as the input. Given I don't really understand whether a VE_response output is the same thing as a volume_element input, you might have some success changing the default save_files: false to save_files: true on whichever of the output file parsers is creating the input file you need for the next task.
Hi @YuchenZZheng, Did you get this sorted in the end?
Yes, I did. Thank you for the help.
Would you be able to explain the fix?
Sorry, I thought we were talking about getting the latest version of MatFlow to work. To be honest, the new version didn't solve any of my previous problems.
I've just tried to run this example again with my newly installed matflow-full-env and matflow version on CSF3, and get this error now:
$ matflow go example_problem.yaml
/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/paramiko/pkey.py:100: CryptographyDeprecationWarning: TripleDES has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.TripleDES and will be removed
from this module in 48.0.0.
"cipher": algorithms.TripleDES,
/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/paramiko/transport.py:259: CryptographyDeprecationWarning: TripleDES has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.TripleDES and will be
removed from this module in 48.0.0.
"class": algorithms.TripleDES,
ERROR matflow.persistence: batch update exception!
Traceback (most recent call last):
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/bin/matflow", line 8, in <module>
sys.exit(cli())
^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/hpcflow/sdk/cli.py", line 161, in make_and_submit_workflow
out = app.make_and_submit_workflow(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/hpcflow/sdk/app.py", line 280, in <lambda>
return lambda *args, **kwargs: func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/hpcflow/sdk/app.py", line 1403, in _make_and_submit_workflow
submitted_js = wk.submit(
^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/hpcflow/sdk/core/workflow.py", line 2330, in submit
exceptions, submitted_js = self._submit(
^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/hpcflow/sdk/log.py", line 25, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/hpcflow/sdk/core/workflow.py", line 2237, in _submit
new_sub = self._add_submission(tasks=tasks, JS_parallelism=JS_parallelism)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/hpcflow/sdk/log.py", line 25, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/hpcflow/sdk/core/workflow.py", line 2590, in _add_submission
jobscripts=self.resolve_jobscripts(tasks),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/hpcflow/sdk/log.py", line 25, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/hpcflow/sdk/core/workflow.py", line 2620, in resolve_jobscripts
js, element_deps = self._resolve_singular_jobscripts(tasks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/hpcflow/sdk/log.py", line 25, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/hpcflow/sdk/core/workflow.py", line 2663, in _resolve_singular_jobscripts
res, res_hash, res_map, EAR_map = generate_EAR_resource_map(task, loop_idx_i)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/hpcflow/sdk/log.py", line 25, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/hpcflow/sdk/submission/jobscript.py", line 59, in generate_EAR_resource_map
res_hash = run.resources.get_jobscript_hash()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/hpcflow/sdk/core/element.py", line 253, in get_jobscript_hash
dct["scheduler_args"]["options"] = _hash_dict(scheduler_args["options"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/iusers01/support/mbexegc2/yuchen-zheng/.venv/lib/python3.11/site-packages/hpcflow/sdk/core/element.py", line 241, in _hash_dict
keys, vals = zip(*d.items())
^^^^^^^
AttributeError: 'list' object has no attribute 'items'
Hi Gerard, it might be because of the format of the resources block. Please try changing it to:
resources:
  any:
    scheduler: sge
    scheduler_args:
      shebang_args: --login
      options:
        -l: short
Thanks Yuchen - it's now running.
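For reference, the `AttributeError` above is raised by the line `keys, vals = zip(*d.items())` shown in the traceback, which only works when `options` is a mapping. The sketch below reproduces that behaviour in isolation. `split_options` is a hypothetical stand-in and not hpcflow's `_hash_dict`, and the list form is an assumed example of how the earlier resources block may have been parsed, since the original block isn't shown in this thread.

```python
def split_options(d):
    """Hypothetical stand-in that mirrors the failing line in the traceback."""
    keys, vals = zip(*d.items())
    return keys, vals


# Mapping form, as in the corrected resources block.
options_as_mapping = {"-l": "short"}

# Assumed list form: what a YAML sequence under `options:` would parse to.
options_as_list = ["-l short"]

print(split_options(options_as_mapping))  # (('-l',), ('short',))

try:
    split_options(options_as_list)
except AttributeError as exc:
    # AttributeError: 'list' object has no attribute 'items'
    print(f"AttributeError: {exc}")
```

Writing `options` as key/value pairs, as suggested, gives the mapping that this unpacking expects.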
Ok, so I get an error in the output from the simulate_VE_loading_damask task, which I'll look at when I'm at home (I've saved a copy of the workflow directory to look at later).
I've sent the output to Adam to get his thoughts.
This looks like a damask error rather than a matflow error. Might be best to ask Joao for input?
I have lost track of what error this is. Is it this: https://github.com/LightForm-group/non-repo-issues/issues/21#issuecomment-2247535241
No, it's this in the stderr.log file:
INFO: Detected Singularity user configuration directory
┌─────────────────────────────────────────────────────────────────────┐
┌─────────────────────────────────────────────────────────────────────┐
│ error │
│ 950 │
├─────────────────────────────────────────────────────────────────────┤
│ max number of cut back exceeded, terminating │
│ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ error │
│ 950 │
├─────────────────────────────────────────────────────────────────────┤
│ max number of cut back exceeded, terminating │
│ │
└─────────────────────────────────────────────────────────────────────┘
│ error │
│ 950 │
├─────────────────────────────────────────────────────────────────────┤
│ max number of cut back exceeded, terminating │
│ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ error │
│ 950 │
├─────────────────────────────────────────────────────────────────────┤
│ max number of cut back exceeded, terminating │
│ │
└─────────────────────────────────────────────────────────────────────┘
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL IEEE_INEXACT_FLAG
STOP 1
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL IEEE_INEXACT_FLAG
STOP 1
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_OVERFLOW_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL IEEE_INEXACT_FLAG
STOP 1
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL IEEE_INEXACT_FLAG
STOP 1
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[51711,1],2]
Exit code: 1
--------------------------------------------------------------------------
Yuchen is already using Matflow on the CSF and is simulating the compression of Al crystals. When the volume elements are compressed by 60%, there is an error (which others get too) which results from the deformed shape of the volume element. Remeshing is a potential solution. So the proposal is to apply the load incrementally, but when he tries this, the task simply repeats the same simulation instead of continuing where it left off. So this looks like a Matflow problem with how to use the output from the previous step to continue the simulation.
Actions: