LightForm-group / UoM-CSF-matflow

Matflow configuration files and task schemas for running on the University of Manchester's CSF
MIT License

No Simulation Run for Repeat 30 in 100 repeat DAMASK simulation #9

Closed ElliotCN closed 3 years ago

ElliotCN commented 3 years ago

Hi All,

Describe the problem: I am running a DAMASK loading simulation, using the generate_load_case task with method random_2D to generate the load cases. I want to run 100 repeats of the simulation so that I can plot a yield surface. The problem is that the 30th loading simulation fails, and so do all the repeats after it. When I look in the generate_load_case files for the 30th repeat there is only an inputs.hdf5, whereas all the other generate_load_case repeats have both inputs.hdf5 and outputs.hdf5.

Expected behavior: All repeats should run. I am not sure why it has a problem with repeat 30; it seems very strange to me.

Output from matflow validate Please paste here the output from running the command matflow-validate.

Workflow directory and/or profile location: /mnt/eps01-rds/Fonseca-Lightform/shared/matflow-debugging/fit_yield_function_Brass_Texture_2D_test_2021-02-02-161612

Thanks for any help!

Elliot

aplowman commented 3 years ago

This is reproducible with just the load case task:

name: test_load_reps
archive: dropbox
tasks:
  - name: generate_load_case
    method: random_2D
    software: formable
    context: multiaxial
    base:
      total_times: [100]
      num_increments: [400]
      target_strains: [1.0e-1]
      normal_directions: [z]
    repeats: 100
    groups:
      multiaxial_responses:
        group_by: []
        nest: True

Looking in the .hpcflow logs, it appears to be a database locking issue associated with hpcflow.api.set_task_end:

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked
[SQL: UPDATE task SET end_time=? WHERE task.id = ?]
[parameters: ('2021-02-04 12:19:41.555813', 33)]

This is likely due to the large number of tasks finishing at the same time. This should be fixable in hpcflow by introducing some more exception handling and waits.
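For illustration, the "exception handling and waits" idea can be sketched with the standard-library sqlite3 module. This is a generic retry-with-backoff pattern, not hpcflow's actual code: the function name execute_with_retry and its parameters are hypothetical, and the table layout in the usage note is only inferred from the UPDATE statement in the log.

```python
import random
import sqlite3
import time


def execute_with_retry(db_path, sql, params=(), max_attempts=5,
                       base_wait=0.1, busy_timeout=1.0):
    """Run one write statement, retrying while SQLite reports a locked database.

    Hypothetical helper for illustration only. Returns the number of the
    attempt that succeeded; re-raises the error once max_attempts is used up.
    """
    for attempt in range(1, max_attempts + 1):
        # busy_timeout is how long SQLite itself waits for the lock per attempt.
        conn = sqlite3.connect(db_path, timeout=busy_timeout)
        try:
            with conn:  # commits on success, rolls back on an exception
                conn.execute(sql, params)
            return attempt
        except sqlite3.OperationalError as err:
            # Only retry lock errors; anything else is a real problem.
            if "database is locked" not in str(err) or attempt == max_attempts:
                raise
            # Exponential backoff plus jitter, so that many tasks finishing
            # together do not all retry at the same instant.
            time.sleep(base_wait * 2 ** (attempt - 1)
                       + random.uniform(0, base_wait))
        finally:
            conn.close()
```

A call mirroring the failing statement might look like execute_with_retry(path, "UPDATE task SET end_time=? WHERE id=?", (end_time, 33)), where path and end_time are placeholders.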

aplowman commented 3 years ago

An unrelated issue is in the sample_texture task of your workflow, where the correct ODF component keys are modal_orientation_HKL and modal_orientation_UVW (capitalised HKL and UVW). See: https://github.com/LightForm-group/UoM-CSF-matflow/blob/d75cdc2687686bb9660c908f620ef4ba625ab3d3/task_examples/sample_texture.yml#L38.
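As a rough sketch of how such a capitalisation slip could be caught, the snippet below renames keys that match the two expected names apart from letter case. The helper fix_key_case and the shape of the component dictionary are assumptions made for illustration; only the two key names come from the linked example.

```python
# The two key names given above; everything else here is hypothetical.
EXPECTED_KEYS = ("modal_orientation_HKL", "modal_orientation_UVW")


def fix_key_case(component):
    """Return a copy of `component` with mis-capitalised expected keys renamed.

    Keys that do not match an expected name (ignoring case) are left unchanged.
    """
    lookup = {key.lower(): key for key in EXPECTED_KEYS}
    return {lookup.get(key.lower(), key): value
            for key, value in component.items()}
```

For example, fix_key_case({"modal_orientation_hkl": [1, 1, 0]}) returns a dictionary keyed by modal_orientation_HKL.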

ElliotCN commented 3 years ago

How can I edit hpcflow? Is it not generated by matflow from the .yml file?

aplowman commented 3 years ago

> How can I edit hpcflow? Is it not generated by matflow from the .yml file?

Yes, that's right. To be clear, this problem is a bug in hpcflow. I will fix it.

ElliotCN commented 3 years ago

Thanks Adam!

aplowman commented 3 years ago

This was actually caused by a different bug in hpcflow. I think I have fixed both of them now. Could you update MatFlow to v0.2.16 with pip install --user -U matflow and then try your workflow again? (Updating MatFlow should also update hpcflow to v0.1.14.)
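One way to confirm that the update took effect is to read the installed versions with the standard-library importlib.metadata module. This is a minimal sketch: it assumes the distributions are registered under the names matflow and hpcflow and that their versions are plain dotted integers; the helper names are made up for the example.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions mentioned above; the distribution names are an assumption.
MINIMUMS = {"matflow": "0.2.16", "hpcflow": "0.1.14"}


def parse_version(text):
    """Turn '0.2.16' into (0, 2, 16); only handles plain dotted integers."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Compare versions numerically, so that 0.2.16 counts as newer than 0.2.9."""
    return parse_version(installed) >= parse_version(minimum)


def report():
    """Return one status line per package listed in MINIMUMS."""
    lines = []
    for name, minimum in MINIMUMS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            lines.append(f"{name}: not installed")
            continue
        try:
            status = "OK" if meets_minimum(installed, minimum) else "too old"
        except ValueError:
            # e.g. a development or pre-release version string
            status = "unrecognised version format"
        lines.append(f"{name} {installed} (need >= {minimum}): {status}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```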

ElliotCN commented 3 years ago

Yep, works now. Thanks Adam!