Closed calvinp0 closed 1 year ago
[self.job_dict[spc.label]['tsg'][get_i_from_job_name(job_name)].as_dict()
for job_name in self.running_jobs[spc.label] if 'tsg' in job_name]
In this line of code, get_i_from_job_name(job_name) returns 0, which appears to be expected.
However, self.job_dict[spc.label]['tsg'] returns this:
{'tsg0': <arc.job.adapters.ts...4ac054890>, 1: <arc.job.adapters.ts...4ac0549d0>}
As we can see, the first entry is keyed by the full job name 'tsg0' (a string) rather than by the integer 0, so looking it up with 0 fails.
Going back further, for self.job_dict[spc.label] we get:
{'tsg': {'tsg0': <arc.job.adapters.ts...4ac054890>, 1: <arc.job.adapters.ts...4ac0549d0>}}
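The failure can be reproduced in isolation with a minimal sketch. The dictionary below only mimics the key pattern shown above, with placeholder strings instead of job adapter objects, and index_from_job_name is a hypothetical stand-in for get_i_from_job_name that assumes the index is whatever follows the 'tsg' prefix.

```python
def index_from_job_name(job_name: str) -> int:
    """Hypothetical stand-in for get_i_from_job_name: 'tsg0' -> 0."""
    return int(job_name[len('tsg'):])


# Same key pattern as observed after the restart: one str key, one int key.
tsg_jobs = {'tsg0': '<job adapter 0>', 1: '<job adapter 1>'}


def lookup(job_name: str) -> str:
    """Look up a TS guess job by the integer index parsed from its name."""
    return tsg_jobs[index_from_job_name(job_name)]


# 'tsg1' resolves because the integer key 1 exists.
second = lookup('tsg1')

# 'tsg0' does not resolve: the parsed index is 0, but the stored key is 'tsg0'.
try:
    lookup('tsg0')
    first_lookup_failed = False
except KeyError:
    first_lookup_failed = True

print(second, first_lookup_failed)
```

Under these assumptions only the lookup for tsg0 raises, which matches the behaviour described here.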
Good catch! Can you see where self.job_dict[spc.label] is initialized for transition state guesses (tsg)?
Okay, so moving far enough back, I have found the issue.
In the restart file, here is an example of the TS entry under running_jobs:
TS0:
- args:
    block: {}
    keyword: {}
    trsh: {}
  constraints: []
  cpu_cores: 8
  ess_settings: *id002
  execution_type: incore
  initial_time: '2023-03-25 12:27:55.256610'
  job_adapter: heuristics
  job_id: 12224
  job_memory_gb: 7.0
  job_name: tsg0
  job_num: 12224
  job_server_name: a12224
  job_status:
  - done
  - error: ''
    keywords: []
    line: ''
    status: done
  job_type: tsg
  level: null
  max_job_time: 120
  project: arc_ll_hab
  project_directory: /storage/ce_dana/calvinp/runs/nn_arc/low_level/20_rows_171_to_190
  reaction_indices:
  - 0
- args:
    block: {}
    keyword: {}
    trsh: {}
  constraints: []
  cpu_cores: 8
  ess_settings: *id002
  execution_type: incore
  initial_time: '2023-03-25 12:28:04.085632'
  job_adapter: autotst
  job_id: 12225
  job_memory_gb: 7.0
  job_name: tsg1
  job_num: 12225
  job_server_name: a12225
  job_status:
  - done
  - error: ''
    keywords: []
    line: ''
    status: done
  job_type: tsg
  level: null
  max_job_time: 120
  project: arc_ll_hab
  project_directory: /storage/ce_dana/calvinp/runs/nn_arc/low_level/20_rows_171_to_190
  reaction_indices:
  - 0
  tsg: 1
Notice that the tsg1 entry has tsg: 1 as a key and value, but the tsg0 entry has no such key or value.
When tsg=0, a truthiness check interprets it as false, so the condition is not met and the tsg key is never written for tsg0.
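The effect of the guard can be sketched as a small round trip. This is an illustration under stated assumptions, not ARC's actual code: to_restart_dict and restore_key are hypothetical helpers, and the fallback to the job name when the tsg key is missing is an assumption chosen because it reproduces the keys observed in job_dict.

```python
from typing import Any, Dict, Optional, Union


def to_restart_dict(job_name: str, tsg: Optional[int], strict: bool) -> Dict[str, Any]:
    """Hypothetical serializer: write the 'tsg' key only if the guard passes."""
    entry: Dict[str, Any] = {'job_name': job_name, 'job_type': 'tsg'}
    # Truthiness guard (strict=False): bool(0) is False, so tsg == 0 is skipped.
    # Explicit guard (strict=True): only a missing index (None) is skipped.
    write_key = (tsg is not None) if strict else bool(tsg)
    if write_key:
        entry['tsg'] = tsg
    return entry


def restore_key(entry: Dict[str, Any]) -> Union[int, str]:
    """Hypothetical restore step: fall back to the job name if 'tsg' is absent."""
    return entry.get('tsg', entry['job_name'])


jobs = [('tsg0', 0), ('tsg1', 1)]

# Keys recovered after a round trip with each guard.
truthy_keys = [restore_key(to_restart_dict(name, i, strict=False)) for name, i in jobs]
explicit_keys = [restore_key(to_restart_dict(name, i, strict=True)) for name, i in jobs]

print(truthy_keys)
print(explicit_keys)
```

With the truthiness guard the first key comes back as the string 'tsg0', while the explicit comparison against None keeps both keys as integers.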
Describe the bug
How to reproduce
This is a 20-reaction list I am using, and I had to restart after an issue in the initial run. When attempting to arcrestart, I receive this error.
Additional Context
Uncertain if this is also related to #622?
Here is the restart.yml:
restart.zip