Open calvinp0 opened 1 year ago
The error occurs for:
Label: 'r_162_[CH2]c1ccccc1'
self.job_dict[spc.label]: {'conformers': {0: <arc.job.adapters.ga...211622750>}}
In reality, 4 conformers had completed.
Here is part of another restart file:
r_177_[CH]=CC=C:
- args:
    block: {}
    keyword:
      general: scf=xqc
    trsh: {}
  conformer: 1
  constraints: []
  cpu_cores: 10
  ess_settings: *id005
  ess_trsh_methods:
  - restart_due_to_file_not_found
  execution_type: queue
  fine: false
  initial_time: '2023-03-25 12:15:46'
  job_adapter: gaussian
  job_id: '319902'
  job_memory_gb: 7.0
  job_name: conformer1
  job_num: 79
  job_server_name: a12079
  job_status:
  - running
  - error: ''
    keywords: []
    line: ''
    status: initializing
  job_type: conformers
  level:
    basis: def2svp
    compatible_ess: *id007
    method: wb97xd
    method_type: dft
    software: gaussian
  max_job_time: 120
  project: arc_ll_hab
  project_directory: /home/calvin/Code/arc_restart_debug/20_rows_171_to_190/
  server: local
  server_nodes: []
  species_labels:
  - r_177_[CH]=CC=C
In the debugger, self.running_jobs[spc.label] is ['conformer0', 'conformer1']. So at first glance it appears that when the YAML file is read in, conformer0 is also being added to self.running_jobs[spc.label].
Okay, so when the restart YAML is parsed, it correctly records which conformers are in running_jobs (in this case conformer1) and in self.job_dict (self.job_dict['r_177_[CH]=CC=C'] = {'conformers': {1: <arc.job.adapters>}}).
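A quick invariant that helps when stepping through this in the debugger: every conformer job name in running_jobs should have a matching index in job_dict[label]['conformers']. The helper below is a hypothetical debugging sketch, not part of ARC. It assumes job names follow the `conformer<index>` pattern seen in the restart entry (job_name: conformer1, conformer: 1), and plain strings stand in for the job adapter objects.

```python
import re
from typing import Dict, List


def find_untracked_conformer_jobs(running_jobs: Dict[str, List[str]],
                                  job_dict: Dict[str, dict]) -> Dict[str, List[str]]:
    """Hypothetical helper (not ARC's API): for each species label, list the
    conformer job names in running_jobs whose index is missing from
    job_dict[label]['conformers']."""
    untracked: Dict[str, List[str]] = {}
    for label, job_names in running_jobs.items():
        tracked = job_dict.get(label, {}).get('conformers', {})
        for job_name in job_names:
            match = re.fullmatch(r'conformer(\d+)', job_name)
            if match is None:
                continue  # only conformer jobs are checked here
            if int(match.group(1)) not in tracked:
                untracked.setdefault(label, []).append(job_name)
    return untracked


label = 'r_177_[CH]=CC=C'

# State right after the restart file is parsed, as described above.
parsed_running = {label: ['conformer1']}
parsed_jobs = {label: {'conformers': {1: '<adapter>'}}}
print(find_untracked_conformer_jobs(parsed_running, parsed_jobs))  # {}

# State after the restart launched conformer0.
later_running = {label: ['conformer1', 'conformer0']}
later_jobs = {label: {'conformers': {0: '<adapter>'}}}
print(find_untracked_conformer_jobs(later_running, later_jobs))  # {'r_177_[CH]=CC=C': ['conformer1']}
```

An empty result means the two structures agree; any listed name is a job that a later lookup by index would fail to find.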
However, as the run restarts, it prints the following to the terminal:
Running local queue job conformer0 (a89) using gaussian for r_177_[CH]=CC=C
After that, both structures change.
Now self.job_dict['r_177_[CH]=CC=C'] = {'conformers': {0: <arc.job.adapters>}} and self.running_jobs['r_177_[CH]=CC=C'] = ['conformer1', 'conformer0']. So it appears the counter isn't starting from 1 but from 0? The entry for conformer 1 is removed from self.job_dict, yet the new conformer0 is appended to the self.running_jobs list. So when it reaches this line:
https://github.com/ReactionMechanismGenerator/ARC/blob/main/arc/scheduler.py#L3470
It errors out because self.job_dict has no entry for conformer1.
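The sequence described above can be reproduced in miniature with plain dictionaries. This is a hypothetical simulation, not ARC code: `spawn_conformer_jobs_buggy` only mimics the observed behavior (the conformers sub-dict is re-created and indexing restarts at 0, while running_jobs is only appended to), and `lookup_job` assumes the linked scheduler line looks up job_dict by the index parsed from the job name.

```python
import re
from typing import Dict, List

label = 'r_177_[CH]=CC=C'

# State right after the restart YAML is parsed: conformer1 is tracked in both
# structures. Strings stand in for the real job adapter objects.
running_jobs: Dict[str, List[str]] = {label: ['conformer1']}
job_dict: Dict[str, dict] = {label: {'conformers': {1: '<adapter for conformer1>'}}}


def spawn_conformer_jobs_buggy(label: str, n_conformers: int) -> None:
    """Hypothetical stand-in for the observed behavior: the conformers sub-dict
    is replaced and indices restart at 0, while running_jobs is only appended to."""
    job_dict[label]['conformers'] = {}  # drops the restored entry for conformer 1
    for i in range(n_conformers):
        job_name = f'conformer{i}'
        job_dict[label]['conformers'][i] = f'<adapter for {job_name}>'
        running_jobs[label].append(job_name)


def lookup_job(label: str, job_name: str) -> str:
    """Assumed lookup: parse the index from 'conformer<N>' and index job_dict."""
    index = int(re.fullmatch(r'conformer(\d+)', job_name).group(1))
    return job_dict[label]['conformers'][index]


spawn_conformer_jobs_buggy(label, n_conformers=1)
print(job_dict[label])      # {'conformers': {0: '<adapter for conformer0>'}}
print(running_jobs[label])  # ['conformer1', 'conformer0']
try:
    for name in running_jobs[label]:
        lookup_job(label, name)
except KeyError as err:
    print(f'KeyError: {err}')  # KeyError: 1
```

Under these assumptions the first name in running_jobs, conformer1, no longer has an entry, which matches the failure reported here.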
The conformer counter should indeed start at 0. Need to check why you got self.job_dict['r_177_[CH]=CC=C'] = {'conformers':{1:<arc.job.adapters>}}
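Starting the counter at 0 and restoring conformer1 from the restart file need not conflict. One possible shape for a restart-safe spawner is sketched below. It is a hypothetical illustration, not ARC's API or the actual fix. Indices still start at 0, but indices already restored are skipped, and the existing conformers sub-dict is updated in place instead of being replaced. Conformers that already completed are outside the scope of this sketch.

```python
from typing import Dict, List


def spawn_missing_conformer_jobs(label: str,
                                 n_conformers: int,
                                 job_dict: Dict[str, dict],
                                 running_jobs: Dict[str, List[str]]) -> List[str]:
    """Hypothetical restart-safe spawner: indices start at 0, but any index already
    present in job_dict or running_jobs is skipped, and the existing conformers
    sub-dict is updated in place rather than replaced."""
    conformers = job_dict.setdefault(label, {}).setdefault('conformers', {})
    running = running_jobs.setdefault(label, [])
    spawned = []
    for i in range(n_conformers):
        job_name = f'conformer{i}'
        if i in conformers or job_name in running:
            continue  # already restored from the restart file
        conformers[i] = f'<adapter for {job_name}>'  # placeholder for a real job adapter
        running.append(job_name)
        spawned.append(job_name)
    return spawned


label = 'r_177_[CH]=CC=C'
job_dict = {label: {'conformers': {1: '<adapter for conformer1>'}}}
running_jobs = {label: ['conformer1']}
print(spawn_missing_conformer_jobs(label, 2, job_dict, running_jobs))  # ['conformer0']
print(sorted(job_dict[label]['conformers']))  # [0, 1]
print(running_jobs[label])  # ['conformer1', 'conformer0']
```

With this shape, calling the spawner again after a restart is idempotent: every job name in running_jobs keeps a matching entry in job_dict.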
Describe the bug
ARC fails to restart and reports this in the traceback.
How to reproduce
A 20-reaction input file that I attempted to restart after the initial run failed.
Additional context
Maybe all of these are related? #622 #623
Here is the restart file: restart.zip