ReactionMechanismGenerator / ARC

ARC - Automatic Rate Calculator
https://reactionmechanismgenerator.github.io/ARC/index.html
MIT License

Receiving KeyError: '1' when attempting `arcrestart` #624

Open calvinp0 opened 1 year ago

calvinp0 commented 1 year ago

Describe the bug: ARC fails to restart and reports the following traceback:

Traceback (most recent call last):
  File "/Local/ce_dana/Code/ARC//ARC.py", line 69, in <module>
    main()
  File "/Local/ce_dana/Code/ARC//ARC.py", line 65, in main
    arc_object.execute()
  File "/Local/ce_dana/Code/ARC/arc/main.py", line 583, in execute
    fine_only=self.fine_only,
  File "/Local/ce_dana/Code/ARC/arc/scheduler.py", line 484, in __init__
    self.schedule_jobs()
  File "/Local/ce_dana/Code/ARC/arc/scheduler.py", line 500, in schedule_jobs
    self.run_conformer_jobs()
  File "/Local/ce_dana/Code/ARC/arc/scheduler.py", line 1048, in run_conformer_jobs 
    self.process_conformers(label)
  File "/Local/ce_dana/Code/ARC/arc/scheduler.py", line 1748, in process_conformers 
    conformer=i,
  File "/Local/ce_dana/Code/ARC/arc/scheduler.py", line 831, in run_job
    self.save_restart_dict()
  File "/Local/ce_dana/Code/ARC/arc/scheduler.py", line 3468, in save_restart_dict  
    + [self.job_dict[spc.label]['tsg'][get_i_from_job_name(job_name)].as_dict()
  File "/Local/ce_dana/Code/ARC/arc/scheduler.py", line 3467, in <listcomp>
    for job_name in self.running_jobs[spc.label] if 'conformer' in job_name] \
KeyError: 1

How to reproduce: Restart a run of a 20-reaction input file after the initial run failed.

Additional context: This may be related to #622 and #623.

Here is the restart file: restart.zip

calvinp0 commented 1 year ago

The error occurs for:

Label: 'r_162_[CH2]c1ccccc1'
self.job_dict[spc.label]: {'conformers': {0: <arc.job.adapters.ga...211622750>}}

In reality, 4 conformers had completed. [image]

calvinp0 commented 1 year ago

Here is part of the restart file from another restart.

  r_177_[CH]=CC=C:
  - args:
      block: {}
      keyword:
        general: scf=xqc
      trsh: {}
    conformer: 1
    constraints: []
    cpu_cores: 10
    ess_settings: *id005
    ess_trsh_methods:
    - restart_due_to_file_not_found
    execution_type: queue
    fine: false
    initial_time: '2023-03-25 12:15:46'
    job_adapter: gaussian
    job_id: '319902'
    job_memory_gb: 7.0
    job_name: conformer1
    job_num: 79
    job_server_name: a12079
    job_status:
    - running
    - error: ''
      keywords: []
      line: ''
      status: initializing
    job_type: conformers
    level:
      basis: def2svp
      compatible_ess: *id007
      method: wb97xd
      method_type: dft
      software: gaussian
    max_job_time: 120
    project: arc_ll_hab
    project_directory: /home/calvin/Code/arc_restart_debug/20_rows_171_to_190/
    server: local
    server_nodes: []
    species_labels:
    - r_177_[CH]=CC=C

In the debugger, self.running_jobs[spc.label] is ['conformer0', 'conformer1']. So at first glance it appears that, when the YAML file is read, conformer0 is also being added to the self.running_jobs[spc.label] list.

calvinp0 commented 1 year ago

Okay, so when the restart yaml is parsed, it correctly records which conformers are in the running_jobs (in this case conformer1) and also self.job_dict (self.job_dict['r_177_[CH]=CC=C'] = {'conformers':{1:<arc.job.adapters>}}).

However, as the run restarts, it prints this out in the terminal:

Running local queue job conformer0 (a89) using gaussian for r_177_[CH]=CC=C

And thus, everything changes.

Now, self.job_dict['r_177_[CH]=CC=C'] = {'conformers':{0:<arc.job.adapters>}} and self.running_jobs['r_177_[CH]=CC=C'] = ['conformer1', 'conformer0']. So it appears the conformer numbering isn't starting from 1, but rather from 0? It then seems to remove the conformers: 1 entry from self.job_dict while appending the new conformer0 to the self.running_jobs list. So when it reaches this line: https://github.com/ReactionMechanismGenerator/ARC/blob/main/arc/scheduler.py#L3470

It errors out because there is no entry for conformer1 in self.job_dict.
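The mismatch described above can be reproduced outside ARC with plain dictionaries. This is only a minimal sketch: `index_from_job_name` is a hypothetical stand-in for ARC's `get_i_from_job_name`, `collect_running_conformers` is a hypothetical helper that mimics the lookup pattern of the failing list comprehension, and the string placeholders stand in for the real job adapter objects.

```python
import re


def index_from_job_name(job_name: str) -> int:
    """Hypothetical stand-in for get_i_from_job_name: 'conformer1' -> 1."""
    match = re.search(r"(\d+)$", job_name)
    if match is None:
        raise ValueError(f"no trailing index in {job_name!r}")
    return int(match.group(1))


label = "r_177_[CH]=CC=C"

# State right after the restart file is parsed, as reported in this thread.
job_dict = {label: {"conformers": {1: "<job adapter for conformer1>"}}}
running_jobs = {label: ["conformer1"]}

# The restarted run then spawns conformer0. As observed in the debugger, the
# 'conformers' entry ends up holding only index 0, while the new job name is
# appended to the existing running list.
job_dict[label]["conformers"] = {0: "<job adapter for conformer0>"}
running_jobs[label].append("conformer0")


def collect_running_conformers(species_label: str) -> list:
    """Look up every running conformer job in job_dict by its index."""
    return [
        job_dict[species_label]["conformers"][index_from_job_name(name)]
        for name in running_jobs[species_label]
        if "conformer" in name
    ]


try:
    collect_running_conformers(label)
except KeyError as err:
    error_message = f"KeyError: {err}"
    print(error_message)  # KeyError: 1
```

The lookup fails on 'conformer1' because that name is still in the running list while index 1 is no longer a key in the 'conformers' dictionary.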

alongd commented 1 year ago

The conformer counter should indeed start at 0. We need to check why you got self.job_dict['r_177_[CH]=CC=C'] = {'conformers':{1:<arc.job.adapters>}}.
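One way to narrow this down while debugging is to compare the two structures and list the running conformer jobs that have no matching entry in the job dictionary. The helper below is a hedged sketch and not part of ARC: `find_orphaned_conformer_jobs` is a hypothetical name, and it assumes only the shapes reported in this thread (a list of job names per species label, and a 'conformers' dictionary keyed by integer index).

```python
import re


def find_orphaned_conformer_jobs(running_jobs: dict, job_dict: dict) -> dict:
    """Return {label: [job names]} for running conformer jobs missing from job_dict."""
    orphans = {}
    for label, job_names in running_jobs.items():
        # Indices that job_dict actually knows about for this species.
        known = job_dict.get(label, {}).get("conformers", {})
        for name in job_names:
            match = re.fullmatch(r"conformer(\d+)", name)
            if match and int(match.group(1)) not in known:
                orphans.setdefault(label, []).append(name)
    return orphans


label = "r_177_[CH]=CC=C"
running_jobs = {label: ["conformer1", "conformer0"]}
job_dict = {label: {"conformers": {0: object()}}}  # placeholder for a job adapter

orphans = find_orphaned_conformer_jobs(running_jobs, job_dict)
print(orphans)  # {'r_177_[CH]=CC=C': ['conformer1']}
```

Calling such a check right after the restart file is parsed, and again after the first conformer job is spawned, would show at which step the two structures diverge.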