ReactionMechanismGenerator / ARC

ARC - Automatic Rate Calculator
https://reactionmechanismgenerator.github.io/ARC/index.html
MIT License

ARC's troubleshooting: Orca's mdci error #769

Open: NellyMitnik opened this issue 2 weeks ago

NellyMitnik commented 2 weeks ago

Describe the bug. For documentation purposes: while running an opt job for the OH species in Orca, as part of the frequency scaling factor project, I encountered the following error in Orca:


[file orca_mdci/mdci_state.cpp, line 1165, Process 2]: . . . aborting the run
[file orca_mdci/mdci_state.cpp, line 1165, Process 3]: . . . aborting the run
[file orca_mdci/mdci_state.cpp, line 1165, Process 4]: . . . aborting the run
[file orca_mdci/mdci_state.cpp, line 1165, Process 5]: . . . aborting the run
[file orca_mdci/mdci_state.cpp, line 1165, Process 6]: . . . aborting the run
[file orca_mdci/mdci_state.cpp, line 1165, Process 7]: . . . aborting the run
[file orca_mdci/mdci_state.cpp, line 1165, Process 8]: . . . aborting the run
[file orca_mdci/mdci_state.cpp, line 1165, Process 9]: . . . aborting the run
[file orca_mdci/mdci_state.cpp, line 1165, Process 10]: . . . aborting the run
[file orca_mdci/mdci_state.cpp, line 1165, Process 11]: . . . aborting the run
[file orca_mdci/mdci_state.cpp, line 1165, Process 12]: . . . aborting the run
[file orca_mdci/mdci_state.cpp, line 1165, Process 13]: . . . aborting the run
[file orca_mdci/mdci_state.cpp, line 1165, Process 14]: . . . aborting the run
[file orca_mdci/mdci_state.cpp, line 1165, Process 15]: . . . aborting the run
Error (ORCA_MDCI): Number of processes (16) in parallel calculation exceeds number of pairs (13)
[file orca_mdci/mdci_state.cpp, line 1165, Process 0]: . . . aborting the run

ORCA finished by error termination in MDCI
Calling Command: mpirun -np 16  /usr/local/orca-6.0.0/orca_mdci_mpi input.mdciinp.tmp input 
Check for MDCI-logfiles
[file orca_main/main_driver_opt1.cpp, line 1805]: ORCA finished with an error in the energy calculation

ARC's troubleshooting reruns the job with ncpus set to the number of "pairs" reported by Orca, in this case 13.
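For illustration, a minimal sketch of how the process and pair counts could be pulled out of the Orca error line with a regular expression. The function name `parse_mdci_pairs_error` is hypothetical and this is not ARC's actual troubleshooting code; the pattern is copied from the message quoted above and may need adjusting for other Orca versions.

```python
import re
from typing import Optional, Tuple

# Pattern built from the ORCA_MDCI message quoted in this issue (assumed wording).
MDCI_PAIRS_RE = re.compile(
    r"Number of processes \((\d+)\) in parallel calculation "
    r"exceeds number of pairs \((\d+)\)"
)


def parse_mdci_pairs_error(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (n_processes, n_pairs) from an Orca MDCI error, or None if absent."""
    match = MDCI_PAIRS_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Applied to the error above, this returns `(16, 13)`; ARC currently uses the second value directly as the new ncpus.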

ARC Troubleshooting Orca MDCI error: ncpus

The job with 13 ncpus failed with the same error, so ARC enters an endless loop of failed jobs.
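ARC's internal bookkeeping is not shown in this issue; as a hedged sketch, one way to break such a loop is to remember which ncpus values have already been tried and stop, rather than resubmit a value that already failed. The helper `next_ncpus_or_stop` is hypothetical.

```python
from typing import List, Optional


def next_ncpus_or_stop(proposed: int, attempted: List[int]) -> Optional[int]:
    """Return the proposed ncpus, or None if that value was already tried.

    `attempted` is mutated in place so it accumulates across troubleshooting rounds.
    """
    if proposed in attempted:
        # Resubmitting with a value that already failed would repeat the same error.
        return None
    attempted.append(proposed)
    return proposed
```

For example, with `attempted = [16]`, the first call with 13 returns 13, and a second call with 13 returns `None`, which the caller could treat as "stop troubleshooting this way".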

I then manually changed ncpus to 10 in both the input file and the submit script. This worked, and Orca converged successfully.
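The Orca side of that manual fix amounts to changing the `nprocs` value in the input's `%pal` block. A minimal sketch, assuming the input contains a single `%pal nprocs N` directive; the function `set_orca_nprocs` is hypothetical, and the submit-script change is scheduler-specific and not covered here.

```python
import re

# Matches "%pal nprocs N" on one line or across lines (e.g. "%pal\n  nprocs N\nend").
PAL_NPROCS_RE = re.compile(r"(%pal\s+nprocs\s+)\d+", re.IGNORECASE)


def set_orca_nprocs(input_text: str, ncpus: int) -> str:
    """Return the Orca input text with the %pal nprocs value replaced by ncpus."""
    new_text, n_subs = PAL_NPROCS_RE.subn(lambda m: f"{m.group(1)}{ncpus}", input_text)
    if n_subs == 0:
        raise ValueError("no '%pal nprocs N' directive found in the input")
    return new_text
```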

Suggestion: ARC's troubleshooting could try a somewhat lower ncpus than the number of pairs reported in Orca's error.
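A minimal sketch of this suggestion, assuming a fixed fractional margin below the reported pair count. The 0.8 factor is an arbitrary illustrative value (chosen so that 13 pairs maps to 10 cores, the value that worked manually here), not a setting from ARC or Orca, and `reduced_ncpus` is a hypothetical name.

```python
import math


def reduced_ncpus(n_pairs: int, fraction: float = 0.8) -> int:
    """Suggest an ncpus value somewhat below the number of pairs Orca reported.

    The result is clamped to at least 1 so the suggestion never drops to 0 cores.
    """
    return max(1, math.floor(n_pairs * fraction))
```

With 13 pairs this gives 10; the clamp matters for very small systems where Orca reports few or no pairs.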

kfir4444 commented 2 days ago

Quick follow-up: for the edge case of running [H], the troubleshooting behaves as follows:

"Error (ORCA_MDCI): Number of processes (20) in parallel calculation exceeds number of pairs (0)".
Troubleshooting sp job in orca for H using 0 cpu cores.

which is obviously nonsense.
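As a hedged sketch, a simple sanity check before a job is written or submitted would stop a 0-core job from being generated at all. The function `validate_ncpus` is hypothetical; what ARC should do instead for a zero-pair species like [H] is left open in this issue.

```python
def validate_ncpus(ncpus: int) -> int:
    """Reject non-positive core counts before a job is written or submitted."""
    if ncpus < 1:
        raise ValueError(f"refusing to submit a job with {ncpus} CPU cores")
    return ncpus
```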