ReactionMechanismGenerator / ARC

ARC - Automatic Rate Calculator
https://reactionmechanismgenerator.github.io/ARC/index.html
MIT License

How to use ARC only on PC without clustering software? #742

Open OJ-0908 opened 1 month ago

OJ-0908 commented 1 month ago

As I'm not from a computer science background, server installation and job-scheduling software are somewhat beyond my understanding. I'd like to ask about the settings in ARC: how can I configure it to run on a personal computer? We don't need to calculate reactions involving large molecules, so the computational requirements are not very high. Could you provide the files for installing ARC on a single computer, or a video tutorial for installing ARC on a cluster? Looking forward to your reply!

alongd commented 1 month ago

Hi OJ, ARC schedules electronic structure jobs using Gaussian/Orca/Molpro/Q-Chem etc. Normally, these software packages are installed on a server, though installing them on a PC is possible as well. If your organization/university has a cluster server, check whether any of the above software is installed there; ARC could then be configured to run jobs and utilize them. ARC can be installed either on a server or on a local PC (SSHing into the server). Let us know if you encounter specific issues with the installation instructions, or at which stage you got stuck.

OJ-0908 commented 1 month ago

Hello @alongd! I'm truly delighted to receive your response. I have successfully installed all ARC modules and the quantum chemistry software Gaussian 16 on my computer. However, I encountered some issues while trying to run a basic example provided by ARC (located at ARC/examples/minimal). Here are the details of the error:

(arc_env) ➜ minimal arc
/rmg/RMG-Py/rmgpy/rmg/reactors.py:52: RuntimeWarning: Unable to import Julia dependencies, original error: No module named 'julia'
  warnings.warn("Unable to import Julia dependencies, original error: " + str(e), RuntimeWarning)
/miniconda/envs/arc_env/lib/python3.7/site-packages/paramiko/transport.py:219: CryptographyDeprecationWarning: Blowfish has been deprecated
  "class": algorithms.Blowfish,
ARC execution initiated on Tue May 14 14:22:25 2024
###############################################################

Automatic Rate Calculator

ARC

Version: 1.1.0

###############################################################
The current git HEAD for RMG-Py is:
036ab3f8ca0f94f567b50b0b83110bab0a14a35f Wed Aug 16 15:56:48 2023 -0400
The current git HEAD for RMG-database is:
b7ff16364a07c9a51a34303aa28407a83455a3e4 Tue Aug 8 09:37:08 2023 -0400
Starting project minimal
Using the following ESS settings:
{'cfour': ['local'], 'gaussian': ['local', 'server2'], 'gcn': ['local'], 'molpro': ['local', 'server2'], 'onedmin': ['server1'], 'openbabel': ['local'], 'orca': ['local'], 'qchem': ['server1'], 'terachem': ['server1'], 'torchani': ['local'], 'xtb': ['local'], 'xtb_gsm': ['local']}
Using the following levels of theory:
Conformers: (default) wb97xd/def2svp, software: gaussian (dft)
Geometry optimization: (default) wb97xd/def2tzvp
Frequencies: (user-defined opt) wb97xd/def2tzvp, software: gaussian (dft)
Energy: (default) ccsd(t)-f12/cc-pvtz-f12
Rotor scans: (user-defined opt) wb97xd/def2tzvp, software: gaussian (dft)
IRC: (default) wb97xd/def2tzvp
Considering species: H2 <Molecule "[H][H]">
Starting (non-TS) species conformational analysis...
Only one conformer is available for species H2, using it as initial xyz.
The only conformer for species H2 was found to be isomorphic with the 2D graph representation [H][H]
Running local queue job opt_a5 using gaussian for H2
Warning: Did not find the output file of job opt_a5 with path /root/workspace/ARC/examples/minimal/calcs/Species/H2/opt_a5/output.out. Maybe the job never ran. Re-running job.
Running local queue job opt_a6 using gaussian for H2
Warning: Job opt_a6 errored because for the second time ARC did not find the output file path /root/workspace/ARC/examples/minimal/calcs/Species/H2/opt_a6/output.out.
Error: Species H2 did not converge.
Job type status is: {'sp': False, 'fine': False, 'conformers': True, 'freq': False, 'rotors': True, 'opt': False}
All jobs terminated.
Summary for project minimal:
Species H2 failed with status: {'sp': False, 'fine': False, 'conformers': True, 'irc': False, 'freq': False, 'rotors': True, 'opt': False}
single conformer passed isomorphism check
Total execution time: 00:02:21
ARC execution terminated on Tue May 14 14:24:46 2024

From the results, it seems that ARC does not invoke the locally installed Gaussian 16 software, which leads to errors due to the absence of the expected Gaussian output. In reality, I have successfully installed Gaussian 16 on my computer, and it can operate independently. Its environment settings are as follows:

export g16root=$HOME/workspace/gaussian
export GAUSS_EXEDIR=$g16root/g16
export GAUSSSCRDIR=$g16root/scr
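For what it's worth, the usual way to set up a local Gaussian 16 environment is to source the profile script shipped with it, which exports GAUSS_EXEDIR, PATH, and related variables (note that the standard scratch variable is spelled GAUSS_SCRDIR, with an underscore). A hedged sketch, assuming the install layout described above:

```shell
# Hedged sketch of a typical local Gaussian 16 environment setup.
# $HOME/workspace/gaussian mirrors the layout above; adjust to your install.
export g16root=$HOME/workspace/gaussian
export GAUSS_SCRDIR=$g16root/scr        # note: GAUSS_SCRDIR, not GAUSSSCRDIR
mkdir -p "$GAUSS_SCRDIR"
# g16.profile exports GAUSS_EXEDIR and prepends the g16 directory to PATH:
source $g16root/g16/bsd/g16.profile
```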

According to the ARC software's guidance documents, it seems necessary to modify the settings and submit files if not using cluster scheduling software. However, due to my limited knowledge of programming languages and computer software, I am unsure how to edit these files. My goal is to successfully run ARC using only my local computer. I've attempted to modify a file, but unfortunately, the attempt failed. I would greatly appreciate your expert guidance and help—thank you very much!

settings.zip

calvinp0 commented 1 month ago

Hi @OJ-0908 ,

Firstly, I want to make sure: are your settings.py and submit.py located in the ~/.arc folder? I realise the installation instructions say you can just modify both files in the code folder, but it appears to me from your arc.log that ARC is picking up the original settings.py file, as you can see from this section of the log:

Using the following ESS settings:
{'cfour': ['local'],
'gaussian': ['local', 'server2'],
'gcn': ['local'],
'molpro': ['local', 'server2'],
'onedmin': ['server1'],
'openbabel': ['local'],
'orca': ['local'],
'qchem': ['server1'],
'terachem': ['server1'],
'torchani': ['local'],
'xtb': ['local'],
'xtb_gsm': ['local']}
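A hedged sketch of placing the override files where ARC looks for them (the repo path and the locations of the stock files inside the repo are assumptions; adjust both to your actual checkout):

```shell
# Assumed locations: ARC cloned at ~/ARC, with the stock settings files
# under arc/settings/ -- adjust both paths to your actual checkout.
mkdir -p ~/.arc
cp ~/ARC/arc/settings/settings.py ~/.arc/settings.py
cp ~/ARC/arc/settings/submit.py ~/.arc/submit.py
# ARC should now pick up the copies in ~/.arc; edit those going forward.
```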

Secondly, we do not use G16 the way you do, on a local PC without clustering software, so I fear that the current ARC code is only developed for setups with clustering software installed.

This is an example of my current settings.py for one of our servers:

servers = {
    'local': {  # Each Zeus node contains 80 cores and 378 GB RAM
        'cluster_soft': 'PBS',
        'un': 'calvin.p',
        'queues': {'zeus_combined_q': '24:00:00',
                   'zeus_new_q': '72:00:00',
                   'zeus_long_q': '168:00:00',
                   'mafat_new_q': '3600:00:00'},
        'excluded_queues': ['vkm_gm_q'],
        'cpus': 16,
        'memory': 320,
        'max_jobs': 30,
        'qstat_command': "qstat -x -u $USER | grep $USER | grep ' R\| Q' | wc -l",
    },
}

global_ess_settings = {
    'gaussian': ['local'],
    'molpro': ['local'],
    'orca': ['local']
}

supported_ess = ['gaussian', 'molpro', 'orca']

# TS methods to try when appropriate for a reaction (other than user guesses which are always allowed):
ts_adapters = ['heuristics', 'AutoTST', 'GCN', 'KinBot']

default_job_settings = {
    'job_total_memory_gb': 32,
    'job_cpu_cores': 16,
}

check_status_command = {'OGE': 'export SGE_ROOT=/opt/sge; /opt/sge/bin/lx24-amd64/qstat',
                        'Slurm': '/usr/bin/squeue',
                        'PBS': '/opt/pbs/bin/qstat',
                        }

submit_command = {'OGE': 'export SGE_ROOT=/opt/sge; /opt/sge/bin/lx24-amd64/qsub',
                  'Slurm': '/usr/bin/sbatch',
                  'PBS': '/opt/pbs/bin/qsub',
                  }

delete_command = {'OGE': 'export SGE_ROOT=/opt/sge; /opt/sge/bin/lx24-amd64/qdel',
                  'Slurm': '/usr/bin/scancel',
                  'PBS': '/opt/pbs/bin/qdel',
                  }

list_available_nodes_command = {'OGE': 'export SGE_ROOT=/opt/sge; /opt/sge/bin/lx24-amd64/qstat -f | grep "/8 " | grep "long" | grep -v "8/8"| grep -v "aAu"',
                                'Slurm': 'sinfo -o "%n %t %O %E"',
                                'PBS': '/opt/pbs/bin/pbsnodes',
                                }

And this is my submit.py with regard to G09 and a PBS cluster:

submit_scripts = {
    'local': {
        'gaussian': """#!/bin/bash -l

#PBS -q {queue}
#PBS -N {name}
#PBS -l select=1:ncpus={cpus}:mem={memory}:mpiprocs={cpus}
#PBS -o out.txt
#PBS -e err.txt

# Echo statements will output to a debug file in PBS_O_WORKDIR
DEBUG_FILE="$PBS_O_WORKDIR/debug_log.txt"

echo "Changing to PBS_O_WORKDIR: $PBS_O_WORKDIR" >> "$DEBUG_FILE"
cd "$PBS_O_WORKDIR"

echo "Sourcing Gaussian setup script" >> "$DEBUG_FILE"
source /usr/local/g09/setup.sh

GAUSS_SCRDIR="/gtmp/{un}/scratch/g09/$PBS_JOBID"
echo "Creating Gaussian scratch directory: $GAUSS_SCRDIR" >> "$DEBUG_FILE"
mkdir -p "$GAUSS_SCRDIR"

echo "Exporting GAUSS_SCRDIR" >> "$DEBUG_FILE"
export GAUSS_SCRDIR="$GAUSS_SCRDIR"

echo "Touching initial_time" >> "$DEBUG_FILE"
touch initial_time

echo "Changing directory to GAUSS_SCRDIR" >> "$DEBUG_FILE"
cd "$GAUSS_SCRDIR"

echo "Copying input.gjf to GAUSS_SCRDIR" >> "$DEBUG_FILE"
cp "$PBS_O_WORKDIR/input.gjf" "$GAUSS_SCRDIR"

echo "Checking for check.chk file" >> "$DEBUG_FILE"
if [ -f "$PBS_O_WORKDIR/check.chk" ]; then
    echo "Copying check.chk to GAUSS_SCRDIR" >> "$DEBUG_FILE"
    cp "$PBS_O_WORKDIR/check.chk" "$GAUSS_SCRDIR/"
fi

echo "Running Gaussian" >> "$DEBUG_FILE"
g09 < input.gjf > input.log

echo "Copying input.* back to PBS_O_WORKDIR" >> "$DEBUG_FILE"
cp input.* "$PBS_O_WORKDIR/"

echo "Checking for check.* files" >> "$DEBUG_FILE"
for f in check.*; do
    if [ -f "$f" ]; then
        echo "Copying $f back to PBS_O_WORKDIR" >> "$DEBUG_FILE"
        cp "$f" "$PBS_O_WORKDIR/"
    fi
done

echo "Removing GAUSS_SCRDIR" >> "$DEBUG_FILE"
rm -vrf "$GAUSS_SCRDIR"

echo "Changing back to PBS_O_WORKDIR" >> "$DEBUG_FILE"
cd "$PBS_O_WORKDIR"

echo "Touching final_time" >> "$DEBUG_FILE"
touch final_time

        """,}

And, as you can see in the code, for example here, submitting a job requires the cluster software to be defined. Additionally, ARC relies on the cluster software's status reporting to determine whether a job has run/completed/failed.
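The dependency can be sketched like this (a hypothetical simplification, not ARC's actual implementation): every submission path starts with a lookup keyed by the server's `cluster_soft`, so a server entry with no recognized scheduler has no command to launch a job with, and nothing to poll afterwards.

```python
# Hypothetical sketch, not ARC's actual code: job submission is keyed on
# the server's 'cluster_soft', so without a recognized scheduler there is
# no command to submit the job with (or to poll its status later).
submit_command = {
    'OGE': 'qsub',
    'Slurm': 'sbatch',
    'PBS': 'qsub',
}

def build_submit_command(cluster_soft: str, script: str = 'submit.sh') -> str:
    """Return the shell command that would hand the script to the scheduler."""
    if cluster_soft not in submit_command:
        raise ValueError(f"No submit command defined for '{cluster_soft}'")
    return f"{submit_command[cluster_soft]} {script}"

print(build_submit_command('PBS'))  # qsub submit.sh
```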

Running ESS jobs on a local computer without cluster software is something we are looking to implement in the future, but for now I have only done a bit of development on the idea. There is a branch here that is geared towards using PySCF instead of Gaussian: https://github.com/ReactionMechanismGenerator/ARC/tree/pyscf

OJ-0908 commented 1 month ago

Hi @Calvin,

I'm thrilled to receive your reply, and I feel extremely happy and fortunate. Regarding your first response, it's true that my ARC run results were indeed using the default settings. However, when I attempted to use my own modified settings and submission files, I encountered errors during the import process. That's why I didn't provide the run results.

I'm very excited that you provided the setup and submission files for local computers. As you mentioned, it's necessary to define cluster software when submitting jobs. Therefore, following the example you provided, I'll need to install cluster management software first. I'm really looking forward to successfully running it after installation. Once again, thank you for your reply; it's the highlight of my day. I'll share and discuss the results with you after running it.

Of course, I'm also looking forward to the development team making it possible for ESS jobs to run on local computers without cluster software. It would lower the barrier for individuals or research groups without access to cluster computing systems.