adjtomo / seisflows

An automated workflow tool for full waveform inversion and adjoint tomography
http://seisflows.readthedocs.org
BSD 2-Clause "Simplified" License

Need some help with using seisflows in Cluster #193

Closed zhangxiaoshuotttt closed 3 months ago

zhangxiaoshuotttt commented 4 months ago

Hello @bch0w ,

I am encountering issues while using Seisflows on the CentOS cluster, which is managed by Slurm.

Initially, I tried setting the system module to Slurm by running seisflows par system slurm, but seisflows check then failed:

(seisflows) [cugb@comput18 seisflows_workdir]$ seisflows check
which: invalid option -- 'u'
Traceback (most recent call last):
  File "/public/home/.conda/envs/seisflows/bin/seisflows", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/public/home/.conda/envs/seisflows/lib/python3.11/site-packages/seisflows/seisflows.py", line 1395, in main
    sf()
  File "/public/home/.conda/envs/seisflows/lib/python3.11/site-packages/seisflows/seisflows.py", line 447, in __call__
    getattr(self, self._args.command)(**vars(self._args))
  File "/public/home/.conda/envs/seisflows/lib/python3.11/site-packages/seisflows/seisflows.py", line 670, in check
    workflow.check()
  File "/public/home/.conda/envs/seisflows/lib/python3.11/site-packages/seisflows/workflow/forward.py", line 164, in check
    self._modules[req_mod].check()
  File "/public/home/.conda/envs/seisflows/lib/python3.11/site-packages/seisflows/system/slurm.py", line 94, in check
    assert(self.node_size is not None), (
           ^^^^^^^^^^^^^^
  File "/public/home/.conda/envs/seisflows/lib/python3.11/site-packages/seisflows/system/slurm.py", line 121, in node_size
    return self._partitions[self.partition]
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: None

I tried to identify the root cause by reading the code, and it may be because I didn't set up _partitions. I attempted to set partition and node_size, but was unsuccessful.
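The last frame of the traceback, return self._partitions[self.partition], is a plain dictionary lookup, so the error can be reproduced without SeisFlows at all. The sketch below assumes that partition was left at None and that the partition table has no entry for it; lookup_node_size is a hypothetical helper written for illustration, not part of the package.

```python
def lookup_node_size(partitions, partition):
    """Mimic the lookup on the failing traceback line (hypothetical helper)."""
    return partitions[partition]


try:
    # Assumption: no partitions are defined and no partition was chosen
    lookup_node_size({}, None)
except KeyError as err:
    message = f"KeyError: {err}"

print(message)
```

Once the table has an entry for the chosen partition, the same lookup returns the node size instead of raising.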

I therefore changed the system to cluster, which passes seisflows check. However, seisflows submit now fails with a different error:

(seisflows) [@admin1 seisflows_workdir]$ seisflows check
2024-02-24 15:20:38 (W) | optional module 'preprocess' has not been instantiated, some functionality of the 'Forward' workflow may be skipped
(seisflows) [@admin1 seisflows_workdir]$ seisflows submit
/bin/sh: /public/home/.conda/envs/seisflows/lib/python3.11/site-packages/seisflows/system/runscripts/submit: No such file or directory

I have provided all the information that might be relevant to help you understand the situation better.

Thank you for your time and assistance.

All my settings:

seisflows par workflow forward
seisflows par system scluster
seisflows par solver specfem2d
seisflows par preprocess null
seisflows par optimize null
seisflows configure
seisflows check
seisflows submit

Best regards, Zhang

bch0w commented 4 months ago

Thanks for your question, @zhangxiaoshuotttt. To run on a specific cluster, you will need to write your own sub-class that inherits from Slurm and provides some additional information, such as the partition. See, for example, the Chinook sub-system (https://github.com/adjtomo/seisflows/blob/master/seisflows/system/chinook.py), which is a minimal working example of what you'll need to write.

You can copy, paste, and rename this class for your own purposes, then adjust the internal _partitions variable to match the partition names and cores per node on your system (https://github.com/adjtomo/seisflows/blob/master/seisflows/system/chinook.py#L50-L52). Similarly, run_call_header and submit_call_header will need to be modified to match how you issue SLURM directives on your system.
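As a rough sketch of the pattern described above, the snippet below uses a stand-in base class so that it runs without SeisFlows installed. SlurmStandIn only imitates the two lines visible in the traceback and is not the real seisflows.system.slurm.Slurm class. The class name MyCluster, the partition names, and the core counts are hypothetical. The header shown is a simplified guess built from standard sbatch flags; the real one in chinook.py carries more options.

```python
class SlurmStandIn:
    """Hypothetical stand-in for the SeisFlows Slurm class (NOT the real API)."""

    def __init__(self, partition=None, **kwargs):
        self.partition = partition
        self._partitions = {}  # assumption: the base class defines no partitions

    @property
    def node_size(self):
        # Same lookup as the failing line in the traceback
        return self._partitions[self.partition]

    def check(self):
        assert self.node_size is not None, "node_size must be set"


class MyCluster(SlurmStandIn):
    """Site-specific sub-class: supplies partition names and cores per node."""

    def __init__(self, partition="normal", **kwargs):
        super().__init__(partition=partition, **kwargs)
        # Hypothetical values; replace with your cluster's partitions
        self._partitions = {"normal": 32, "debug": 16}

    @property
    def submit_call_header(self):
        # Simplified; the real header includes more SLURM directives
        return " ".join([
            "sbatch",
            f"--partition={self.partition}",
            f"--ntasks-per-node={self.node_size}",
        ])


system = MyCluster(partition="debug")
system.check()  # no KeyError now that the partition is known
print(system.submit_call_header)
```

In a real sub-class you would import Slurm from seisflows.system.slurm instead of defining the stand-in, and take the actual constructor arguments and header properties from chinook.py.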

Let me know if that makes sense to you! Happy to help you workshop this and to later include your subclass in the main package.

zhangxiaoshuotttt commented 4 months ago

Thank you for your advice. I agree and will try it now. I'll share my experience if it works and hope it helps others.

zhangxiaoshuotttt commented 3 months ago

I apologize for the delayed response. After testing on our school's cluster, I have found that the method suggested by @bch0w works well. For beginners who wish to use Seisflows on a cluster managed by Slurm, all that is required is to create a sub-class that inherits from Slurm and provides additional information. @bch0w has explained this clearly. Thanks again for this advice.

bch0w commented 3 months ago

Great to hear, @zhangxiaoshuotttt! Thanks for following up on this; glad that worked for you. If you are interested in contributing your sub-class to the package, you are more than welcome to submit a Pull Request to the devel branch. I'll close this as complete for now, but if you run into any issues, please feel free to reopen it or start a new one.