adjtomo / seisflows

An automated workflow tool for full waveform inversion and adjoint tomography
http://seisflows.readthedocs.org
BSD 2-Clause "Simplified" License
172 stars 122 forks source link

Example 2 fails to run #205

Closed scott314159 closed 1 month ago

scott314159 commented 3 months ago

I am just starting out with seisflows so I decided to try to run the examples. examples 1 and 3 run to completion but example 2 does not. I put the error message below. It fails at the end of iteration 1

Any advice would be appreciated.

FINALIZING ITERATION

2024-03-28 11:10:49 [INFO] | writing model m_new to /home/scott/Seismic/FWI/Work/example2/output/MODEL_01 2024-03-28 11:10:50 [DEBU] | organizing log files in /home/scott/Seismic/FWI/Work/example2/logs 2024-03-28 11:10:53 [INFO] | running finalization for module 'system.Workstation' 2024-03-28 11:10:54 [INFO] | running finalization for module 'preprocess.Pyaflowa' Traceback (most recent call last): File "/homes/scott/miniconda3/envs/seisflows_3.11.8/bin/seisflows", line 8, in sys.exit(main()) ^^^^^^ File "/home/scott/seisflows/seisflows/seisflows.py", line 1428, in main sf() File "/home/scott/seisflows/seisflows/seisflows.py", line 450, in call getattr(self, self._args.command)(**vars(self._args)) File "/home/scott/seisflows/seisflows/seisflows.py", line 802, in submit system.submit(workdir=self._args.workdir, File "/home/scott/seisflows/seisflows/system/workstation.py", line 211, in submit workflow.run() File "/home/scott/seisflows/seisflows/workflow/inversion.py", line 198, in run super().run() # Runs task list ^^^^^^^^^^^^^ File "/home/scott/seisflows/seisflows/workflow/forward.py", line 280, in run func() File "/home/scott/seisflows/seisflows/workflow/inversion.py", line 677, in finalize_iteration module.finalize() File "/home/scott/seisflows/seisflows/preprocess/pyaflowa.py", line 684, in finalize unix.rm(self.path._logs) File "/home/scott/seisflows/seisflows/tools/unix.py", line 205, in rm shutil.rmtree(name) File "/homes/scott/miniconda3/envs/seisflows_3.11.8/lib/python3.11/shutil.py", line 763, in rmtree onerror(os.rmdir, path, sys.exc_info()) File "/homes/scott/miniconda3/envs/seisflows_3.11.8/lib/python3.11/shutil.py", line 761, in rmtree os.rmdir(path, dir_fd=dir_fd) OSError: [Errno 39] Directory not empty: '/home/scott/Seismic/FWI/Work/example2/scratch/preprocess/logs'

bch0w commented 3 months ago

Hi @scott314159, thanks for raising the issue, sorry you had troubles with the examples. The devel branch is in rapid active development and not super stable right now, tests are definitely not passing so that is expected.

Examples for the devel branch were fixed in #195 and I just ran the Example 2 on the latest devel branch succesfully. Can you confirm you are on the latest version (should be 3.0.2) and try the example agian?

scott314159 commented 3 months ago

@bch0w Thanks for the speedy reply. I missed the update (I was running 3.0.0) so I installed 3.0.2 and reran example 2. Unfortunately I still have an issue. When I reran the example I did get an error where the tasktime was too short (this happened in both versions) so I updated the parameter file to change tasktime from 1 ==> 5 and reran (seisflows submit).

I am running CentOS 7.9, python 3.11.8

2024-03-28 14:19:18 [INFO] | setting trial model as starting model (m_try -> m_new) 2024-03-28 14:19:18 [INFO] | misfit of accepted trial model is f=1.246E-01 2024-03-28 14:19:18 [INFO] | resetting line search step count to 0 2024-03-28 14:19:18 [DEBU] | checkpointing workflow to seisflows state file 2024-03-28 14:19:18 [INFO] | //////////////////////////////////////////////////////////////////////////////// FINALIZE ITERATION
//////////////////////////////////////////////////////////////////////////////// 2024-03-28 14:19:18 [INFO] | running finalization for module 'system.Workstation' 2024-03-28 14:19:19 [INFO] | running finalization for module 'solver.Specfem2D' 2024-03-28 14:19:19 [INFO] | running finalization for module 'preprocess.Pyaflowa' 2024-03-28 14:19:27 [DEBU] | checkpointing workflow to seisflows state file Traceback (most recent call last): File "/homes/scott/miniconda3/envs/seisflows/bin/seisflows", line 8, in sys.exit(main()) ^^^^^^ File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/seisflows.py", line 1430, in main sf() File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/seisflows.py", line 451, in call getattr(self, self._args.command)(**vars(self._args)) File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/seisflows.py", line 803, in submit system.submit(workdir=self._args.workdir, File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/system/workstation.py", line 211, in submit workflow.run() File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/workflow/inversion.py", line 197, in run super().run() # Runs task list ^^^^^^^^^^^^^ File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/workflow/forward.py", line 284, in run func() File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/workflow/inversion.py", line 625, in finalize_iteration super().finalize_iteration() File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/workflow/forward.py", line 636, in finalize_iteration module.finalize() File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/preprocess/pyaflowa.py", line 691, in finalize unix.rm(self.path._logs) File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/tools/unix.py", line 205, in rm shutil.rmtree(name) File "/homes/scott/miniconda3/envs/seisflows/lib/python3.11/shutil.py", line 763, in rmtree onerror(os.rmdir, path, sys.exc_info()) File "/homes/scott/miniconda3/envs/seisflows/lib/python3.11/shutil.py", line 761, in rmtree os.rmdir(path, dir_fd=dir_fd) OSError: [Errno 39] Directory not empty: '/home/scott/Seismic/FWI/Work/example2/scratch/preprocess/logs'

bch0w commented 3 months ago

Hey @scott314159, thanks for updating. Super interesting, looks like this might be coming from unexpected behavior of the built-in Python shutil.rmtree function which SeisFlows is wrapping with its unix utilities.

On my machine unix.rm is able to remove non-empty directories but it seems like that is not the case for your situation. I'm using Python 3.11.3 to test but also tried on 3.11.8.

Can you run the following commands in Python and see what happens? It'll just make a non-empty directory and then delete it.

from seisflows.tools import unix
unix.mkdir("test/test")
unix.rm("test")

Could you also let me know if the parameter export_log_files is set in your parameter file, and if it's True or False?

scott314159 commented 3 months ago

@bch0w

1) I tried the rm test and it seems to work fine. I confirmed that the directory is created and removed as I ran the python line by line. Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:53:32) [GCC 12.3.0] on linux Type "help", "copyright", "credits" or "license" for more information.

from seisflows.tools import unix unix.mkdir("test/test") unix.rm("test") unix.mkdir("test/test/test") unix.rm("test")

2) export_log_files is set to True

bch0w commented 3 months ago

Hey @scott314159, thanks for that. I did a bit more google searching and it seems like this is a Python+Linux issue, and a known bug with theshutil.rmtree command being used on .nfs filesystems, see e.g.,:

https://stackoverflow.com/questions/11228079/python-remove-directory-error-file-exists

My guess is there are some .nfs* files still in that preprocess/logs directory that prevent it from being deleted, leading to this error. Not sure if you have been running in clean working directories, but that might help. You could also try running example 1, just to see if the different preprocessing module in Example 2 is causing issues.

scott314159 commented 3 months ago

@bch0w thanks for sticking with me on this issue.

1) example 1 runs to completion. Also, example 3 runs to completion. I ran both are on the same file system where example 2 was having trouble. 2) Thanks for the nfs idea. I did not think of that one. I figured that If nfs file lock was causing the issue then if I run on a local disk I should be OK. So, I ran example 2 on /scratch and it did complete. So, I guess somewhere in the prreprocessing there is a file that isn't being closed?

You have been a big help. I'll be glad to run additional tests if it will help you.

Below are a few more details that may or may not be important: How I ran example 2: I ran example 2 a number of times and it always seems to fail at the same place. During the reruns I did a seisflows clean and an rm -rf. The run seems to fail at the same place each time which, I guess, supports the open file hypothesis.

I did look in output/preprocess/logs directory on the failed run and it is empty. I guess it is possible that the nfs lock was released after the workflow failed.

bch0w commented 3 months ago

No problem, @scott314159, happy to help! Glad that running on local helped, perhaps there is a bit of a race condition happening here where the file lock cannot be removed before the delete command is run.

Makes sense that it always happens on that finalization command as I think that is the only place we try to delete those log files. I think one way to get around this would be to add a try-except on the deletion with a few seconds in between each try to see if that gives the system enough time to release the lock.

If it's alright with you, I'll open a new issue referencing this one, and have you close this one? Please don't hesitate to raise new issues if you encounter more problems, though!

bch0w commented 1 month ago

Closing this issue and continuing discussion in #206