adjtomo / seisflows

An automated workflow tool for full waveform inversion and adjoint tomography
http://seisflows.readthedocs.org
BSD 2-Clause "Simplified" License
172 stars 122 forks source link

potential race condition prevents 'unix.rm' from deleting directory #206

Open bch0w opened 3 months ago

bch0w commented 3 months ago

Raised in #205 by @scott314159, there seems to be a race condition during the finalization command of the Pyaflowa preprocessing module where the main job tries to delete the log file directory but returns OSError: [Errno 39] Directory not empty , which traces back from shutil.rmtree.

This seems to be a know issue with shutil.rmtree, see e.g., https://stackoverflow.com/questions/11228079/python-remove-directory-error-file-exists that happens on NFS file systems where file locks are present.

My thinking is that this is a race condition where the file locks cannot be removed before the rm command tries to delete. One potential try is to put a try-except clause in unix.rm that waits a few seconds if it hits this OSError, to give the filesystem some time to remove it's lock.

https://github.com/adjtomo/seisflows/blob/d7661ee32c8deef7574a89d31290a96453324777/seisflows/tools/unix.py#L197-L205

scott314159 commented 3 months ago

Thanks. Once I get to picking apart the scripts I will consider this.

As an aside, I am not sure why log files would be removed as part of the process? If the information in the log file is potentially useful it should not be deleted so the user can review the logs if needed. If the log files are not useful then why write them? I know that you are just calling Pyaflowa as part of the seisflows scripts so you don’t necessarily have control over the behavior.

From: Bryant Chow @.> Sent: Sunday, March 31, 2024 12:13 PM To: adjtomo/seisflows @.> Cc: scott314159 @.>; Mention @.> Subject: [adjtomo/seisflows] potential race condition prevents 'unix.rm' from deleting directory (Issue #206)

Raised in #205 https://github.com/adjtomo/seisflows/issues/205 by @scott314159 https://github.com/scott314159 , there seems to be a race condition during the finalization command of the Pyaflowa preprocessing module where the main job tries to delete the log file directory but returns OSError: [Errno 39] Directory not empty , which traces back from shutil.rmtree.

This seems to be a know issue with shutil.rmtree, see e.g., https://stackoverflow.com/questions/11228079/python-remove-directory-error-file-exists that happens on NFS file systems where file locks are present.

My thinking is that this is a race condition where the file locks cannot be removed before the rm command tries to delete. One potential try is to put a try-except clause in unix.rm that waits a few seconds if it hits this OSError, to give the filesystem some time to remove it's lock.

— Reply to this email directly, view it on GitHub https://github.com/adjtomo/seisflows/issues/206 , or unsubscribe https://github.com/notifications/unsubscribe-auth/BHNHRJTH4WA2L73OI5DDNH3Y3BG2NAVCNFSM6AAAAABFQTI6L6VHI2DSMVQWIX3LMV43ASLTON2WKOZSGIYTOMJRGY4TEMQ . You are receiving this because you were mentioned. https://github.com/notifications/beacon/BHNHRJTU2MOWFRRIBCHDUTDY3BG2NA5CNFSM6AAAAABFQTI6L6WGG33NNVSW45C7OR4XAZNFJFZXG5LFVJRW63LNMVXHIX3JMTHIIJUE7I.gif Message ID: @. @.> >

bch0w commented 1 month ago

Hey @scott314159, again, sorry for the long pause in communication here! Yeah the log message removal is intended to reduce file count, as for large-scale problems there may be (hundreds of) thousands of files created which may stress file systems, but I agree there is usually useful information there. The parameter export_log_files is meant to make it a User decision to save these files by copying them to a more permanent location, and is set True by default.