CoEDL / elpis

🙊 software for creating speech recognition models.
https://elpis.readthedocs.io/en/latest/
Apache License 2.0

Speed up resetting procedure #276

Closed: mattchrlw closed this pull request 2 years ago.

mattchrlw commented 2 years ago

When the Reset button is clicked, the app currently has to wait for the /state directory to be cleared before it can proceed. For large datasets with gigabytes of files to delete, this wait can be disruptive. This PR instead moves the files out of the state folder first, which is much quicker than deleting them, and then deletes them in the background.

I tried using rsync, but it led to race conditions (files were being deleted after the dataset and interface had been re-initialised). The moving approach is the safest.
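A minimal Python sketch of the move-then-delete idea follows; it is an illustration, not the code from this PR. The helper name `reset_state_dir`, its `trash_root` parameter and the `elpis-trash-` prefix are hypothetical, and `trash_root` is assumed to be on the same filesystem as the state directory so that each `shutil.move` reduces to a cheap rename rather than a copy. The state directory itself is kept (it may be a Docker mount point), and the background deletion, run here in a separate Python process, only ever touches the trash directory, so it cannot remove files written after the dataset and interface are re-initialised.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def reset_state_dir(state_dir: str, trash_root: str) -> subprocess.Popen:
    """Empty state_dir quickly, then delete its old contents in the background.

    Hypothetical helper. Assumes trash_root is on the same filesystem as
    state_dir, so the moves below are renames rather than copies.
    """
    state = Path(state_dir)
    # A fresh, uniquely named directory to receive the old contents.
    trash = Path(tempfile.mkdtemp(prefix="elpis-trash-", dir=trash_root))
    for entry in list(state.iterdir()):
        # Renaming a directory does not depend on how much data it holds.
        shutil.move(str(entry), str(trash / entry.name))
    # state_dir still exists (possibly a mount point) and is now empty,
    # so the app can re-initialise immediately.
    # The slow part runs in a separate Python process; it only sees `trash`.
    return subprocess.Popen(
        [
            sys.executable,
            "-c",
            "import shutil, sys; shutil.rmtree(sys.argv[1], ignore_errors=True)",
            str(trash),
        ]
    )
```

The caller does not need to block on the returned `Popen`; keeping a reference to it is only useful for waiting on or logging the cleanup, for example in tests.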

Benchmark on ~848MB of files from the gk dataset:

benfoley commented 2 years ago

Could use shutil.rmtree on each of the four dirs. The "Tempted to use…" comment was about doing that to the /state dir itself.
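This suggestion can be sketched as a hypothetical helper `clear_dir_contents(path)` that runs `shutil.rmtree` on each child directory and unlinks everything else, so the top-level directory (the possible mount point) is never removed. The thread doesn't name the four directories, so the sketch simply iterates over whatever `path` contains. Note that this still deletes synchronously, so on its own it does not make the reset faster.

```python
import os
import shutil


def clear_dir_contents(path: str) -> None:
    """Delete everything inside `path` but keep `path` itself.

    Removing the directory itself can fail with "Device or resource busy"
    when it is a mount point (e.g. /state mounted into a Docker container),
    so only its children are removed.
    """
    # Snapshot the entries first so we never delete while iterating.
    with os.scandir(path) as it:
        entries = list(it)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            # A real subdirectory: remove the whole tree.
            shutil.rmtree(entry.path)
        else:
            # Plain files and symlinks (including symlinks to directories).
            os.unlink(entry.path)
```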

On 9 Dec 2021, at 16:00, Matthew Low @.***> wrote:

 @mattchrlw commented on this pull request.

In elpis/engines/common/objects/interface.py:

@@ -67,9 +68,18 @@ class InvalidInterfaceError(Exception):

We need to keep the dir and delete the contents...
for root, subdirectories, files in os.walk(path):

There are only 4 directories to delete, so the spun-up subprocesses shouldn't be too slow. With regard to

Why is the os.walk necessary? Can't you move and delete entire directories with shutil?

see the above comment:

Tempted to use shutil.rmtree? It breaks if we have mounted /state from the local filesystem into the Docker container. The error is "Device or resource busy: '/state'". We need to keep the dir and delete the contents...


mattchrlw commented 2 years ago

Could use shutil.rmtree on each of the four dirs.

Isn't this the existing approach that we are replacing? I think Nick was referring to using shutil.rmtree on the whole directory, although I may be wrong.