bdzyubak / tensorflow-sandbox

A repository for studying applications of Deep Learning across fields and demonstrating samples of my code and project management

Speed up removal of many small files #3

Closed bdzyubak closed 2 years ago

bdzyubak commented 2 years ago

The word embeddings tutorial downloads many small text files, with the 'unsup' directory alone containing 45,000 files that the tutorial suggests removing. Using shutil.rmtree, this removal takes 20 minutes. Any project working with 1-D data (many small text files) will run into this problem of slow file management. A more efficient way to remove many small files is needed.

bdzyubak commented 2 years ago

The unsup folder of 45,000 files is removed quickly when deleted directly through the operating system's shell. Likely, an os.system call can be used, but it needs to be OS-agnostic. Apparently, shutil.rmtree is somewhat inefficiently coded: https://stackoverflow.com/questions/5470939/why-is-shutil-rmtree-so-slow
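For reference, a minimal sketch of what an OS-agnostic shell-based deletion could look like (the function name `remove_tree_fast` and the exact shell commands are assumptions for illustration, not the code committed to this repository):

```python
import platform
import subprocess
from pathlib import Path


def remove_tree_fast(target_dir: str) -> None:
    """Delete a directory tree via the OS shell, which is typically much
    faster than shutil.rmtree for directories with many small files."""
    target = Path(target_dir).resolve()
    if not target.is_dir():
        return

    if platform.system() == "Windows":
        # rmdir /s /q removes the tree quietly, without per-file Python overhead.
        subprocess.run(["cmd", "/c", "rmdir", "/s", "/q", str(target)], check=True)
    else:
        # rm -rf on Linux/macOS.
        subprocess.run(["rm", "-rf", str(target)], check=True)


# Example usage on the IMDB dataset's unsupervised split:
# remove_tree_fast("aclImdb/train/unsup")
```

Dispatching on `platform.system()` keeps the call OS-agnostic while still delegating the actual deletion to the native shell command on each platform.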

bdzyubak commented 2 years ago

Updated the os_utils module to use OS-specific shell commands for faster deletion. 27261ca0a07ac76ff1ccc14a33d2c35fd930e63d