LibraryOfCongress / bagit-python

Work with BagIt packages from Python.
http://libraryofcongress.github.io/bagit-python

make_bag is not thread safe #112

Open jcushman opened 6 years ago

jcushman commented 6 years ago

Creating multiple bags in threads doesn't work:

import bagit
from multiprocessing.pool import ThreadPool
ThreadPool().map(bagit.make_bag, ('dirA', 'dirB'))

This fails with a FileNotFoundError because make_bag uses os.chdir, which is not thread-safe: the current working directory is shared by every thread in the process, so the two threads change directories on each other while bagging.
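To illustrate why os.chdir interferes across threads, here is a minimal sketch that does not use bagit at all. The helper name cwd_after_thread_chdir is hypothetical; it only shows that a directory change made in a worker thread is visible to the thread that started it.

```python
import os
import tempfile
import threading


def cwd_after_thread_chdir():
    """Return (target, cwd seen by the calling thread) after a worker
    thread has called os.chdir(target)."""
    original = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        # realpath avoids false mismatches when the temp dir is a symlink.
        target = os.path.realpath(tmp)

        # Only the worker thread calls os.chdir ...
        worker = threading.Thread(target=os.chdir, args=(target,))
        worker.start()
        worker.join()

        # ... but the calling thread's working directory has moved too.
        seen = os.path.realpath(os.getcwd())

        # Restore the original directory before the temp dir is deleted.
        os.chdir(original)
    return target, seen


if __name__ == "__main__":
    target, seen = cwd_after_thread_chdir()
    print("worker changed to:", target)
    print("main thread sees: ", seen)
```

Any code that changes directory and then opens relative paths can therefore be redirected by another thread between the two steps.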

I see there's already a note in the code to stop using chdir: # FIXME: if we calculate full paths we won't need to deal with changing directories. I just wanted to add in particular that the current code prevents multithreading.
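As a rough sketch of what that FIXME suggests, the function below computes checksums using full paths and never touches the working directory. This is not bagit's actual code; manifest_entries and its parameters are hypothetical names chosen for illustration.

```python
import hashlib
import os


def manifest_entries(bag_dir, algorithm="sha256"):
    """Yield (relative_path, hex_digest) for every file under bag_dir,
    without calling os.chdir."""
    bag_dir = os.path.abspath(bag_dir)
    for root, _dirs, files in os.walk(bag_dir):
        for name in sorted(files):
            # Open files by full path so the working directory is irrelevant.
            full_path = os.path.join(root, name)
            digest = hashlib.new(algorithm)
            with open(full_path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    digest.update(chunk)
            # Report paths relative to the bag root, with forward slashes.
            rel_path = os.path.relpath(full_path, bag_dir)
            yield rel_path.replace(os.sep, "/"), digest.hexdigest()
```

Because every path is derived from bag_dir, two threads could run this on different directories at the same time without affecting each other.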

(Using a process pool instead of a thread pool would work around this issue, but doesn't help in my particular case because my worker threads need to share memory.)
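Another possible stopgap that keeps everything in one process is to serialize the calls behind a lock. The sketch below is an assumption about how that could look, not something bagit provides; serialized and _chdir_lock are hypothetical names.

```python
import functools
import threading

# One lock shared by every wrapped function, since the working
# directory is a single process-wide resource.
_chdir_lock = threading.Lock()


def serialized(func):
    """Wrap func so that only one thread at a time can be inside it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with _chdir_lock:
            return func(*args, **kwargs)
    return wrapper


# Hypothetical usage with the example from this issue:
#   ThreadPool().map(serialized(bagit.make_bag), ('dirA', 'dirB'))
```

This gives up parallel bagging, and it only protects calls that go through the wrapper: any other thread that relies on relative paths while a bag is being made can still see the changed directory.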

acdha commented 6 years ago

Indeed – I was also looking at a more comprehensive fix in https://github.com/acdha/bagit-python/tree/flexible-fileio, which would also let us start supporting non-POSIX interfaces such as S3, but I haven't worked on that in a while.