Acquiring a shared file lock fails with an error while downloading model weight files from the Hub when the cache directory is on an NFS mount.
Traceback (most recent call last):
  File "/root/petals/src/petals/server/from_pretrained.py", line 139, in _load_state_dict_from_file
    with allow_cache_reads(cache_dir):
  File "/usr/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/root/petals/src/petals/utils/disk_cache.py", line 26, in _blocks_lock
    fcntl.flock(lock_fd.fileno(), mode)
OSError: [Errno 9] Bad file descriptor
Reproduce:
Open a file in write-only mode in an NFS-mounted directory, then request a shared lock:
import os
import fcntl
f = open("/nfs-mounted-dir/abc", "wb")  # write-only: the descriptor has no read access
fcntl.flock(f.fileno(), fcntl.LOCK_SH)  # fails on NFS with the error below
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 9] Bad file descriptor
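Suggested resolution:
Open the file in "wb+" mode so that the descriptor is also readable. On Linux, NFS clients emulate flock() with fcntl() byte-range locks, and a shared (read) lock requires a descriptor that is open for reading: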
import os
import fcntl
f = open("/nfs-mounted-dir/abc", "wb+")  # read/write: the descriptor is valid for LOCK_SH
fcntl.flock(f.fileno(), fcntl.LOCK_SH)  # succeeds
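
The same idea can be sketched as a small lock helper. This is only an illustration under stated assumptions, not the actual Petals code: the names `open_lock_file` and `hold_lock` are hypothetical, and the lock file is assumed to serve purely as a lock target. Instead of "wb+", the sketch uses `os.open` with `os.O_RDWR | os.O_CREAT`, which also gives a readable and writable descriptor but does not truncate an existing file, so both `LOCK_SH` and `LOCK_EX` should be accepted on local disks and on NFS.

```python
import fcntl
import os
from contextlib import contextmanager


def open_lock_file(path):
    """Open (or create) a lock file with both read and write access.

    Hypothetical helper, not part of Petals. O_RDWR keeps the descriptor
    valid for shared and exclusive locks when flock() is emulated with
    fcntl() byte-range locks, as Linux NFS clients do.
    """
    return os.open(path, os.O_RDWR | os.O_CREAT, 0o644)


@contextmanager
def hold_lock(path, mode=fcntl.LOCK_SH):
    """Hold an flock() lock on `path` for the duration of the with-block."""
    fd = open_lock_file(path)
    try:
        fcntl.flock(fd, mode)  # blocks until the lock is granted
        try:
            yield fd
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)  # release the lock
    finally:
        os.close(fd)  # always close the descriptor, even if locking failed
```

It could be used as `with hold_lock("/nfs-mounted-dir/abc"): ...` for a shared lock, or with `fcntl.LOCK_EX` as the second argument for an exclusive one.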