Closed laclouis5 closed 1 year ago
Yeah, file_descriptor is not valid on darwin (which your macOS is apparently based on) (https://github.com/pytorch/pytorch/blob/f89ae0a7f48ea8f941c6c9655a934eb2fcc5eccc/torch/multiprocessing/__init__.py#L42)
Using file_system is most likely fine (https://pytorch.org/docs/stable/multiprocessing.html#file-system-file-system). This will not change the results, it affects only the failure-robustness of the multiprocess dataloading.
Yes, so it would be great to automatically use "file_system" on macOS and Windows instead of hardcoding this configuration value to "file_descriptor", which is the current behavior.
This is the strategy chosen by PyTorch (here) and could improve the compatibility of this repo.
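For illustration, here is a minimal sketch of what such a fallback could look like. It is not the code from this repo or from the eventual fix: the helper names choose_sharing_strategy and apply_sharing_strategy are hypothetical, and the only PyTorch calls it relies on are torch.multiprocessing.get_all_sharing_strategies() and torch.multiprocessing.set_sharing_strategy(). The idea is to keep the configured value where the platform supports it and fall back to "file_system" otherwise.

```python
from typing import Iterable


def choose_sharing_strategy(requested: str, available: Iterable[str]) -> str:
    """Return `requested` if it is available, otherwise fall back to "file_system".

    Hypothetical helper for illustration; not part of the repo or of PyTorch.
    """
    available = set(available)
    if requested in available:
        return requested
    if "file_system" in available:
        # On macOS and Windows, PyTorch only offers "file_system".
        return "file_system"
    raise ValueError(f"No usable sharing strategy among {sorted(available)}")


def apply_sharing_strategy(requested: str) -> str:
    """Set the PyTorch sharing strategy, falling back if `requested` is unsupported."""
    # Imported lazily so choose_sharing_strategy can be used without torch installed.
    import torch.multiprocessing as mp

    chosen = choose_sharing_strategy(requested, mp.get_all_sharing_strategies())
    mp.set_sharing_strategy(chosen)
    return chosen
```

With something like this, calling apply_sharing_strategy("file_descriptor") would keep "file_descriptor" on Linux and quietly switch to "file_system" on macOS and Windows, so the default config value would not need to change per platform.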
Yeah, I'll put it on my list. Tbh, I am surprised that this is your only problem running the repo on macOS though? The entire thing is tested only on Linux.
Let me know how far you get, or what else comes up!
Fixed with release https://github.com/JonasGeiping/cramming/releases/tag/Torch2.1
The verification command fails on macOS Ventura on a MacBook Pro M1 Pro:
python pretrain.py name=test arch=bert-base train=bert-base data=sanity-check-2 dryrun=True impl.microbatch_size=2
The error:
Upon investigation, it looks like impl.sharing_strategy is "file_descriptor" (default value) but _all_sharing_strategies only includes "file_system" on macOS and Windows. Changing this value to file_system solves the issue, though I do not know the implications:
python pretrain.py name=test arch=bert-base train=bert-base data=sanity-check-2 dryrun=True impl.microbatch_size=2 impl.sharing_strategy=file_system