asteroid-team / asteroid

The PyTorch-based audio source separation toolkit for researchers
https://asteroid-team.github.io/
MIT License

Error in asteroid.engine.System() for M1 #573

Closed EliasLum closed 2 years ago

EliasLum commented 2 years ago

Asteroid installation for M1

I am currently trying to use Asteroid on an Apple M1 MacBook Air (Big Sur 11.5.2). On ARM Python, Asteroid training fails with an error when calling asteroid.engine.System():

~/miniforge3/envs/[myenv]/lib/python3.8/site-packages/torch/distributed/_sharding_spec/api.py in DevicePlacementSpec()
     27     """
     28
---> 29     device: torch.distributed._remote_device
     30
     31     def __post_init__(self):

AttributeError: module 'torch.distributed' has no attribute '_remote_device'

The attribute torch.distributed._remote_device does not seem to exist in this installation.
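To narrow this down without going through Asteroid at all, a small probe like the one below may help. It is only a sketch: `probe_attribute` is a hypothetical helper name, not part of torch or Asteroid, and it simply reports whether a module imports and whether it exposes a given attribute. The extra `torch.distributed.is_available()` call is included on the assumption that a build without distributed support could explain the missing attribute, which the thread does not confirm.

```python
import importlib


def probe_attribute(module_name, attr):
    """Return 'present', 'missing', or 'import failed: ...' for module_name.attr.

    Hypothetical helper for illustration; it never raises.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # the import itself can fail on a broken build
        return f"import failed: {type(exc).__name__}: {exc}"
    return "present" if hasattr(module, attr) else "missing"


if __name__ == "__main__":
    # The attribute named in the traceback above.
    print("torch.distributed._remote_device:",
          probe_attribute("torch.distributed", "_remote_device"))

    # Assumption: a torch build without distributed support might be the cause.
    try:
        import torch.distributed as dist
        print("torch.distributed.is_available():", dist.is_available())
    except Exception as exc:
        print("could not query torch.distributed:", type(exc).__name__, exc)
```

If the probe prints "missing", the problem sits in the torch build itself rather than in Asteroid or pytorch-lightning.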

To Reproduce

I installed tensorflow as suggested by Apple through miniforge: https://github.com/apple/tensorflow_macos/issues/153

Then I installed torchaudio as suggested in this post: https://github.com/pytorch/audio/issues/1573#issuecomment-962412479

Then I installed pytorch and torchvision as suggested by the documentation: https://pytorch.org/get-started/locally/

Finally, I installed asteroid using pip.

Then, when I run the getting-started script locally, I get the above error when running:

from asteroid.engine import System

system = System(model, optimizer, loss, trainLoader, valLoader)

Expected behavior

No error message during the System call.

Environment

Package versions

asteroid-versions:

0.5.1
0.4.5
0.4.0

output of:

pip freeze | egrep -i 'pytorch|torch|asteroid'
asteroid==0.5.1
asteroid-filterbanks==0.4.0
pytorch-lightning==1.5.1
pytorch-ranger==0.1.1
torch @ file:///Users/runner/miniforge3/conda-bld/pytorch-recipe_1635217280611/work
torch-optimizer==0.1.0
torch-stoi==0.1.2
torchaudio==0.10.0
torchmetrics==0.6.0
torchvision==0.10.0
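Because the torch entry above shows a local build path instead of a version number, a Python equivalent of the pip freeze filter can be handy. The following is a rough sketch using only the standard library (`importlib.metadata`, available from Python 3.8); `matching_packages` is a hypothetical helper name, and unlike the egrep above it matches on the package name only, not on the whole freeze line.

```python
import platform
import re
from importlib import metadata

# Same alternatives as the egrep pattern above, case-insensitive.
PATTERN = re.compile(r"pytorch|torch|asteroid", re.IGNORECASE)


def matching_packages(pattern=PATTERN):
    """Return {package name: version} for installed packages whose name matches."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and pattern.search(name):
            found[name] = dist.version
    # Sort case-insensitively so the output is stable between runs.
    return dict(sorted(found.items(), key=lambda kv: kv[0].lower()))


if __name__ == "__main__":
    for name, version in matching_packages().items():
        print(f"{name}=={version}")
    # Architecture of the running interpreter.
    print("machine:", platform.machine())
```

This prints a version for torch even when pip freeze only shows the build path, which makes it easier to compare against the torchaudio and torchvision versions.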

python version:

/Users/[USERNAME]/miniforge3/envs/era-rnd/bin/python: Mach-O 64-bit executable arm64

mpariente commented 2 years ago

Is it the System call, or the .fit call?

Did you try pytorch-lightning without asteroid?

I don't have an M1 but more info can only help.

EliasLum commented 2 years ago

It is the System call, not the fit call. I can import pytorch-lightning, but I have not tried running a training with lightning without asteroid. It seems that this folder, which it expects, does not exist in my installation: /site-packages/torch/distributed/_remote_device

However, I have moved away from using my M1 machine for this, so I won't spend more time figuring it out and will just use another machine for now.