d2l-ai / d2l-en

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
https://D2L.ai

nn.DataParallel has an issue on Mac (MPS device) #2601

Open rnb007 opened 1 month ago

rnb007 commented 1 month ago

This is the error I get when I use the function below:

```python
def try_all_gpus():  #@save
    """Return all available GPUs, or [cpu(),] if no GPU exists."""
    devices = [torch.device(f'cuda:{i}')
               for i in range(torch.cuda.device_count())]
    return devices if devices else [torch.device('cpu')]
```

```python
trainer = torch.optim.Adam(net.parameters(), lr=lr)
loss = nn.CrossEntropyLoss(reduction="none")
d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs)
```

```
File ~/anaconda3/envs/dl_env/lib/python3.10/site-packages/d2l/torch.py:1507, in train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, devices)
   1504 timer, num_batches = d2l.Timer(), len(train_iter)
   1505 animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0, 1],
   1506                         legend=['train loss', 'train acc', 'test acc'])
-> 1507 net = nn.DataParallel(net, device_ids=devices).to(devices[0])
   1508 for epoch in range(num_epochs):
   1509     # Sum of training loss, sum of training accuracy, no. of examples,
   1510     # no. of predictions
   1511     metric = d2l.Accumulator(4)

IndexError: list index out of range
```

This happens because Macs do not have CUDA support.
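For context, a quick check (a minimal sketch; the `torch.backends.mps` query assumes PyTorch >= 1.12) shows why the CUDA device list comes back empty on a Mac:

```python
import torch

# macOS has no CUDA backend, so no 'cuda:i' devices are enumerated and the
# list comprehension in try_all_gpus finds nothing to return.
print(torch.cuda.is_available())          # False on a Mac
print(torch.cuda.device_count())          # 0 -> the 'cuda:i' list is empty
# MPS is reported separately (assumes PyTorch >= 1.12 on Apple silicon):
print(torch.backends.mps.is_available())
```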

Tweaking the function above by just changing `cpu` to `mps` makes the kernel die every time:

```python
def try_all_gpus():  #@save
    """Return all available GPUs, or [cpu(),] if no GPU exists."""
    devices = [torch.device(f'cuda:{i}')
               for i in range(torch.cuda.device_count())]
    return devices if devices else [torch.device('mps')]
```
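For reference, a guarded fallback (my own variant, not from the book) that only returns the MPS device when PyTorch actually reports it as available. It avoids handing out a device that does not exist, but it still does not get past the `nn.DataParallel(net, device_ids=devices)` line, which as far as I can tell only accepts CUDA devices:

```python
import torch

def try_all_gpus_or_mps():  # hypothetical helper, adapted from try_all_gpus
    """Return all CUDA devices, else [mps] if available, else [cpu]."""
    devices = [torch.device(f'cuda:{i}')
               for i in range(torch.cuda.device_count())]
    if devices:
        return devices
    # torch.backends.mps.is_available() exists in PyTorch >= 1.12
    if torch.backends.mps.is_available():
        return [torch.device('mps')]
    return [torch.device('cpu')]
```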

How can I run nn.DataParallel, or for that matter the d2l.train_ch13 call used in Section 16.2 of Chapter 16, on a Mac?
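What I am effectively after is something like the single-device sketch below (my own untested adaptation of `train_ch13`, assuming the batch features are a single tensor as in Section 16.2): drop `nn.DataParallel` entirely and move the model and each batch to one `mps` device.

```python
import torch
from torch import nn
from d2l import torch as d2l

def train_ch13_single_device(net, train_iter, test_iter, loss, trainer,
                             num_epochs, device=torch.device('mps')):
    """Single-device variant of d2l.train_ch13 that skips nn.DataParallel."""
    net = net.to(device)
    for epoch in range(num_epochs):
        net.train()
        # Sum of training loss, sum of training accuracy, no. of examples,
        # no. of predictions
        metric = d2l.Accumulator(4)
        for features, labels in train_iter:
            features, labels = features.to(device), labels.to(device)
            trainer.zero_grad()
            pred = net(features)
            l = loss(pred, labels)      # reduction="none" -> per-example loss
            l.sum().backward()
            trainer.step()
            metric.add(l.sum(), d2l.accuracy(pred, labels),
                       labels.shape[0], labels.numel())
        test_acc = d2l.evaluate_accuracy_gpu(net, test_iter, device)
        print(f'epoch {epoch + 1}: loss {metric[0] / metric[2]:.3f}, '
              f'train acc {metric[1] / metric[3]:.3f}, test acc {test_acc:.3f}')
```

Calling it would just mean replacing the `d2l.train_ch13(...)` line above with `train_ch13_single_device(net, train_iter, test_iter, loss, trainer, num_epochs)`, but I do not know whether all the ops used in Section 16.2 (e.g. the RNN layers) are fully supported on MPS.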