d2l-ai / d2l-en

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

nn.DataParallel has an issue on Mac (MPS device) #2601

Open rnb007 opened 1 month ago

rnb007 commented 1 month ago

This is the error I get when I use the function below:

```python
import torch

def try_all_gpus():  #@save
    """Return all available GPUs, or [cpu(),] if no GPU exists."""
    devices = [torch.device(f'cuda:{i}')
               for i in range(torch.cuda.device_count())]
    return devices if devices else [torch.device('cpu')]
```

```
        trainer = torch.optim.Adam(net.parameters(), lr=lr)
      3 loss = nn.CrossEntropyLoss(reduction="none")
----> 4 d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs)

File ~/anaconda3/envs/dl_env/lib/python3.10/site-packages/d2l/torch.py:1507, in train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, devices)
   1504 timer, num_batches = d2l.Timer(), len(train_iter)
   1505 animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0, 1],
   1506                         legend=['train loss', 'train acc', 'test acc'])
-> 1507 net = nn.DataParallel(net, device_ids=devices).to(devices[0])
   1508 for epoch in range(num_epochs):
   1509     # Sum of training loss, sum of training accuracy, no. of examples,
   1510     # no. of predictions
   1511     metric = d2l.Accumulator(4)

IndexError: list index out of range
```

This happens because Macs do not support CUDA.
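The IndexError itself is just an empty list being indexed by `devices[0]`. Assuming, as the traceback suggests, that `train_ch13` takes its `devices` default from d2l's own GPU helper when the d2l module is loaded, the custom `try_all_gpus` above is never consulted unless it is passed in explicitly. This pure-Python sketch uses hypothetical stand-ins (`train`, `try_all_gpus`) to show why: a default argument is evaluated once, when the function is defined.

```python
def try_all_gpus():
    # Hypothetical stand-in for a GPU query on a machine with no CUDA devices
    return []

def train(devices=try_all_gpus()):
    # Mirrors the `.to(devices[0])` step from the traceback
    return devices[0]

def try_all_gpus():
    # Redefining the helper later does NOT change train's default:
    # the default list was built once, when `train` was defined.
    return ['cpu']

try:
    train()
    message = None
except IndexError as err:
    message = str(err)

print(message)                        # list index out of range
print(train(devices=try_all_gpus()))  # cpu
```

Only the explicit `devices=...` argument picks up the redefined helper.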

If I tweak the function above by just changing cpu to mps, the kernel always dies:

```python
def try_all_gpus():  #@save
    """Return all available GPUs, or [cpu(),] if no GPU exists."""
    devices = [torch.device(f'cuda:{i}')
               for i in range(torch.cuda.device_count())]
    return devices if devices else [torch.device('mps')]
```
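A hedged sketch of a more defensive device picker: it checks `torch.backends.mps.is_available()` (present in PyTorch 1.12 and later) instead of assuming MPS exists, and falls back to CPU otherwise. `pick_devices` is a hypothetical name, not a d2l function, and on its own it does not explain or fix the kernel crash.

```python
import torch

def pick_devices():
    """Hypothetical helper (not part of d2l): CUDA GPUs, else MPS, else CPU."""
    n = torch.cuda.device_count()
    if n > 0:
        return [torch.device(f'cuda:{i}') for i in range(n)]
    # Only use MPS if this PyTorch build and machine actually support it
    if torch.backends.mps.is_available():
        return [torch.device('mps')]
    return [torch.device('cpu')]

devices = pick_devices()
print(devices)
```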

How can I run nn.DataParallel, or for that matter the d2l.train_ch13 function from Section 16.2 (Chapter 16), on a Mac?
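One possible workaround, sketched under the assumption that only multi-GPU CUDA setups need `nn.DataParallel`: wrap the model only when there are two or more CUDA devices, and otherwise move it to the single device. `place_model` is a hypothetical helper that could stand in for the `nn.DataParallel(net, device_ids=devices).to(devices[0])` line in a local copy of `train_ch13`.

```python
import torch
from torch import nn

def place_model(net, devices):
    """Hypothetical helper: use nn.DataParallel only for 2+ CUDA devices."""
    if not devices:
        raise ValueError('devices is empty; pass e.g. [torch.device("cpu")]')
    if len(devices) > 1 and all(d.type == 'cuda' for d in devices):
        # Multi-GPU case: replicate across CUDA devices, as train_ch13 does
        return nn.DataParallel(net, device_ids=devices).to(devices[0])
    # Single device (CPU, MPS, or one GPU): just move the model there
    return net.to(devices[0])

net = place_model(nn.Linear(4, 2), [torch.device('cpu')])
print(type(net).__name__, next(net.parameters()).device)  # Linear cpu
```

Since `train_ch13` also takes a `devices` parameter (it appears in the traceback's signature), calling `d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, devices=[torch.device('cpu')])` should at least avoid the empty-list IndexError without editing the library.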