Open alitair opened 1 year ago
Thanks for flagging this - I've not got much experience with macs so not exactly sure what the best fix is here, but I'll try and resolve this asap.
Update after asking others for a recommendation: the best solution here seems to be to set devices explicitly and/or track down which device each object is on. I'm afraid this is probably quite hard to do without investigating the particular case on your own machine, e.g. checking directly which objects are the first to end up on mps:0. In this particular case, it seems like tokens were defined by feeding the reference text through gpt and then moving them to the device called device (i.e. the global variable), so I'd recommend checking the value of that variable first.
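One way to track down where things live is to print the device of every parameter, buffer, and loose tensor in one go. The sketch below is only an illustration: report_devices is a hypothetical helper (not part of the course code), and the nn.Linear model and tokens tensor are stand-ins for whatever objects you are debugging.

```python
from typing import Dict

import torch as t
from torch import nn


def report_devices(model: nn.Module, **tensors: t.Tensor) -> Dict[str, str]:
    """Hypothetical helper: map each parameter, buffer and named tensor to its device."""
    report = {f"param:{name}": str(p.device) for name, p in model.named_parameters()}
    report.update({f"buffer:{name}": str(b.device) for name, b in model.named_buffers()})
    report.update({f"tensor:{name}": str(x.device) for name, x in tensors.items()})
    return report


# Stand-in objects; replace with your own model and tensors.
model = nn.Linear(4, 2)
tokens = t.zeros(1, 4)

report = report_devices(model, tokens=tokens)
for name, dev in report.items():
    print(f"{name:20s} {dev}")
```

Any entry whose device differs from the rest is a likely source of the mismatch.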
On my Mac, I set device = t.device("mps") and commented out t.device("cuda" if t.cuda.is_available() else "cpu"). There are also availability checks such as torch.backends.mps.is_available(). Keep in mind that a tensor sometimes ends up on mps without the device being set explicitly in the notebook, probably because one of the libraries sets the device with similar code.
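Rather than commenting lines in and out per machine, the device choice can fall through CUDA, then MPS, then CPU. This is a minimal sketch, not what the course code does; pick_device is a hypothetical name, and the getattr guard assumes you might be on a PyTorch version older than 1.12, where torch.backends.mps does not exist.

```python
import torch as t


def pick_device() -> t.device:
    """Hypothetical helper: prefer CUDA, then Apple MPS, then fall back to CPU."""
    if t.cuda.is_available():
        return t.device("cuda")
    # torch.backends.mps was added in PyTorch 1.12; guard for older installs.
    mps = getattr(t.backends, "mps", None)
    if mps is not None and mps.is_available():
        return t.device("mps")
    return t.device("cpu")


device = pick_device()
print(device)
```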
Thanks - I've added these recommendations below the setup code in the first of the transformer exercises, and the first exercise in week 0 where the device variable is introduced. Let me know if there are any resources you'd recommend other than the ones I link to there.
I encountered this as well, since the saved tensors' devices were 'cuda', which is not available on Mac. Perhaps converting the tensors to CPU storage prior to t.save() might help, e.g.:
cpu_state_dict = {k: v.cpu() for k, v in state_dict.items()}
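A slightly fuller sketch of that idea, assuming a small nn.Linear as a stand-in for the real model and an in-memory buffer instead of a file on disk:

```python
import io

import torch as t
from torch import nn

# Stand-in model; in practice this would be the trained model being saved.
model = nn.Linear(4, 2)
state_dict = model.state_dict()

# Copy every tensor to CPU so the saved file does not reference 'cuda'.
cpu_state_dict = {k: v.cpu() for k, v in state_dict.items()}

# Save to an in-memory buffer (a filename works the same way).
buffer = io.BytesIO()
t.save(cpu_state_dict, buffer)

# Load it back; every tensor should now be on CPU on any machine.
buffer.seek(0)
loaded = t.load(buffer)
print({k: v.device for k, v in loaded.items()})
```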
Thanks for raising this! Are there particular exercises which you think it would be useful to add a note to (i.e. which exactly were the first exercises that caused an error for you)?
I experimented some more, and the PT files load individually by specifying map_location=t.device('cpu').
So, I recommend changing the Streamlit boilerplate code from:
state_dict = t.load(filename)
state_dict = model.center_writing_weights(t.load(filename))
to:
state_dict = t.load(filename, map_location=device)
state_dict = model.center_writing_weights(state_dict)
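To illustrate what map_location does, here is a self-contained sketch. The temporary file, the "w" key, and the CPU device are assumptions made for the example, and the model-specific center_writing_weights step is left out:

```python
import os
import tempfile

import torch as t

# Assumption for this sketch: load onto CPU. Use your own device variable here.
device = t.device("cpu")

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "weights.pt")  # hypothetical file name
    t.save({"w": t.ones(2, 3)}, filename)

    # map_location remaps every stored tensor onto the given device at load
    # time, so a file saved from 'cuda' can still be opened without CUDA.
    state_dict = t.load(filename, map_location=device)

print(state_dict["w"].device)
```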
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, mps:0 and cpu!
examples:
tokens = t.cat([tokens, next_token[None, None]], dim=-1)
in def load_gpt2_test(cls, gpt2_layer, input):
comparison = t.isclose(output, reference_output, atol=1e-4, rtol=1e-3)
in LayerNorm:
return residual * self.w + self.b
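A possible workaround for lines like these is to move the stray operand onto the device of the tensor it is combined with. The sketch below uses made-up stand-in tensors rather than the real exercise objects, so treat it as an illustration of the pattern only:

```python
import torch as t

# Pick whichever accelerator is present; falls back to CPU.
if t.cuda.is_available():
    device = t.device("cuda")
elif getattr(t.backends, "mps", None) is not None and t.backends.mps.is_available():
    device = t.device("mps")
else:
    device = t.device("cpu")

# Stand-ins: tokens lives on the chosen device, next_token is created on CPU.
tokens = t.zeros(1, 3, dtype=t.long, device=device)
next_token = t.tensor(7)

# Move next_token to tokens' device before concatenating.
tokens = t.cat([tokens, next_token.to(tokens.device)[None, None]], dim=-1)

# Same pattern for the comparison: align reference_output with output first.
output = t.ones(2, device=device)
reference_output = t.ones(2)
comparison = t.isclose(output, reference_output.to(output.device), atol=1e-4, rtol=1e-3)

print(tokens, comparison)
```

For the LayerNorm line, the parameters self.w and self.b follow the module, so calling .to(device) on the model itself should be enough to keep them on the same device as residual.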