When running on mac, not all tensors reside on same device (example below)

callummcdougall / ARENA_2.0

Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.


When running on mac, not all tensors reside on same device (example below) #21

Open alitair opened 1 year ago

alitair commented 1 year ago

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, mps:0 and cpu!

Examples:

tokens = t.cat([tokens, next_token[None, None]], dim=-1)

In def load_gpt2_test(cls, gpt2_layer, input):
comparison = t.isclose(output, reference_output, atol=1e-4, rtol=1e-3)

In LayerNorm:
return residual * self.w + self.b

callummcdougall commented 1 year ago

Thanks for flagging this. I don't have much experience with Macs, so I'm not exactly sure what the best fix is here, but I'll try to resolve this asap.

callummcdougall commented 1 year ago

Update after asking others for a recommendation: the best solution here seems to be to set devices explicitly and/or track down which device each tensor is on. I'm afraid that's probably quite hard to do without you investigating this particular case on your own machine, e.g. checking directly which objects end up on mps:0 first. In this particular case, it seems tokens was defined by feeding the reference text through gpt and then moving the result to device (i.e. the global variable), so I'd recommend checking the value of that first.
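A minimal sketch of that kind of check, assuming a standard PyTorch install. devices_of is a hypothetical helper name (not part of ARENA or PyTorch), and nn.LayerNorm stands in for the course's own LayerNorm module:

```python
import torch as t
from torch import nn

def devices_of(module: nn.Module) -> dict:
    """Hypothetical helper: map each parameter/buffer name to its device."""
    named = list(module.named_parameters()) + list(module.named_buffers())
    return {name: str(tensor.device) for name, tensor in named}

# nn.LayerNorm is a stand-in for the course's own LayerNorm module
layer = nn.LayerNorm(4)
print(devices_of(layer))  # one entry each for "weight" and "bias"

# Align an input with the module before calling it
residual = t.randn(2, 4)
residual = residual.to(next(layer.parameters()).device)
out = layer(residual)

# Same idea for the t.cat example: move the new token to tokens' device first
tokens = t.zeros(1, 3, dtype=t.long)
next_token = t.tensor(5)
tokens = t.cat([tokens, next_token.to(tokens.device)[None, None]], dim=-1)
```

Printing .device on both operands just before the line that raises is usually the quickest way to see which side is still on cpu.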

alitair commented 1 year ago

On my Mac, I set device = t.device("mps") and commented out t.device("cuda" if t.cuda.is_available() else "cpu"). There are also flags such as torch.backends.mps.is_available(). Keep in mind that a tensor sometimes ends up on mps without being placed there explicitly in the notebook, probably because similar code in one of the libraries is setting the device.
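Rather than commenting lines in and out, the two checks can be combined. This is only a sketch: pick_device is a hypothetical name, and the CUDA-then-MPS-then-CPU ordering is an assumption about what most setups want.

```python
import torch as t

def pick_device() -> t.device:
    """Hypothetical helper: prefer CUDA, then Apple's MPS backend, then CPU."""
    if t.cuda.is_available():
        return t.device("cuda")
    # torch.backends.mps only exists in newer PyTorch releases, so guard the lookup
    mps = getattr(t.backends, "mps", None)
    if mps is not None and mps.is_available():
        return t.device("mps")
    return t.device("cpu")

device = pick_device()

# Create tensors directly on the chosen device instead of moving them later
x = t.zeros(3, device=device)
print(device, x.device)
```

Passing this one device value to every tensor and model created in the notebook keeps them together, though it won't catch tensors that a library places on a device by itself.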

callummcdougall commented 1 year ago

Thanks - I've added these recommendations below the setup code in the first of the transformer exercises, and the first exercise in week 0 where the device variable is introduced. Let me know if there are any resources you'd recommend other than the ones I link to there.

danwilhelm commented 10 months ago

I encountered this as well, since the saved tensors' devices were 'cuda', which is not available on Mac. Converting the tensors to CPU storage prior to t.save() might help, e.g.:

cpu_state_dict = {k: v.cpu() for k, v in state_dict.items()}

callummcdougall commented 10 months ago

Thanks for raising this! Are there particular exercises you think it would be useful to add a note to (i.e. which exercises first caused an error for you)?

danwilhelm commented 10 months ago

I experimented some more, and the PT files each load fine when I specify map_location=t.device('cpu').

So, I recommend changing the Streamlit boilerplate code from:

state_dict = t.load(filename)
state_dict = model.center_writing_weights(t.load(filename))

to:

state_dict = t.load(filename, map_location=device)
state_dict = model.center_writing_weights(state_dict)
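A small round trip illustrating the same idea, as a sketch only. It uses an in-memory buffer as a stand-in for the checkpoint file, the state dict contents are made up, and center_writing_weights is left out because it belongs to the course's model:

```python
import io
import torch as t

device = t.device("cpu")  # assumption: swap in "mps" or "cuda" as appropriate

# Stand-in for a checkpoint file: save a small, made-up state dict to a buffer
state_dict = {"w": t.ones(2, 3), "b": t.zeros(3)}
buffer = io.BytesIO()
t.save({k: v.cpu() for k, v in state_dict.items()}, buffer)
buffer.seek(0)

# map_location remaps every stored tensor onto `device` while loading,
# so a checkpoint saved from a CUDA machine can still be opened without CUDA
loaded = t.load(buffer, map_location=device)
print({k: v.device for k, v in loaded.items()})
```

Saving from CPU and loading with map_location are complementary: either one alone should avoid the failure described above, and doing both makes the checkpoint portable in either direction.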