NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.

Pix2Struct RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn #305

Closed · giuseppesalvi closed this issue 1 year ago

giuseppesalvi commented 1 year ago

I tried to run your notebook for Pix2Struct fine-tuning on the CORD dataset on Google Colab and got the following error during the trainer.fit(pl_module) call.

Error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-24-3786a685d433> in <cell line: 1>()
----> 1 trainer.fit(pl_module)

29 frames
/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    198     # some Python versions print out the first line of a multi-line function
    199     # calls in the traceback and some print out the last line
--> 200     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    201         tensors, grad_tensors_, retain_graph, create_graph, inputs,
    202         allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

When I ran the same notebook a couple of weeks ago, everything worked fine. Did something change? Could it be related to PyTorch 2.0, which is now the default version on Colab?
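
For reference, a quick way to check this symptom outside the Trainer is to run one forward pass by hand and inspect the loss. This is only a sketch, assuming pl_module is the LightningModule built in the notebook and its train dataloader yields (encoding, answers) pairs with labels included in the encoding:

# Sketch: pl_module is the notebook's LightningModule; the collate function is
# assumed to return (encoding, answers) tuples with labels inside the encoding.
batch = next(iter(pl_module.train_dataloader()))
encoding, _ = batch

pl_module.model.train()
outputs = pl_module.model(**encoding)
loss = outputs.loss

# If this prints False / None, the loss is detached from the autograd graph and
# loss.backward() raises exactly this RuntimeError.
print(loss.requires_grad, loss.grad_fn)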

Thanks.

khadkechetan commented 1 year ago

I'm facing the same issue. Is there any resolution?

khadkechetan commented 1 year ago

@giuseppesalvi

Try the following LightningModule:

import re

import numpy as np
import pytorch_lightning as pl
import torch
from nltk import edit_distance


class Pix2Struct(pl.LightningModule):
    def __init__(self, config, processor, model):
        super().__init__()
        self.config = config
        self.processor = processor
        self.model = model

    def training_step(self, batch, batch_idx):
        encoding, _ = batch

        outputs = self.model(**encoding)
        loss = outputs.loss
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx, dataset_idx=0):
        encoding, answers = batch
        flattened_patches, attention_mask = encoding["flattened_patches"], encoding["attention_mask"]
        batch_size = flattened_patches.shape[0]
        # we feed the prompt to the model
        decoder_input_ids = torch.full((batch_size, 1), self.model.config.text_config.decoder_start_token_id, device=self.device)

        outputs = self.model.generate(flattened_patches=flattened_patches,
                                      attention_mask=attention_mask,
                                      decoder_input_ids=decoder_input_ids,
                                      max_length=512,
                                      pad_token_id=self.processor.tokenizer.pad_token_id,
                                      eos_token_id=self.processor.tokenizer.eos_token_id,
                                      use_cache=True,
                                      num_beams=1,
                                      bad_words_ids=[[self.processor.tokenizer.unk_token_id]],
                                      return_dict_in_generate=True)

        predictions = []
        for seq in self.processor.tokenizer.batch_decode(outputs.sequences):
            seq = seq.replace(self.processor.tokenizer.eos_token, "").replace(self.processor.tokenizer.pad_token, "")
            # seq = re.sub(r"<.*?>", "", seq, count=1).strip()  # remove first task start token
            predictions.append(seq)

        scores = []
        for pred, answer in zip(predictions, answers):
            # answer = re.sub(r"<.*?>", "", answer, count=1)  # optionally remove the first task start token, as above
            answer = answer.replace(self.processor.tokenizer.eos_token, "")
            scores.append(edit_distance(pred, answer) / max(len(pred), len(answer)))

            if self.config.get("verbose", False) and len(scores) == 1:
                print(f"Prediction: {pred}")
                print(f"    Answer: {answer}")
                print(f" Normed ED: {scores[0]}")

        self.log("val_edit_distance", np.mean(scores))

        return scores

    def configure_optimizers(self):
        # you could also add a learning rate scheduler if you want
        optimizer = torch.optim.Adam(self.parameters(), lr=self.config.get("lr"))

        return optimizer

    def train_dataloader(self):
        # train_dataloader is the DataLoader defined earlier in the notebook
        return train_dataloader

    def val_dataloader(self):
        # val_dataloader is the DataLoader defined earlier in the notebook
        return val_dataloader
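
A minimal way to plug this in (a sketch: config, processor, model and the two dataloaders are the objects defined earlier in the notebook, and the Trainer arguments below are only examples):

# Sketch: config is the plain dict used in the notebook; the Trainer settings are illustrative.
pl_module = Pix2Struct(config, processor, model)

trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    max_epochs=config.get("max_epochs", 30),
    gradient_clip_val=config.get("gradient_clip_val", 1.0),
)
trainer.fit(pl_module)
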
giuseppesalvi commented 1 year ago

@khadkechetan

I tried your code, and everything worked fine. However, I started to have doubts about the scheduler and optimizer used in the original notebook.

To investigate further, I tried the original code with a different optimizer, and it worked without any issues. This suggests the problem is specifically related to the Adafactor optimizer.

I changed only this line in the original notebook:

#optimizer = Adafactor(self.parameters(), scale_parameter=False, relative_step=False, lr=self.config.get("lr"), weight_decay=1e-05)

optimizer = torch.optim.Adam(self.parameters(), lr=self.config.get("lr"))
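
In context, configure_optimizers in the LightningModule then just becomes (a sketch of the swap above; the learning rate still comes from the config dict):

def configure_optimizers(self):
    # Plain Adam instead of Adafactor; lr comes from the config dict as before.
    return torch.optim.Adam(self.parameters(), lr=self.config.get("lr"))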
NielsRogge commented 1 year ago

Thanks. Actually, you might be better off just using AdamW or Adam.

I'll update the notebook when I have the time.
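
In the meantime, a possible AdamW version of configure_optimizers looks like this (a sketch; the weight decay value simply mirrors the 1e-05 that was passed to Adafactor in the notebook):

def configure_optimizers(self):
    # AdamW applies decoupled weight decay; 1e-05 mirrors the old Adafactor setting.
    return torch.optim.AdamW(self.parameters(), lr=self.config.get("lr"), weight_decay=1e-05)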