Lightning-AI / pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Wrong PositionalEncoding in the Transformer example #19138

Open Galaxy-Husky opened 10 months ago

Galaxy-Husky commented 10 months ago

Bug description

Hi,

I think there are several mistakes in the implementation of `PositionalEncoding` in `lightning/pytorch/demos/transformer.py`.

Since the transformer is set to `batch_first=True` here: https://github.com/Lightning-AI/pytorch-lightning/blob/275822d5aff848b7f0a4298498e905457b8455c9/src/lightning/pytorch/demos/transformer.py#L47

  1. The shape of `pe` should be `[batch_size, seq_len, dim]`, so this transpose operation is redundant: https://github.com/Lightning-AI/pytorch-lightning/blob/275822d5aff848b7f0a4298498e905457b8455c9/src/lightning/pytorch/demos/transformer.py#L97
  2. The slicing should be along the sequence dimension, i.e. `self.pe[:, :x.size(1)]`: https://github.com/Lightning-AI/pytorch-lightning/blob/275822d5aff848b7f0a4298498e905457b8455c9/src/lightning/pytorch/demos/transformer.py#L88
  3. The result of the addition above is never assigned back to `x`, so the positional encoding has no effect. (A corrected sketch follows after this list.)
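
For reference, here is a minimal sketch of a batch-first `PositionalEncoding` that folds in all three fixes. It follows the overall structure of the demo module, but the constructor signature (`dim`, `dropout`, `max_len`) is assumed for illustration rather than copied from `transformer.py`:

```python
import math

import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding for batch-first inputs of shape [batch_size, seq_len, dim]."""

    def __init__(self, dim: int, dropout: float = 0.1, max_len: int = 5000) -> None:
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)
        position = torch.arange(max_len).unsqueeze(1)  # [max_len, 1]
        div_term = torch.exp(torch.arange(0, dim, 2) * (-math.log(10000.0) / dim))
        pe = torch.zeros(1, max_len, dim)  # batch-first layout, so no transpose is needed (point 1)
        pe[0, :, 0::2] = torch.sin(position * div_term)
        pe[0, :, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Slice along the sequence dimension (point 2) and assign the sum
        # back to x so the encoding actually takes effect (point 3).
        x = x + self.pe[:, : x.size(1)]
        return self.dropout(x)
```

A quick shape check: `PositionalEncoding(dim=512)(torch.zeros(8, 100, 512))` returns a tensor of shape `(8, 100, 512)`. Registering `pe` as a buffer keeps it out of the optimizer's parameters while still moving it to the right device with the module.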

Would you please let me know if I'm wrong?

What version are you seeing the problem on?

master

How to reproduce the bug

No response

Error messages and logs

No response

Environment

Current environment

```
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning (`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):
```

More info

No response

cc @borda

NicholasKiefer commented 10 months ago

You are not on the master branch.

Galaxy-Husky commented 10 months ago

> You are not on the master branch.

Thank you for your reply. I checked the code permalinks and found that they are automatically changed from `master` to commit `275822` after I copy and paste them into the box. In fact, they point to exactly the same code.

awaelchli commented 10 months ago

@Galaxy-Husky Do you want to make the adjustments in a PR?

awaelchli commented 10 months ago

I think it would be good to go over the example one more time and fix the correctness issues. The initial version was simply ported from the PyTorch examples repo; the goal back then was not to make it train well, but to serve as a dummy example for quick testing.

Galaxy-Husky commented 10 months ago

> @Galaxy-Husky Do you want to make the adjustments in a PR?

Sure, I'd love to.

Galaxy-Husky commented 10 months ago

> I think it would be good to go over the example one more time and fix the correctness issues. The initial version was simply ported from the PyTorch examples repo; the goal back then was not to make it train well, but to serve as a dummy example for quick testing.

Thank you for the explanation, I got it!

manu-chauhan commented 2 months ago

Is help still needed for this? I see the issue is still open. @awaelchli