Lightning-AI / litgpt

Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
https://lightning.ai
Apache License 2.0

ValueError: Cannot attend to 3063, block size is only 2048 #1387

Closed · Gooooooogo closed this issue 1 week ago

Gooooooogo commented 1 week ago
```
{'checkpoint_dir': PosixPath('checkpoints/TinyLlama/TinyLlama-1.1B-Chat-v1.0'),
 'data': JSON(json_path=PosixPath('/home/jwan3704/litgpt/data/math/algebra.json'), mask_prompt=False, val_split_fraction=0.0, prompt_style=<litgpt.prompts.Alpaca object at 0x7efbdd21d550>, ignore_index=-100, seed=42, num_workers=4),
 'devices': 1,
 'eval': EvalArgs(interval=100, max_new_tokens=100, max_iters=100, initial_validation=False),
 'logger_name': 'csv',
 'lora_alpha': 16,
 'lora_dropout': 0.05,
 'lora_head': False,
 'lora_key': False,
 'lora_mlp': False,
 'lora_projection': False,
 'lora_query': True,
 'lora_r': 8,
 'lora_value': True,
 'out_dir': PosixPath('out/model_1'),
 'precision': None,
 'quantize': None,
 'seed': 1337,
 'train': TrainArgs(save_interval=10000, log_interval=1, global_batch_size=16, micro_batch_size=1, lr_warmup_steps=100, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=None, tie_embeddings=None, learning_rate=0.0003, weight_decay=0.02, beta1=0.9, beta2=0.95, max_norm=None, min_lr=6e-05)}
Using bfloat16 Automatic Mixed Precision (AMP)
/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/torch/utils/data/dataset.py:449: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
  warnings.warn(f"Length of split at index {i} is 0. "
Seed set to 1337
Number of trainable parameters: 1,126,400
Number of non-trainable parameters: 1,100,048,384
Traceback (most recent call last):
  File "[/home/jwan3704/litgpt-venv/bin/litgpt](https://vscode-remote+ssh-002dremote-002b172-002e17-002e34-002e153.vscode-resource.vscode-cdn.net/home/jwan3704/litgpt-venv/bin/litgpt)", line 8, in <module>
    sys.exit(main())
  File "[/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/litgpt/__main__.py](https://vscode-remote+ssh-002dremote-002b172-002e17-002e34-002e153.vscode-resource.vscode-cdn.net/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/litgpt/__main__.py)", line 143, in main
    fn(**kwargs)
  File "[/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/litgpt/finetune/lora.py](https://vscode-remote+ssh-002dremote-002b172-002e17-002e34-002e153.vscode-resource.vscode-cdn.net/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/litgpt/finetune/lora.py)", line 144, in setup
    fabric.launch(main, devices, seed, config, data, checkpoint_dir, out_dir, train, eval)
  File "[/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/lightning/fabric/fabric.py](https://vscode-remote+ssh-002dremote-002b172-002e17-002e34-002e153.vscode-resource.vscode-cdn.net/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/lightning/fabric/fabric.py)", line 845, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
  File "[/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/lightning/fabric/fabric.py](https://vscode-remote+ssh-002dremote-002b172-002e17-002e34-002e153.vscode-resource.vscode-cdn.net/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/lightning/fabric/fabric.py)", line 931, in _wrap_and_launch
    return to_run(*args, **kwargs)
  File "[/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/lightning/fabric/fabric.py](https://vscode-remote+ssh-002dremote-002b172-002e17-002e34-002e153.vscode-resource.vscode-cdn.net/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/lightning/fabric/fabric.py)", line 936, in _wrap_with_setup
    return to_run(*args, **kwargs)
  File "[/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/litgpt/finetune/lora.py](https://vscode-remote+ssh-002dremote-002b172-002e17-002e34-002e153.vscode-resource.vscode-cdn.net/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/litgpt/finetune/lora.py)", line 197, in main
    fit(
  File "[/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/litgpt/finetune/lora.py](https://vscode-remote+ssh-002dremote-002b172-002e17-002e34-002e153.vscode-resource.vscode-cdn.net/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/litgpt/finetune/lora.py)", line 249, in fit
    model.max_seq_length = min(longest_seq_length, train.max_seq_length or float("inf"))
  File "[/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/lightning/fabric/wrappers.py](https://vscode-remote+ssh-002dremote-002b172-002e17-002e34-002e153.vscode-resource.vscode-cdn.net/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/lightning/fabric/wrappers.py)", line 272, in __setattr__
    setattr(original_module, name, value)
  File "[/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/torch/nn/modules/module.py](https://vscode-remote+ssh-002dremote-002b172-002e17-002e34-002e153.vscode-resource.vscode-cdn.net/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/torch/nn/modules/module.py)", line 1747, in __setattr__
    super().__setattr__(name, value)
  File "[/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/litgpt/model.py](https://vscode-remote+ssh-002dremote-002b172-002e17-002e34-002e153.vscode-resource.vscode-cdn.net/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/litgpt/model.py)", line 47, in max_seq_length
    raise ValueError(f"Cannot attend to {value}, block size is only {self.config.block_size}")
ValueError: Cannot attend to 3063, block size is only 2048
```
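
For context, here is a minimal sketch of the failing path, reconstructed from the two application frames in the traceback (`litgpt/finetune/lora.py` line 249 and `litgpt/model.py` line 47). The `GPT` class below is a simplification, not litgpt's actual implementation (the real setter does extra work, e.g. resizing caches), and the numbers come from this run: the longest example in `algebra.json` is 3063 tokens, TinyLlama-1.1B's `block_size` is 2048, and `TrainArgs.max_seq_length` defaults to `None`, so no cap is applied.

```python
# Simplified reconstruction of the failure; not litgpt's actual code.
class Config:
    block_size = 2048  # TinyLlama-1.1B-Chat-v1.0's context window


class GPT:
    def __init__(self, config: Config) -> None:
        self.config = config
        self._max_seq_length = config.block_size

    @property
    def max_seq_length(self) -> int:
        return self._max_seq_length

    @max_seq_length.setter
    def max_seq_length(self, value: int) -> None:
        # litgpt/model.py line 47: the model cannot attend past block_size
        if value > self.config.block_size:
            raise ValueError(f"Cannot attend to {value}, block size is only {self.config.block_size}")
        self._max_seq_length = value


model = GPT(Config())
longest_seq_length = 3063    # longest example in the dataset, per the error
train_max_seq_length = None  # TrainArgs default, as shown in the config dump

try:
    # litgpt/finetune/lora.py line 249: with no cap configured, the longest
    # dataset example is assigned unclamped and trips the setter's check
    model.max_seq_length = min(longest_seq_length, train_max_seq_length or float("inf"))
except ValueError as err:
    print(err)  # Cannot attend to 3063, block size is only 2048
```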
rasbt commented 1 week ago

We should probably change the defaults, but for the time being, can you try passing `--train.max_seq_length 2048`?
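
For anyone hitting the same error: with that flag, the `min(...)` on line 249 of `litgpt/finetune/lora.py` clamps the value before the setter runs, so it stays within the model's `block_size`. A quick check with the numbers from this issue (note that examples longer than the cap will presumably be truncated rather than seen in full):

```python
# Effect of passing --train.max_seq_length 2048 on the line that failed above
longest_seq_length = 3063  # longest example in the dataset
max_seq_length = 2048      # from --train.max_seq_length 2048

print(min(longest_seq_length, max_seq_length or float("inf")))  # 2048, within block_size
```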