Lightning-AI / litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
https://lightning.ai
Apache License 2.0

Meaningful error if no validation split fraction is provided in custom JSON data module #1245

Open rasbt opened 3 months ago

rasbt commented 3 months ago

If a user doesn't set --data.val_split_fraction in

litgpt finetune lora \
  --data JSON \
  --data.json_path ....json \
  --checkpoint_dir checkpoints/$REPO_NAME

the command fails with an unhelpful error:

    train_data, test_data = self.get_splits()
  File "/teamspace/studios/this_studio/litgpt/litgpt/data/json_data.py", line 112, in get_splits
    [1.0 - self.val_split_fraction, self.val_split_fraction],
TypeError: unsupported operand type(s) for -: 'float' and 'NoneType'

Should we use a meaningful default instead? E.g., 5% or 10% of the training data?

carmocca commented 3 months ago

@awaelchli improved the error in #1241. Still, we could set a default fraction.
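
For illustration, here is a minimal sketch of the two options discussed in this thread: raising a meaningful error when the fraction is missing, or falling back to a default. It is not the code from #1241 and does not use litgpt's actual data module; the helper names `resolve_val_split_fraction` and `split_records` and the `default` parameter are hypothetical, and a plain-Python shuffle stands in for whatever splitting utility the real module uses.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def resolve_val_split_fraction(
    val_split_fraction: Optional[float],
    default: Optional[float] = None,
) -> float:
    """Return a usable validation fraction or fail with a clear message.

    Hypothetical helper: `default` models the "meaningful default" idea
    (e.g. 0.05 or 0.1); passing None keeps the fraction mandatory.
    """
    if val_split_fraction is None:
        if default is None:
            raise ValueError(
                "No validation split fraction was provided. "
                "Set --data.val_split_fraction to a value between 0 and 1."
            )
        return default
    if not 0.0 < val_split_fraction < 1.0:
        raise ValueError(
            f"val_split_fraction must be between 0 and 1, got {val_split_fraction}."
        )
    return val_split_fraction


def split_records(
    records: Sequence[T],
    val_split_fraction: Optional[float],
    default: Optional[float] = None,
    seed: int = 42,
) -> Tuple[List[T], List[T]]:
    """Shuffle the records and split them into (train, validation) lists."""
    fraction = resolve_val_split_fraction(val_split_fraction, default)
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)  # seeded for a reproducible split
    num_val = int(len(records) * fraction)
    val = [records[i] for i in indices[:num_val]]
    train = [records[i] for i in indices[num_val:]]
    return train, val
```

With this shape, `split_records(data, None)` fails early with an actionable message instead of the `TypeError` above, while `split_records(data, None, default=0.1)` silently holds out 10% of the data. Which behavior is preferable is exactly the open question in this issue.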