aws / amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
https://sagemaker-examples.readthedocs.io
Apache License 2.0
9.79k stars, 6.66k forks

[Bug Report] RuntimeError when running instruction fine-tuning on Mistral 7B, SageMaker JumpStart #4649

Status: Open. louishourcade opened this issue 1 month ago.

louishourcade commented 1 month ago

Link to the notebook

https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/mistral-7b-instruction-domain-adaptation-finetuning.ipynb

Describe the bug

I get an error when I run the training step for instruction fine-tuning in this notebook. The training job starts properly, but after about 10 minutes it fails with:

ErrorMessage "raise RuntimeError(
RuntimeError: Could not find response key [1, 32002] in token IDs tensor([ 1, 20811, 349, ..., 302, 15637, 266])"
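One possible reading of this error is offered here as an unconfirmed hypothesis. The message has the shape produced by completion-only data collators. These collators look for a tokenized response key inside each example's labels, so that the loss is computed only on the response. For Mistral 7B, token ID 1 is the BOS token and the base vocabulary has 32,000 entries. That makes [1, 32002] look like an automatically prepended BOS followed by an added special token. If the key is matched as a contiguous subsequence, that leading BOS would make the match fail unless the response key directly follows the BOS.

The sketch below illustrates only this failure mode. The helper names (`find_subsequence`, `mask_prompt_tokens`) and the example token IDs are hypothetical. It is not the JumpStart training script, and it does not confirm the root cause.

```python
from typing import List, Sequence


def find_subsequence(tokens: Sequence[int], key: Sequence[int]) -> int:
    """Return the start index of the first contiguous occurrence of `key` in `tokens`, or -1."""
    n, m = len(tokens), len(key)
    for start in range(n - m + 1):
        if list(tokens[start:start + m]) == list(key):
            return start
    return -1


def mask_prompt_tokens(labels: List[int], response_key: List[int], ignore_index: int = -100) -> List[int]:
    """Hypothetical completion-only masking: keep loss only on tokens after the response key."""
    start = find_subsequence(labels, response_key)
    if start == -1:
        # Same shape of message as in the report above.
        raise RuntimeError(f"Could not find response key {response_key} in token IDs {labels}")
    end = start + len(response_key)
    return [ignore_index] * end + labels[end:]


BOS_ID = 1               # BOS token id for Mistral/Llama-style tokenizers
RESPONSE_KEY_ID = 32002  # assumption: id of an added "response" special token

# Hypothetical tokenized example: BOS, prompt tokens, response key, answer tokens.
labels = [BOS_ID, 20811, 349, RESPONSE_KEY_ID, 302, 15637, 266]

# Key encoded WITH an automatic BOS: [1, 32002] never occurs contiguously, so this raises.
try:
    mask_prompt_tokens(labels, [BOS_ID, RESPONSE_KEY_ID])
except RuntimeError as err:
    print(err)

# Key encoded WITHOUT special tokens: [32002] is found and the prompt tokens are masked.
print(mask_prompt_tokens(labels, [RESPONSE_KEY_ID]))
```

If this is the cause, encoding the response key with the Hugging Face tokenizer option `add_special_tokens=False` would drop the leading BOS. Whether the JumpStart training script lets you change that is not established here.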

To reproduce

Logs

Attaching some screenshots of the logs:

[Screenshot 2024-05-03 at 16:45:43]

[Screenshot 2024-05-03 at 16:47:10]

Any idea how to fix this?

prakash5801 commented 1 month ago

@louishourcade I'm facing the same issue while running the example notebook from AWS. Did you find a solution?

louishourcade commented 2 weeks ago

Hi @prakash5801, no, I haven't had time to investigate further. But I checked yesterday and the error is still there.