jasel-lewis opened this issue 11 months ago
Hi @jasel-lewis, thanks for raising this. I will pull in someone who can answer this.
@poojak13 Wonderful! Any help is greatly appreciated, thank you...
FYSA @shieldsjared
Update: Adding a reference to a similar re:Post thread.
Update: Converted to an AWS Support ticket for faster resolution.
Facing the same issue.
Product Version
Issue Description
I was using SageMaker Studio to domain-train a model (base model: huggingface-llm-mistral-7b) on an ml.g5.24xlarge instance. I left all values at their defaults, except that I pointed the job at specific buckets for the training data and for the trained-model output, and adjusted the hyperparameters with:

At just over an hour (3,909 seconds) into the training run, I received the error:
I came across this specific post, but I don't believe these are values I can adjust via SageMaker Studio.
Any thoughts on this?
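One possible route to more control than the Studio UI exposes is to launch the same JumpStart training job from the SageMaker Python SDK. The sketch below is an illustration under assumptions, not a confirmed fix for this error. It only assembles the job arguments. The `build_training_job_config` helper, the bucket names, and the hyperparameter values are hypothetical placeholders, not the settings from the original run. The actual `JumpStartEstimator` call is left commented out because it needs the `sagemaker` package and AWS credentials.

```python
# Hypothetical helper (not part of any AWS library): collects the settings the
# Studio run used so the same JumpStart job could be launched from the SDK.


def build_training_job_config(train_s3_uri, output_s3_uri, hyperparameters,
                              instance_type="ml.g5.24xlarge"):
    """Return (estimator kwargs, fit() inputs) for a JumpStart training job."""
    if not (train_s3_uri.startswith("s3://") and output_s3_uri.startswith("s3://")):
        raise ValueError("training and output locations must be S3 URIs")
    estimator_kwargs = {
        "model_id": "huggingface-llm-mistral-7b",
        "instance_type": instance_type,
        "output_path": output_s3_uri,
        # The CreateTrainingJob API stores hyperparameters as a string-to-string
        # map, so convert every value to a string up front.
        "hyperparameters": {k: str(v) for k, v in hyperparameters.items()},
    }
    # JumpStart fine-tuning examples read the training data from the
    # "training" channel.
    fit_inputs = {"training": train_s3_uri}
    return estimator_kwargs, fit_inputs


if __name__ == "__main__":
    kwargs, inputs = build_training_job_config(
        "s3://my-training-bucket/data/",  # placeholder bucket
        "s3://my-output-bucket/model/",   # placeholder bucket
        {"epoch": 1},                     # illustrative value only
    )
    print(kwargs)
    print(inputs)
    # With the sagemaker package installed and AWS credentials configured:
    # from sagemaker.jumpstart.estimator import JumpStartEstimator
    # JumpStartEstimator(**kwargs).fit(inputs)
```

Keeping the argument assembly separate from the SDK call lets the configuration be checked locally before any paid training job is started. Valid hyperparameter names for this model should be confirmed against the JumpStart defaults rather than taken from this sketch.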
Expected Behavior
Expected the model to be domain-trained successfully.
Observed Behavior
Observed the error identified in the Issue Description section.
Product Category
JumpStart
Feedback Category
Reliability and Stability
Other Details
No response