jharrang closed this issue 4 years ago.
Hello @jharrang,
Sorry for the late response.
Let me reach out to the team that handles the SageMaker training platform and get back to you.
Reference: 0413038650
Thank you for your patience.
Hello @jharrang, SageMaker training jobs are billed from the training start time until completion, which is when the model has been uploaded to S3.
So the correct answer to your question is (B): the billing clock stops after the last instance exits and SageMaker has had a chance to clean up any outstanding work, such as uploading models and logs.
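To check this against a finished job, the DescribeTrainingJob response (boto3's `describe_training_job`) includes `BillableTimeInSeconds` and `ResourceConfig.InstanceCount`. As we read the API reference, `BillableTimeInSeconds` is wall-clock time, and for distributed training you multiply it by the instance count to get the total compute time billed. The sketch below assumes a response dict shaped like that boto3 output; the helper name and the numbers in the usage line are hypothetical.

```python
def billed_instance_seconds(job: dict) -> int:
    """Estimate total instance-seconds billed for a SageMaker training job.

    `job` is assumed to look like the dict returned by
    boto3.client("sagemaker").describe_training_job(TrainingJobName=...).
    BillableTimeInSeconds is wall-clock time for the whole job, so every
    instance in the cluster is counted for that full duration, even one
    whose entrypoint exited early.
    """
    wall_clock_seconds = job["BillableTimeInSeconds"]
    instance_count = job["ResourceConfig"]["InstanceCount"]
    return wall_clock_seconds * instance_count


# Hypothetical example: a 10-instance job that ran for one hour of wall-clock time.
example_job = {"BillableTimeInSeconds": 3600, "ResourceConfig": {"InstanceCount": 10}}
print(billed_instance_seconds(example_job))
```

Under this reading, nine workers exiting early would not lower the total; only the last instance's exit (plus cleanup) ends the billed window.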
Closing due to inactivity. Feel free to reopen if necessary.
Reference: 0413038650
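The scenario in this issue (9 of 10 instances calling sys.exit(0) after the distributed steps while one instance finishes alone) might look roughly like the BYOC entrypoint sketched here. It assumes SageMaker's standard container layout: the cluster description in /opt/ml/input/config/resourceconfig.json (with `current_host` and `hosts` keys) and model artifacts written under /opt/ml/model. The leader-election rule and the two phase functions are hypothetical placeholders, not part of any SageMaker API.

```python
import json
import os
import sys

# SageMaker describes the training cluster to BYOC containers in this file.
RESOURCE_CONFIG = "/opt/ml/input/config/resourceconfig.json"


def load_resource_config(path: str = RESOURCE_CONFIG) -> dict:
    """Read the cluster layout (current_host, hosts, ...) for this instance."""
    with open(path) as f:
        return json.load(f)


def is_leader(config: dict) -> bool:
    """Our own convention (not a SageMaker rule): the first host in sorted order leads."""
    return config["current_host"] == sorted(config["hosts"])[0]


def run_distributed_phase(config: dict) -> None:
    # Hypothetical placeholder for the multi-instance steps, including any
    # synchronization needed so the leader has everything it needs afterwards.
    print(f"{config['current_host']}: distributed phase")


def run_final_phase(config: dict) -> None:
    # Hypothetical placeholder for the single-instance steps; artifacts that
    # should be uploaded to S3 belong under /opt/ml/model.
    print(f"{config['current_host']}: final phase")


def main(config: dict) -> int:
    run_distributed_phase(config)
    if not is_leader(config):
        # Workers stop here with exit code 0. Per the answer in this thread,
        # billing continues until the last instance exits.
        return 0
    run_final_phase(config)
    return 0


if __name__ == "__main__" and os.path.exists(RESOURCE_CONFIG):
    # Only run when the resource config exists, i.e. inside a SageMaker training container.
    sys.exit(main(load_resource_config()))
```

Going by the answer above, this pattern simplifies the workflow rather than reducing cost, since the idle workers stay billed until the leader finishes.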
System Information
Background:
My group is developing a BYOC Algorithm Resource. The final steps of our TrainingJob workflow only require one instance to run, but there are earlier steps that we'll be running distributed.
Question:
Is it possible to terminate some instances in a SageMaker TrainingJob cluster while other instances continue running? I.e., if we run a TrainingJob with 10 instances, but the entrypoint scripts of 9 of those instances call sys.exit(0) while the single remaining instance continues to do work, will SageMaker: