Open santoshmedisetty opened 1 year ago
Hi,
I'm training a YOLOv5 model on SageMaker. I've created an Experiment and a Trial for the training job, but training metrics such as precision, recall, and mAP are not being recorded in SageMaker.
I followed a process similar to https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-experiments/mnist-handwritten-digits-classification-experiment/mnist-handwritten-digits-classification-experiment.ipynb
Could this be a problem with the IAM role, or something similar?
I'm launching the training job with an 'Estimator', as shown below.
yolov5_experiment = Experiment.create(
    experiment_name=f"yolov5-training-job-{timenow}",
    description="yolov5n model training",
    sagemaker_boto_client=sm,
)

yolov5_training_job_name = f'yolov5-training-job-{timenow}'

trial_name = f"yolov5-training-job-{timenow}"
yolov5_trial = Trial.create(
    trial_name=trial_name,
    experiment_name=yolov5_experiment.experiment_name,
    sagemaker_boto_client=sm,
)

estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type='ml.m4.xlarge',
    # instance_type='local',
    input_mode='File',
    output_path=outpath,
    base_job_name='yolov5',
    sagemaker_session=sagemaker.Session(sagemaker_client=sm),
    metric_definitions=[
        {'Name': 'metrics/mAP_0.5', "Regex": "metrics/mAP_0.5: (.*?);"},
        {'Name': 'metrics/mAP_0.5:0.95', "Regex": "metrics/mAP_0.5:0.95: (.*?);"},
        {'Name': 'metrics/recall', "Regex": "metrics/recall: (.*?);"},
        {'Name': 'metrics/precision', "Regex": "metrics/precision: (.*?);"},
        {'Name': 'train/box_loss', "Regex": "train/box_loss: (.*?);"},
        {'Name': 'train/cls_loss', "Regex": "train/cls_loss: (.*?);"},
        {'Name': 'train/obj_loss', "Regex": "train/obj_loss: (.*?);"},
        {'Name': 'val/cls_loss', "Regex": "val/cls_loss: (.*?);"},
        {'Name': 'val/obj_loss', "Regex": "val/obj_loss: (.*?);"},
        {'Name': 'val/box_loss', "Regex": "val/box_loss: (.*?);"},
        {'Name': 'Epoch', "Regex": "Epoch: (.*?);"},
    ],
    enable_sagemaker_metrics=True,
)

estimator.fit(
    inputs,
    job_name=yolov5_training_job_name,
    experiment_config={
        "ExperimentName": yolov5_experiment.experiment_name,
        "TrialName": yolov5_trial.trial_name,
        "TrialComponentDisplayName": "Training",
    },
    wait=True,
)
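Since the metric definitions above are regexes applied to the training job's log output, one way to narrow this down before looking at IAM is to check locally whether the regexes would match anything the container prints. The sketch below is only an approximation using Python's `re` module: the helper `extract_metrics` and both sample log lines are hypothetical and not taken from a real job. The regexes are copied from the `metric_definitions` list.

```python
import re

# A subset of the regexes from metric_definitions in the post.
metric_definitions = [
    {"Name": "metrics/mAP_0.5", "Regex": "metrics/mAP_0.5: (.*?);"},
    {"Name": "metrics/precision", "Regex": "metrics/precision: (.*?);"},
    {"Name": "Epoch", "Regex": "Epoch: (.*?);"},
]


def extract_metrics(log_line, definitions):
    """Return {metric name: captured text} for every regex that matches the line.

    Hypothetical helper for local testing; it only approximates how the
    service scans log lines.
    """
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], log_line)
        if match:
            found[definition["Name"]] = match.group(1)
    return found


# Hypothetical log line written in the "name: value;" shape the regexes expect.
matching_line = "Epoch: 3; metrics/precision: 0.71; metrics/mAP_0.5: 0.64;"

# Hypothetical log line in a plain tabular shape with no "name: value;" pairs.
tabular_line = "3/99  1.2G  0.045  0.021  0.012  64  640"

print(extract_metrics(matching_line, metric_definitions))
print(extract_metrics(tabular_line, metric_definitions))
```

If real lines copied from the job's CloudWatch logs yield an empty result here, the regexes (or the training script's print format) are a more likely cause than permissions.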