TrainingImageConfig with TrainingRepositoryAccessMode set to VPC must be provided when using a training image from a private Docker registry

G-Slient commented 3 years ago

When I try to start the training process with estimator.fit(inputs), I get this error:

ClientError                               Traceback (most recent call last)
<ipython-input-56-0731e612c7b1> in <module>
----> 1 estimator.fit(inputs)

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name, experiment_config)
    656         self._prepare_for_training(job_name=job_name)
    657 
--> 658         self.latest_training_job = _TrainingJob.start_new(self, inputs, experiment_config)
    659         self.jobs.append(self.latest_training_job)
    660         if wait:

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in start_new(cls, estimator, inputs, experiment_config)
   1419         """
   1420         train_args = cls._get_train_args(estimator, inputs, experiment_config)
-> 1421         estimator.sagemaker_session.train(**train_args)
   1422 
   1423         return cls(estimator.sagemaker_session, estimator._current_job_name)

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in train(self, input_mode, input_config, role, job_name, output_config, resource_config, vpc_config, hyperparameters, stop_condition, tags, metric_definitions, enable_network_isolation, image_uri, algorithm_arn, encrypt_inter_container_traffic, use_spot_instances, checkpoint_s3_uri, checkpoint_local_path, experiment_config, debugger_rule_configs, debugger_hook_config, tensorboard_output_config, enable_sagemaker_metrics, profiler_rule_configs, profiler_config)
    560         LOGGER.info("Creating training-job with name: %s", job_name)
    561         LOGGER.debug("train request: %s", json.dumps(train_request, indent=4))
--> 562         self.sagemaker_client.create_training_job(**train_request)
    563 
    564     def _get_train_request(  # noqa: C901

~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    355                     "%s() only accepts keyword arguments." % py_operation_name)
    356             # The "self" in this scope is referring to the BaseClient.
--> 357             return self._make_api_call(operation_name, kwargs)
    358 
    359         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    674             error_code = parsed_response.get("Error", {}).get("Code")
    675             error_class = self.exceptions.from_code(error_code)
--> 676             raise error_class(parsed_response, operation_name)
    677         else:
    678             return parsed_response

ClientError: An error occurred (ValidationException) when calling the CreateTrainingJob operation: TrainingImageConfig with TrainingRepositoryAccessMode set to VPC must be provided when using a training image from a private Docker registry. Please provideTrainingImageConfig and TrainingRepositoryAccessMode set to VPC when using a training image from a private Docker registry.

nfbalbontin commented 3 years ago

Hi! I was able to solve it by replacing image_name with image_uri where the Docker container is built:
image_uri = 'tf2-object-detection'
!sh ./docker/build_and_push.sh $image_uri
And in the signature of the create_model function:
def create_model(self, model_server_workers=None, role=None, vpc_config_override=None, entry_point=None, source_dir=None, dependencies=None, image_uri=None, **kwargs):
and removing it from the arguments passed to the super().__init__ call:
super(CustomFramework, self).__init__(entry_point, source_dir, hyperparameters, **kwargs)
Hopefully it works for you!
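
For readers who want to see the shape of that change in one place, here is a minimal sketch of the forwarding pattern. It is not the repository's actual code: `FakeFramework` is a hypothetical stand-in for the SDK's framework base class so the example runs without the SageMaker SDK installed, and the account ID, region, and `train.py` entry point are placeholders.

```python
# Hypothetical, simplified stand-in for the SDK base class, used only so this
# sketch is self-contained. It accepts image_uri as a keyword argument.
class FakeFramework:
    def __init__(self, entry_point, source_dir=None, hyperparameters=None,
                 image_uri=None, **kwargs):
        self.entry_point = entry_point
        self.source_dir = source_dir
        self.hyperparameters = hyperparameters or {}
        self.image_uri = image_uri
        self.extra_kwargs = kwargs


class CustomFramework(FakeFramework):
    def __init__(self, entry_point, source_dir=None, hyperparameters=None,
                 **kwargs):
        # image_uri is no longer named explicitly in the super() call;
        # it reaches the base class through **kwargs.
        super(CustomFramework, self).__init__(
            entry_point, source_dir, hyperparameters, **kwargs
        )

    def create_model(self, model_server_workers=None, role=None,
                     vpc_config_override=None, entry_point=None,
                     source_dir=None, dependencies=None, image_uri=None,
                     **kwargs):
        # The parameter is called image_uri here too (previously image_name).
        # Returning a dict is an illustration only, not the SDK's behavior.
        return {"image_uri": image_uri or self.image_uri, "role": role}


# Placeholder image URI; substitute the URI of the image you actually pushed.
estimator = CustomFramework(
    "train.py",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/tf2-object-detection:latest",
)
print(estimator.image_uri)
```

In the real notebook the parent class would be the SDK's own framework estimator, which takes `image_uri` as a keyword argument in version 2 of the SageMaker Python SDK (it was `image_name` in version 1).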

sofianhamiti commented 3 years ago

Thanks @nfbalbontin, I have adjusted the code based on your suggestion in the latest commit. You can pull the latest changes, and feel free to reopen the issue if needed.

aws-samples / amazon-sagemaker-tensorflow-object-detection-api

TrainingImageConfig with TrainingRepositoryAccessMode set to VPC must be provided when using a training image from a private Docker registry #9