Closed orriduck closed 2 years ago
Hello @ruyyi0323
It looks like source_dir="../ds_pipeline/src"
should be source_dir="./ds_pipeline/src".
You can look at this example: https://github.com/aws-samples/amazon-sagemaker-local-mode/blob/main/tensorflow_script_mode_debug_local_training/tensorflow_script_mode_debug_local_training.py
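One thing worth keeping in mind: a relative source_dir is resolved against the notebook kernel's current working directory, not against the repository root, so which form works depends on where the notebook lives. A minimal sketch of that resolution (the directory paths below are hypothetical stand-ins for the actual layout):

```python
import os

# Hypothetical directories standing in for the real project layout.
repo_root = "/home/ec2-user/project"
notebook_dir = os.path.join(repo_root, "sagemaker_modelbuild_project_assistant")

# Resolved from the repo root, "./ds_pipeline/src" lands on the folder
# that actually contains train.py:
from_root = os.path.normpath(os.path.join(repo_root, "./ds_pipeline/src"))

# Resolved from inside the notebook's own folder, the same relative
# path points at a directory that does not exist:
from_notebook = os.path.normpath(os.path.join(notebook_dir, "./ds_pipeline/src"))

print(from_root)
print(from_notebook)
```

So "./ds_pipeline/src" is only correct if the notebook is running at the repository root.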
Hi @eitansela ,
It raises an error:
Couldn't call 'get_role' to get Role ARN from role name BGTDevSageMakerAdmin to get Role path.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-17-67b9be0e75b7> in <module>
39 "encoders": TrainingInput(
40 s3_data="s3://sagemaker-project-p-zfuf9hgaujxu/experiment_packs/poc_exp/feature_engineering/encoders",
---> 41 content_type=None,
42 ),
43 }
~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name, experiment_config)
652
653 """
--> 654 self._prepare_for_training(job_name=job_name)
655
656 self.latest_training_job = _TrainingJob.start_new(self, inputs, experiment_config)
~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in _prepare_for_training(self, job_name)
2164 # source directory. We are intentionally not handling it because this is a critical error.
2165 if self.source_dir and not self.source_dir.lower().startswith("s3://"):
-> 2166 validate_source_dir(self.entry_point, self.source_dir)
2167
2168 # if we are in local mode with local_code=True. We want the container to just
~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/fw_utils.py in validate_source_dir(script, directory)
77 if not os.path.isfile(os.path.join(directory, script)):
78 raise ValueError(
---> 79 'No file named "{}" was found in directory "{}".'.format(script, directory)
80 )
81
ValueError: No file named "train.py" was found in directory "./ds_pipeline/src".
For reference, here is the file tree. I am using the notebook pointed at by the blue arrow to execute the local training job, in case you would like to replicate this.
.
├── build_and_exec_params.json
├── build_and_exec.py
├── build_requirements.txt
├── data_helpers
│ ├── data_acquire.py
│ ├── data_cleanup.py
│ ├── data_prep_guidebook.ipynb
│ ├── __pycache__
│ │ ├── data_acquire.cpython-36.pyc
│ │ └── data_cleanup.cpython-36.pyc
│ └── snowflake.zip
├── downstream_preview
│ ├── sagemaker_endpoint_template.yml
│ └── sagemaker_project_shareside_template.yml
├── ds_pipeline
│ ├── data_evaluation.py
│ ├── data_ingestion.py
│ ├── feature_engineering.py
│ ├── __init__.py
│ ├── model_evaluation.py
│ ├── pipeline.py
│ ├── readme.md
│ ├── requirements.txt
│ └── src
│ ├── encoders.py
│ ├── inference.py
│ ├── nn_model.py
│ ├── predictor.py
│ ├── requirements.txt
│ └── train.py
├── modelbuild_buildspec.yml
├── Project_Report.md
├── README.md
└── sagemaker_modelbuild_project_assistant
├── endpoint_deployment_test.ipynb
├── kill_resources.ipynb
├── modelpackage_injector.ipynb
├── orchestrial_procedure.ipynb
├── resources_killer
│ ├── local_mode_resource_killer.sh
│ ├── model_package_killer.py
│ └── pipeline_killer.py
├── standalone_pipeline_run.ipynb
└── stepscript_templates
├── inference_seed_script.py
├── processing_seed_script.py
├── tensorflow_seed_pipeline.py
└── training_seed_script.py
Using an absolute path fixed this issue.
Hi,
I am attempting to initiate a training job using TensorFlow with the entry_point + source_dir attributes, but I am running into a file-not-found issue.
My file structure is something like this:
Code snippet I am using to call this training job:
Error I am getting:
Any comment will be helpful, thanks.