aws / amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
https://sagemaker-examples.readthedocs.io
Apache License 2.0

[Bug Report] #3713

Open acere opened 1 year ago

acere commented 1 year ago

Link to the notebook
Train an TensorFlow model with a SageMaker Training Job and track it using SageMaker Experiments

Describe the bug
When executing the notebook the model training (8th cell in the notebook) fails with

ParamValidationError: Parameter validation failed:
Unknown parameter in ProfilerConfig: "DisableProfiler", must be one of: S3OutputPath, ProfilingIntervalInMilliseconds, ProfilingParameters

Bug replicated in SageMaker Studio domains in ap-southeast-1 and us-east-2.

To reproduce
Run the notebook step by step

Logs
Error trace:

INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: tensorflow-training-2022-12-20-10-27-40-801

---------------------------------------------------------------------------
ParamValidationError                      Traceback (most recent call last)
<ipython-input-8-952c129da21f> in <module>
     30     )
     31 
---> 32     est.fit()

/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/pipeline_context.py in wrapper(*args, **kwargs)
    270             return _StepArguments(retrieve_caller_name(self_instance), run_func, *args, **kwargs)
    271 
--> 272         return run_func(*args, **kwargs)
    273 
    274     return wrapper

/opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name, experiment_config)
   1128 
   1129         experiment_config = check_and_get_run_experiment_config(experiment_config)
-> 1130         self.latest_training_job = _TrainingJob.start_new(self, inputs, experiment_config)
   1131         self.jobs.append(self.latest_training_job)
   1132         if wait:

/opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in start_new(cls, estimator, inputs, experiment_config)
   2046         train_args = cls._get_train_args(estimator, inputs, experiment_config)
   2047 
-> 2048         estimator.sagemaker_session.train(**train_args)
   2049 
   2050         return cls(estimator.sagemaker_session, estimator._current_job_name)

/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in train(self, input_mode, input_config, role, job_name, output_config, resource_config, vpc_config, hyperparameters, stop_condition, tags, metric_definitions, enable_network_isolation, image_uri, algorithm_arn, encrypt_inter_container_traffic, use_spot_instances, checkpoint_s3_uri, checkpoint_local_path, experiment_config, debugger_rule_configs, debugger_hook_config, tensorboard_output_config, enable_sagemaker_metrics, profiler_rule_configs, profiler_config, environment, retry_strategy)
    625             self.sagemaker_client.create_training_job(**request)
    626 
--> 627         self._intercept_create_request(train_request, submit, self.train.__name__)
    628 
    629     def _get_train_request(  # noqa: C901

/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in _intercept_create_request(self, request, create, func_name)
   4654             func_name (str): the name of the function needed intercepting
   4655         """
-> 4656         return create(request)
   4657 
   4658 

/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in submit(request)
    623             LOGGER.info("Creating training-job with name: %s", job_name)
    624             LOGGER.debug("train request: %s", json.dumps(request, indent=4))
--> 625             self.sagemaker_client.create_training_job(**request)
    626 
    627         self._intercept_create_request(train_request, submit, self.train.__name__)

/opt/conda/lib/python3.7/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    528                 )
    529             # The "self" in this scope is referring to the BaseClient.
--> 530             return self._make_api_call(operation_name, kwargs)
    531 
    532         _api_call.__name__ = str(py_operation_name)

/opt/conda/lib/python3.7/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    922             endpoint_url=endpoint_url,
    923             context=request_context,
--> 924             headers=additional_headers,
    925         )
    926         resolve_checksum_context(request_dict, operation_model, api_params)

/opt/conda/lib/python3.7/site-packages/botocore/client.py in _convert_to_request_dict(self, api_params, operation_model, endpoint_url, context, headers, set_user_agent_header)
    989         )
    990         request_dict = self._serializer.serialize_to_request(
--> 991             api_params, operation_model
    992         )
    993         if not self._client_config.inject_host_prefix:

/opt/conda/lib/python3.7/site-packages/botocore/validate.py in serialize_to_request(self, parameters, operation_model)
    379             )
    380             if report.has_errors():
--> 381                 raise ParamValidationError(report=report.generate_report())
    382         return self._serializer.serialize_to_request(
    383             parameters, operation_model

ParamValidationError: Parameter validation failed:
Unknown parameter in ProfilerConfig: "DisableProfiler", must be one of: S3OutputPath, ProfilingIntervalInMilliseconds, ProfilingParameters
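
The error above comes from botocore's client-side validation, which compares the request against the service model loaded in the running kernel. A minimal sketch for checking what that model accepts might look like the following. The helper names (`unknown_members`, `profiler_config_members`) and the sample request dict are hypothetical and only for illustration; the allowed member names are copied from the error message.

```python
from typing import Iterable, List, Optional

# Member names quoted in the ParamValidationError above.
ALLOWED_FROM_ERROR = (
    "S3OutputPath",
    "ProfilingIntervalInMilliseconds",
    "ProfilingParameters",
)


def unknown_members(config: dict, allowed: Iterable[str]) -> List[str]:
    """Return the keys of `config` that are not in `allowed`, sorted.

    This mirrors the kind of check the validator reports on; it is not
    botocore's own implementation.
    """
    allowed_set = set(allowed)
    return sorted(key for key in config if key not in allowed_set)


def profiler_config_members(region_name: str = "us-east-2") -> Optional[List[str]]:
    """List the ProfilerConfig members known to the locally loaded
    SageMaker service model, or None if boto3 is not installed.

    No request is sent; only the client's bundled model is inspected.
    """
    try:
        import boto3
    except ImportError:
        return None
    client = boto3.client("sagemaker", region_name=region_name)
    operation = client.meta.service_model.operation_model("CreateTrainingJob")
    return sorted(operation.input_shape.members["ProfilerConfig"].members)


if __name__ == "__main__":
    # Hypothetical request fragment resembling what the error describes.
    print(unknown_members({"DisableProfiler": True}, ALLOWED_FROM_ERROR))
```

Calling `profiler_config_members()` in the affected kernel and checking whether `DisableProfiler` is in the result should show whether the model actually in use matches the botocore version that `pip list` reports.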

SageMaker Python SDK version: 2.125.0
Boto3 version: 1.26.33

output of pip list:


Package                              Version
------------------------------------ -----------------
absl-py                              1.3.0
aiobotocore                          2.4.1
aiohttp                              3.8.3
aioitertools                         0.11.0
aiosignal                            1.3.1
alabaster                            0.7.12
anaconda-client                      1.7.2
anaconda-project                     0.8.3
ansi2html                            1.8.0
anyio                                3.6.2
argh                                 0.26.2
argon2-cffi                          21.3.0
argon2-cffi-bindings                 21.2.0
asn1crypto                           1.3.0
astroid                              2.12.13
astropy                              4.0
astunparse                           1.6.3
async-timeout                        4.0.2
asynctest                            0.13.0
atomicwrites                         1.3.0
attrs                                22.1.0
autopep8                             1.4.4
autovizwidget                        0.20.0
awscli                               1.27.24
Babel                                2.11.0
backcall                             0.1.0
backports.shutil-get-terminal-size   1.0.0
beautifulsoup4                       4.8.2
bitarray                             1.2.1
bkcharts                             0.2
bleach                               5.0.1
bokeh                                1.4.0
boto                                 2.49.0
boto3                                1.26.33
botocore                             1.29.33
Bottleneck                           1.3.2
brotlipy                             0.7.0
cached-property                      1.5.2
cachetools                           5.2.0
certifi                              2022.9.24
cffi                                 1.15.0
chardet                              3.0.4
charset-normalizer                   2.0.4
Click                                7.0
cloudpickle                          2.2.0
clyent                               1.2.2
colorama                             0.4.3
conda                                22.9.0
conda-package-handling               1.8.1
contextlib2                          0.6.0.post1
cryptography                         38.0.4
cycler                               0.10.0
Cython                               0.29.15
cytoolz                              0.10.1
dash                                 2.7.0
dash-core-components                 2.0.0
dash-html-components                 2.0.0
dash-table                           5.0.0
dask                                 2022.2.0
decorator                            4.4.1
defusedxml                           0.6.0
diff-match-patch                     20181111
dill                                 0.3.6
distributed                          2022.2.0
distro                               1.8.0
docker                               6.0.1
docker-compose                       1.29.2
dockerpty                            0.4.1
docopt                               0.6.2
docutils                             0.16
dparse                               0.6.2
entrypoints                          0.3
et-xmlfile                           1.0.1
fastcache                            1.1.0
fastjsonschema                       2.16.2
filelock                             3.0.12
flake8                               3.7.9
Flask                                1.1.1
flatbuffers                          22.12.6
frozenlist                           1.3.3
fsspec                               2022.11.0
future                               0.18.2
gast                                 0.4.0
gevent                               1.4.0
glob2                                0.7
gmpy2                                2.0.8
google-auth                          2.15.0
google-auth-oauthlib                 0.4.6
google-pasta                         0.2.0
greenlet                             0.4.15
grpcio                               1.51.1
h5py                                 2.10.0
hdijupyterutils                      0.20.0
HeapDict                             1.0.1
html5lib                             1.0.1
hypothesis                           5.5.4
idna                                 2.8
imageio                              2.6.1
imagesize                            1.2.0
importlib-metadata                   4.13.0
intervaltree                         3.0.2
ipykernel                            5.1.4
ipython                              7.34.0
ipython_genutils                     0.2.0
ipywidgets                           7.5.1
isort                                4.3.21
itsdangerous                         1.1.0
jdcal                                1.4.1
jedi                                 0.18.2
jeepney                              0.4.2
Jinja2                               3.1.2
jmespath                             1.0.1
joblib                               0.14.1
json5                                0.9.1
jsonschema                           3.2.0
jupyter                              1.0.0
jupyter_client                       7.4.8
jupyter-console                      6.1.0
jupyter_core                         4.12.0
jupyter-dash                         0.4.2
jupyter-server                       1.23.3
jupyterlab                           1.2.21
jupyterlab-pygments                  0.2.2
jupyterlab-server                    1.0.6
keras                                2.11.0
keyring                              21.1.0
kiwisolver                           1.1.0
lazy-object-proxy                    1.4.3
libarchive-c                         2.8
libclang                             14.0.6
lief                                 0.9.0
llvmlite                             0.39.1
locket                               0.2.0
lxml                                 4.9.1
Markdown                             3.4.1
MarkupSafe                           2.1.1
matplotlib                           3.1.3
matplotlib-inline                    0.1.6
mccabe                               0.6.1
mistune                              0.8.4
mkl-fft                              1.0.15
mkl-random                           1.1.0
mkl-service                          2.3.0
mock                                 4.0.1
more-itertools                       8.2.0
mpmath                               1.1.0
msgpack                              0.6.1
multidict                            6.0.3
multipledispatch                     0.6.0
multiprocess                         0.70.14
nbclassic                            0.4.8
nbclient                             0.7.2
nbconvert                            6.5.4
nbformat                             5.7.0
nest-asyncio                         1.5.6
networkx                             2.4
nltk                                 3.7
nose                                 1.3.7
notebook                             6.5.2
notebook_shim                        0.2.2
numba                                0.56.4
numexpr                              2.7.1
numpy                                1.21.6
numpydoc                             0.9.2
oauthlib                             3.2.2
olefile                              0.46
openpyxl                             3.0.3
opt-einsum                           3.3.0
packaging                            20.1
pandas                               1.3.5
pandocfilters                        1.4.2
parso                                0.8.3
partd                                1.1.0
path                                 13.1.0
pathlib2                             2.3.5
pathos                               0.3.0
pathtools                            0.1.2
patsy                                0.5.1
pep8                                 1.7.1
pexpect                              4.8.0
pickleshare                          0.7.5
Pillow                               9.3.0
pip                                  22.3.1
pkginfo                              1.5.0.1
platformdirs                         2.6.0
plotly                               5.8.2
pluggy                               0.13.1
ply                                  3.11
pox                                  0.3.2
ppft                                 1.7.6.6
prometheus-client                    0.7.1
prompt-toolkit                       3.0.3
protobuf                             3.19.6
protobuf3-to-dict                    0.1.5
psutil                               5.6.7
ptyprocess                           0.6.0
pure-sasl                            0.6.2
py                                   1.11.0
pyarrow                              10.0.1
pyasn1                               0.4.8
pyasn1-modules                       0.2.8
pycodestyle                          2.5.0
pycosat                              0.6.3
pycparser                            2.19
pycrypto                             2.6.1
pycurl                               7.43.0.5
pydocstyle                           4.0.1
pyflakes                             2.1.1
pyfunctional                         1.4.3
Pygments                             2.13.0
PyHive                               0.6.5
pykerberos                           1.2.1
pylint                               2.15.8
pyodbc                               4.0.0-unsupported
pyOpenSSL                            22.1.0
pyparsing                            2.4.6
pyrsistent                           0.15.7
PySocks                              1.7.1
pytest                               5.3.5
pytest-arraydiff                     0.3
pytest-astropy                       0.8.0
pytest-astropy-header                0.1.2
pytest-doctestplus                   0.5.0
pytest-openfiles                     0.4.0
pytest-remotedata                    0.3.2
python-dateutil                      2.8.2
python-dotenv                        0.21.0
python-jsonrpc-server                0.3.4
python-language-server               0.31.7
pytz                                 2019.3
PyWavelets                           1.1.1
pyxdg                                0.26
PyYAML                               6.0
pyzmq                                24.0.1
QDarkStyle                           2.8
QtAwesome                            0.6.1
qtconsole                            4.6.0
QtPy                                 1.9.0
regex                                2022.10.31
requests                             2.28.1
requests-kerberos                    0.12.0
requests-oauthlib                    1.3.1
retrying                             1.3.4
rope                                 0.16.0
rsa                                  4.9
Rtree                                0.9.3
ruamel_yaml                          0.15.87
s3fs                                 0.4.2
s3transfer                           0.6.0
sagemaker                            2.125.0
sagemaker-data-insights              0.3.3
sagemaker-datawrangler               0.3.8
sagemaker-scikit-learn-extension     2.5.0
sagemaker-studio-analytics-extension 0.0.14
sagemaker-studio-sparkmagic-lib      0.1.4
sasl                                 0.2.1
schema                               0.7.5
scikit-image                         0.16.2
scikit-learn                         0.22.1
scipy                                1.4.1
seaborn                              0.10.0
SecretStorage                        3.1.2
Send2Trash                           1.8.0
setuptools                           59.3.0
simplegeneric                        0.8.1
singledispatch                       3.4.0.3
six                                  1.14.0
smclarify                            0.3
smdebug-rulesconfig                  1.0.1
sniffio                              1.3.0
snowballstemmer                      2.0.0
sortedcollections                    1.1.2
sortedcontainers                     2.1.0
soupsieve                            1.9.5
sparkmagic                           0.20.0
Sphinx                               2.4.0
sphinxcontrib-applehelp              1.0.1
sphinxcontrib-devhelp                1.0.1
sphinxcontrib-htmlhelp               1.0.2
sphinxcontrib-jsmath                 1.0.1
sphinxcontrib-qthelp                 1.0.2
sphinxcontrib-serializinghtml        1.1.3
sphinxcontrib-websupport             1.2.0
spyder                               4.0.1
spyder-kernels                       1.8.1
SQLAlchemy                           1.3.13
statsmodels                          0.11.0
sympy                                1.5.1
tables                               3.6.1
tabulate                             0.9.0
tblib                                1.6.0
tenacity                             8.1.0
tensorboard                          2.11.0
tensorboard-data-server              0.6.1
tensorboard-plugin-wit               1.8.1
tensorflow                           2.11.0
tensorflow-estimator                 2.11.0
tensorflow-io-gcs-filesystem         0.29.0
termcolor                            2.1.1
terminado                            0.8.3
testpath                             0.4.4
texttable                            1.6.7
thrift                               0.13.0
thrift-sasl                          0.4.3
tinycss2                             1.2.1
toml                                 0.10.2
tomli                                2.0.1
tomlkit                              0.11.6
toolz                                0.10.0
tornado                              6.2
tqdm                                 4.42.1
traitlets                            5.6.0
typed-ast                            1.5.4
typing_extensions                    4.4.0
ujson                                5.6.0
unicodecsv                           0.14.1
urllib3                              1.26.13
watchdog                             0.10.2
wcwidth                              0.1.8
webencodings                         0.5.1
websocket-client                     0.59.0
Werkzeug                             2.2.2
wheel                                0.34.2
widgetsnbextension                   3.5.1
wrapt                                1.11.2
wurlitzer                            2.0.0
xlrd                                 1.2.0
XlsxWriter                           1.2.7
xlwt                                 1.3.0
yapf                                 0.28.0
yarl                                 1.8.2
zict                                 1.0.0
zipp                                 3.11.0

brianloyal commented 1 year ago

I'm seeing the same thing in us-east-2 with the SKLearn, TensorFlow, and XGBoost estimators as well

Roshrini commented 1 year ago

Is this happening only in Studio, or for other jobs too? The disable_profiler flag was added in this commit: https://github.com/aws/sagemaker-python-sdk/commit/019d5a4b232cd4d287dff35c6a8ba9681ed4c0ca
botocore v1.29.33 seems to have this flag available as well.

Roshrini commented 1 year ago

@acere Can you create a new user and try again?

tongliang11 commented 1 year ago

I got the same error message. Downgrading sagemaker to version 2.123.0 with the following command solved my problem: pip install sagemaker==2.123.0
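
To see whether a kernel is running a release newer than the 2.123.0 workaround, a small check like this could help. It is only a sketch: the function names are hypothetical, the parser assumes plain dotted release strings such as `2.125.0`, and `importlib.metadata` is not available on the Python 3.7 kernel shown in the report, in which case the lookup returns `None`.

```python
from typing import Optional, Tuple

# Version reported as working in the comment above.
LAST_REPORTED_WORKING = "2.123.0"


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted release string such as '2.125.0' into ints."""
    return tuple(int(part) for part in version.split("."))


def newer_than_workaround(installed: str, pinned: str = LAST_REPORTED_WORKING) -> bool:
    """True if `installed` is a later release than `pinned`.

    Components are compared numerically, so '2.99.0' sorts before '2.123.0'.
    """
    return parse_version(installed) > parse_version(pinned)


def installed_sagemaker_version() -> Optional[str]:
    """Return the installed sagemaker version, or None if it cannot be determined."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:  # Python 3.7 has no importlib.metadata
        return None
    try:
        return version("sagemaker")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(newer_than_workaround("2.125.0"))
```

Note that this only tells you which side of 2.123.0 the SDK is on; later comments in this thread suggest the age of the Studio user also matters.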

claytonparnell commented 1 year ago

@acere are you still experiencing this issue? Running that notebook on Studio (Python 3 (Data Science), us-east-2) with sagemaker 2.128.0 right now, I am able to run all cells with no issue.

acere commented 1 year ago

@claytonparnell the problem is still there on older (created before Dec 2022) SM Studio users. There isn't any issue with Studio users created after Dec 22 with any version of PySDK > 2.123.0

boriside commented 1 year ago

OK, so the solution would be to create a new SageMaker Studio user now (after Dec 2022)?

orangewise commented 1 year ago

This is solved by using sagemaker==2.123.0. Are there plans to fix this in newer versions?

Rizhiy commented 1 year ago

PyTorch 1.13 and py39 are not available in 2.123. Is there an ETA for getting this fixed?

bengruher commented 1 year ago

Creating a new user in the domain and then using sagemaker==2.143.0 worked for me.

adimux commented 1 year ago

I tried the same notebook on the same instance and did not have the issue.

I believe the issue is fixed on the latest Data Science image. Please try to shut down the kernel (from the top menu -> open Kernel -> Shut Down) and try again.

aebulut commented 9 months ago

Missing required parameter in ProfilerConfig: "S3OutputPath"
Unknown parameter in ProfilerConfig: "DisableProfiler", must be one of: S3OutputPath, ProfilingIntervalInMilliseconds, ProfilingParameters

I have this issue and have tried all of the suggestions above, but none of them fixes it!