Closed. bolaft closed this issue 2 years ago.
@bolaft - I have updated the notebook. Basically, we need to use Python 2.7 instead of 3.5
Could you please try the updated notebook?
@ksalama I tried the updated notebook, I still have an error on the same step, but a different one:
DEBUG: Running [gcloud.beta.ai-platform.versions.create] with arguments: [--framework: "scikit-learn", --machine-type: "mls1-c4-m4", --model: "torch_text_classification", --origin: "gs://b4nlp_bucket/torch_text_classification/models/", --package-uris: "[u'gs://b4nlp_bucket/torch_text_classification/packages/my_package-0.1.tar.gz']", --prediction-class: "model_prediction.CustomModelPrediction", --python-version: "2.7", --runtime-version: "1.12", --verbosity: "debug", VERSION: "v201903"]
DEBUG: (gcloud.beta.ai-platform.versions.create) Internal error.
Traceback (most recent call last):
File "/tools/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 985, in Execute
resources = calliope_command.Run(cli=self, args=args)
File "/tools/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 795, in Run
resources = command_instance.Run(args)
File "/tools/google-cloud-sdk/lib/surface/ai_platform/versions/create.py", line 158, in Run
package_uris=args.package_uris)
File "/tools/google-cloud-sdk/lib/googlecloudsdk/command_lib/ml_engine/versions_util.py", line 113, in Create
message='Creating version (this might take a few minutes)...')
File "/tools/google-cloud-sdk/lib/googlecloudsdk/command_lib/ml_engine/versions_util.py", line 74, in WaitForOpMaybe
return operations_client.WaitForOperation(op, message=message).response
File "/tools/google-cloud-sdk/lib/googlecloudsdk/api_lib/ml_engine/operations.py", line 114, in WaitForOperation
sleep_ms=5000)
File "/tools/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 264, in WaitFor
sleep_ms, _StatusUpdate)
File "/tools/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 326, in PollUntilDone
sleep_ms=sleep_ms)
File "/tools/google-cloud-sdk/lib/googlecloudsdk/core/util/retry.py", line 229, in RetryOnResult
if not should_retry(result, state):
File "/tools/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 320, in _IsNotDone
return not poller.IsDone(operation)
File "/tools/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 122, in IsDone
raise OperationError(operation.error.message)
OperationError: Internal error.
ERROR: (gcloud.beta.ai-platform.versions.create) Internal error.
Any news? I tried it again and consistently get an "internal error". Did the updated notebook work on your end?
Hi, sorry about this and the miscommunication.
Working to resolve this issue and clear things up.
The high level notebooks folder is intended for samples to be run on AI Platform Notebooks, and we do not guarantee that these samples will work on Colab.
I've added this PR to help clarify this and will be looking at ways to make this clearer across the repo. https://github.com/GoogleCloudPlatform/cloudml-samples/pull/416
@ksalama if you intend for it to work with both, we should talk about opening a high-level Colab directory or moving the samples to ML on GCP.
@nnegrey In fact, it is an issue with the Beta feature of Custom Prediction Routines (rather than running it in Colab). I raised an internal bug and the engineering team is looking into it.
Hi, any updates on this? I am facing the same issue as OP and cannot serve a PyTorch model with AI Platform.
I noticed some changes were made in the backend: you can no longer use both the "--prediction-class" and "--framework" options when deploying a version.
Even after removing the "--framework" option from the notebook, the sample still fails, both on AI Platform and on Colab, with an "internal error occurred" message.
We are investigating this issue internally and will provide an update soon.
Did anyone find a work-around for this?
I noticed some changes were made in the backend: you can no longer use both the "--prediction-class" and "--framework" options when deploying a version.
Even after removing the "--framework" option from the notebook, the sample still fails, both on AI Platform and on Colab, with an "internal error occurred" message.
Facing the very same issue here.
Hi folks, still no update on why this is happening or what is causing it.
However, you can also jump to this issue here: https://issuetracker.google.com/issues/132823509 (UPDATED to correct link) Add your +1 and similar experience there.
I don't have access to this site. Is there any news yet, or has someone found a workaround?
Thanks in advance!
Hi folks, still no update on why this is happening or what is causing it.
However, you can also jump to this issue here: https://b.corp.google.com/issues/132823509 Add your +1 and similar experience there.
That site seems blocked, or is for google employees only
Oops sorry. It should be https://issuetracker.google.com/issues/132823509
From this page: https://cloud.google.com/support/docs/issue-trackers
I tried running this sample notebook in Colab.
I made some updates to this model deployment command: I removed the --framework flag per the documentation:
!gcloud beta ai-platform versions create {VERSION_NAME} --model {MODEL_NAME} \
--origin=gs://{BUCKET}/{MODEL_DIR}/ \
--python-version=3.5 \
--runtime-version={RUNTIME_VERSION} \
--package-uris=gs://{BUCKET}/{PACKAGES_DIR}/my_package-0.1.tar.gz \
--machine-type=mls1-c4-m2 \
--prediction-class=model_prediction.CustomModelPrediction
After taking a moment to update my VERSION_NAME variable for another attempt, I tried running it without the --machine-type flag, in which case it defaults to using the single-core CPU:
!gcloud beta ai-platform versions create {VERSION_NAME} --model {MODEL_NAME} \
--origin=gs://{BUCKET}/{MODEL_DIR}/ \
--python-version=3.5 \
--runtime-version={RUNTIME_VERSION} \
--package-uris=gs://{BUCKET}/{PACKAGES_DIR}/my_package-0.1.tar.gz \
--prediction-class=model_prediction.CustomModelPrediction
In both cases I got the following error:
ERROR: (gcloud.beta.ai-platform.versions.create) Create Version failed. Bad model detected with error: Model requires more memory than allowed. Please try to decrease the model size and re-deploy. If you continue to have error, please contact Cloud ML.
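The "requires more memory" check is enforced server-side when the version is created, but you can get a rough local estimate of how big your deployed model will be by summing the files under the --origin directory before uploading (the mls1 machine types have on the order of 2-4 GB of RAM). A minimal sketch, with a helper name of my own choosing:

```python
import os

def model_dir_size_mb(model_dir):
    """Total size in MB of the files AI Platform would copy from --origin."""
    total = 0
    for root, _, files in os.walk(model_dir):
        for name in files:
            # Follow the same files gsutil would upload from this directory.
            total += os.path.getsize(os.path.join(root, name))
    return total / (1024 * 1024)
```

This only estimates on-disk size; the in-memory footprint after deserialization (plus the PyTorch runtime itself) is what the service actually limits, so leave generous headroom.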
When issue 456 is closed, this issue can be closed as well.
You need to use compiled packages compatible with Cloud AI Platform (package information here):
This bucket contains compiled packages for PyTorch that are compatible with Cloud AI Platform prediction. The files are mirrored from the official builds at https://download.pytorch.org/whl/cpu/torch_stable.html
In order to deploy a PyTorch model on Cloud AI Platform Online Predictions, you must add one of these packages to the packageUris field of the version you deploy. Pick the package matching your Python and PyTorch version. The package names follow this template:
torch-{TORCH_VERSION_NUMBER}-{PYTHON_VERSION}-linux_x86_64.whl
where PYTHON_VERSION is cp35-cp35m for Python 3 with runtime versions < 1.15, cp37-cp37m for Python 3 with runtime versions >= 1.15, and cp27-cp27mu for Python 2.
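The naming rules above can be sketched as a small Python helper (the function name is my own; the ABI-tag mapping for Python 3 follows the rules just listed, and cp27-cp27mu would be the Python 2 case):

```python
def torch_wheel_name(torch_version, runtime_version):
    """Build the wheel filename for a Python 3 deployment.

    Maps the AI Platform runtime version to the matching CPython ABI tag:
    runtimes before 1.15 use cp35-cp35m, 1.15 and later use cp37-cp37m.
    """
    abi = "cp37-cp37m" if float(runtime_version) >= 1.15 else "cp35-cp35m"
    return "torch-{}-{}-linux_x86_64.whl".format(torch_version, abi)
```

For example, torch_wheel_name("1.1.0", "1.14") yields the cp35 wheel used in the gcloud example below, while a 1.15 runtime would pick the cp37 build.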
For example, if I were to deploy a PyTorch model based on PyTorch 1.1.0 and Python 3, my gcloud command would look like:
gcloud beta ai-platform versions create {VERSION_NAME} --model {MODEL_NAME} \
...
--package-uris=gs://{MY_PACKAGE_BUCKET}/my_package-0.1.tar.gz,gs://cloud-ai-pytorch/torch-1.1.0-cp35-cp35m-linux_x86_64.whl
The full command, with my own package and the matching torch wheel in --package-uris:
!gcloud beta ai-platform versions create {VERSION_NAME} --model {MODEL_NAME} \
--origin=gs://{BUCKET}/{MODEL_DIR}/ \
--python-version=3.7 \
--runtime-version={RUNTIME_VERSION} \
--package-uris=gs://{BUCKET}/{PACKAGES_DIR}/text_classification-0.1.tar.gz,gs://cloud-ai-pytorch/torch-1.3.1+cpu-cp37-cp37m-linux_x86_64.whl \
--machine-type=mls1-c4-m4 \
--prediction-class=model_prediction.CustomModelPrediction
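For reference, the class named by --prediction-class must implement the Custom Prediction Routines interface: a predict method and a from_path classmethod that AI Platform calls to load the model from the --origin directory. A minimal sketch of what model_prediction.CustomModelPrediction could look like; the model.pkl filename and pickle-based loading are illustrative assumptions, not the notebook's actual code:

```python
import os
import pickle

class CustomModelPrediction(object):
    """Sketch of an AI Platform custom prediction routine."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        # AI Platform passes a list of JSON-deserialized instances and
        # expects a JSON-serializable list of predictions back.
        return [self._model(instance) for instance in instances]

    @classmethod
    def from_path(cls, model_dir):
        # model_dir is a local copy of the version's --origin directory.
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            model = pickle.load(f)
        return cls(model)
```

The class is packaged in the tarball passed via --package-uris, alongside the torch wheel, so the service can import it at serving time.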
@bolaft we updated the notebooks with changes, PTAL.
@bolaft - please review the change.
This issue may no longer be relevant due to its age. Please feel free to re-open.
Describe the bug
The notebook
cloudml-samples/notebooks/pytorch/Text Classification Using PyTorch and CMLE.ipynb
does not seem to work on Google Colab. In the following command:
It fails because "model-class" is not a recognized argument. I believe it should be "prediction-class".
After changing "model-class" to "prediction-class", the command fails again with the following error message:
What sample is this bug related to?
cloudml-samples/notebooks/pytorch/Text Classification Using PyTorch and CMLE.ipynb
Source code / logs
When adding the --verbosity debug argument:

To Reproduce
Steps to reproduce the behavior: run the
cloudml-samples/notebooks/pytorch/Text Classification Using PyTorch and CMLE.ipynb
notebook.

Expected behavior
The model version should be correctly deployed.
System Information