GoogleCloudPlatform / cloudml-samples

Cloud ML Engine repo. Please visit the new Vertex AI samples repo at https://github.com/GoogleCloudPlatform/vertex-ai-samples
https://cloud.google.com/ai-platform/docs/
Apache License 2.0

"AttributeError: 'SymbolDatabase' object has no attribute 'RegisterServiceDescriptor'" #99

Closed luckyapplehead closed 6 years ago

luckyapplehead commented 6 years ago

For problems running the sample code please provide the following information.

System information

Describe the problem

I ran the code with the same commands as in the guide to train the DNN model on the server (Google Cloud ML Engine), but the following error occurred: "AttributeError: 'SymbolDatabase' object has no attribute 'RegisterServiceDescriptor'". Is there anything wrong?

The packages in my environment are as follows:

gapic-google-cloud-pubsub-v1==0.15.4
google-apitools==0.5.16
google-auth==1.1.1
google-auth-httplib2==0.0.2
google-cloud-bigquery==0.27.0
google-cloud-core==0.27.1
google-cloud-dataflow==2.1.1
google-cloud-pubsub==0.28.4
google-gax==0.15.15
google-resumable-media==0.3.1
googleapis-common-protos==1.5.3
googledatastore==7.0.1
grpc-google-iam-v1==0.11.4
proto-google-cloud-datastore-v1==0.90.4
proto-google-cloud-pubsub-v1==0.15.4
protobuf==3.4.0
six==1.11.0

Source code / logs

ERROR 2017-10-22 23:07:06 +0800 ps-replica-0 Command '['python', '-m', u'trainer.task', u'--raw_metadata_path', u'gs://dataproc-1228d533-ffe2-4747-a056-8cd396c3db5f-asia-southeast1/movielens/movielens_20171022_175327/raw_metadata', u'--transform_savedmodel', u'gs://dataproc-1228d533-ffe2-4747-a056-8cd396c3db5f-asia-southeast1/movielens/movielens_20171022_175327/transform_fn', u'--eval_data_paths', u'gs://dataproc-1228d533-ffe2-4747-a056-8cd396c3db5f-asia-southeast1/movielens/movielens_20171022_175327/features_eval*.tfrecord.gz', u'--train_data_paths', u'gs://dataproc-1228d533-ffe2-4747-a056-8cd396c3db5f-asia-southeast1/movielens/movielens_20171022_175327/features_train*.tfrecord.gz', u'--output_path', u'gs://dataproc-1228d533-ffe2-4747-a056-8cd396c3db5f-asia-southeast1/movielens/model/movielens_deep_20171022_230112', u'--model_type', u'dnn_softmax', u'--eval_type', u'ranking', u'--l2_weight_decay', u'0.01', u'--learning_rate', u'0.05', u'--train_steps', u'500000', u'--eval_steps', u'500', u'--top_k_infer', u'100']' returned non-zero exit status 1
INFO 2017-10-22 23:07:06 +0800 ps-replica-0 Module completed; cleaning up.
INFO 2017-10-22 23:07:06 +0800 ps-replica-0 Clean up finished.
ERROR 2017-10-22 23:07:55 +0800 service The replica ps 0 exited with a non-zero status of 1. Termination reason: Error.
ERROR 2017-10-22 23:07:55 +0800 service Traceback (most recent call last):
ERROR 2017-10-22 23:07:55 +0800 service [...]
ERROR 2017-10-22 23:07:55 +0800 service File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3138, in <module>
ERROR 2017-10-22 23:07:55 +0800 service @_call_aside
ERROR 2017-10-22 23:07:55 +0800 service File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3122, in _call_aside
ERROR 2017-10-22 23:07:55 +0800 service f(*args, **kwargs)
ERROR 2017-10-22 23:07:55 +0800 service File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3166, in _initialize_master_working_set
ERROR 2017-10-22 23:07:55 +0800 service for dist in working_set
ERROR 2017-10-22 23:07:55 +0800 service File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3166, in <genexpr>
ERROR 2017-10-22 23:07:55 +0800 service for dist in working_set
ERROR 2017-10-22 23:07:55 +0800 service File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2697, in activate
ERROR 2017-10-22 23:07:55 +0800 service declare_namespace(pkg)
ERROR 2017-10-22 23:07:55 +0800 service File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2257, in declare_namespace
ERROR 2017-10-22 23:07:55 +0800 service _handle_ns(packageName, path_item)
ERROR 2017-10-22 23:07:55 +0800 service File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2192, in _handle_ns
ERROR 2017-10-22 23:07:55 +0800 service loader.load_module(packageName)
ERROR 2017-10-22 23:07:55 +0800 service File "/usr/lib/python2.7/pkgutil.py", line 246, in load_module
ERROR 2017-10-22 23:07:55 +0800 service mod = imp.load_module(fullname, self.file, self.filename, self.etc)
ERROR 2017-10-22 23:07:55 +0800 service File "/root/.local/lib/python2.7/site-packages/google/cloud/pubsub/__init__.py", line 30, in <module>
ERROR 2017-10-22 23:07:55 +0800 service from google.cloud.pubsub.client import Client
ERROR 2017-10-22 23:07:55 +0800 service File "/root/.local/lib/python2.7/site-packages/google/cloud/pubsub/client.py", line 29, in <module>
ERROR 2017-10-22 23:07:55 +0800 service from google.cloud.pubsub._gax import _PublisherAPI as GAXPublisherAPI
ERROR 2017-10-22 23:07:55 +0800 service File "/root/.local/lib/python2.7/site-packages/google/cloud/pubsub/_gax.py", line 19, in <module>
ERROR 2017-10-22 23:07:55 +0800 service from google.cloud.gapic.pubsub.v1.publisher_client import PublisherClient
ERROR 2017-10-22 23:07:55 +0800 service File "/root/.local/lib/python2.7/site-packages/google/cloud/gapic/pubsub/v1/publisher_client.py", line 37, in <module>
ERROR 2017-10-22 23:07:55 +0800 service from google.iam.v1 import iam_policy_pb2
ERROR 2017-10-22 23:07:55 +0800 service File "/root/.local/lib/python2.7/site-packages/google/iam/v1/iam_policy_pb2.py", line 296, in <module>
ERROR 2017-10-22 23:07:55 +0800 service _sym_db.RegisterServiceDescriptor(_IAMPOLICY)
ERROR 2017-10-22 23:07:55 +0800 service AttributeError: 'SymbolDatabase' object has no attribute 'RegisterServiceDescriptor'
ERROR 2017-10-22 23:07:55 +0800 service
ERROR 2017-10-22 23:07:55 +0800 service To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=874568011889&resource=ml_job%2Fjob_id%2Fmovielens_deep_20171022_230112&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%22movielens_deep_20171022_230112%22
INFO 2017-10-22 23:09:20 +0800 service Finished tearing down TensorFlow.
INFO 2017-10-22 23:10:21 +0800 service Job failed.

parthmishra commented 6 years ago

What worked for me was specifying the protobuf version manually, because the version used by my local interpreter was not the same as the one used on the server. I'm not sure whether you have specified the version explicitly, but start there.
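One way to confirm a mismatch like this is to print which protobuf version an interpreter actually loads and whether its symbol database has the method named in the traceback. The following is only a rough sketch: the helper name `protobuf_diagnostics` is made up for illustration, and it assumes the script is run in the environment you want to inspect (locally, or from the trainer code on the server).

```python
import importlib


def protobuf_diagnostics():
    """Report the loaded protobuf version and whether the method from the
    traceback exists. Returns None when protobuf is not installed.

    Hypothetical helper for illustration; not part of the sample code.
    """
    try:
        protobuf = importlib.import_module("google.protobuf")
        symbol_database = importlib.import_module(
            "google.protobuf.symbol_database")
    except ImportError:
        return None

    # Default() returns the process-wide SymbolDatabase instance, the same
    # kind of object that generated *_pb2 modules call into at import time.
    db = symbol_database.Default()
    return {
        "version": protobuf.__version__,
        "has_register_service_descriptor": hasattr(
            db, "RegisterServiceDescriptor"),
    }


if __name__ == "__main__":
    print(protobuf_diagnostics())
```

If the reported version differs between the local machine and the training job, the two environments are resolving protobuf differently, which is the situation described above.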

luckyapplehead commented 6 years ago

Hi @parthmishra, could you tell me how to manually specify the version of protobuf used on the server?

parthmishra commented 6 years ago

In your setup.py, just list the version number after the package name, i.e. protobuf==3.4.0 instead of protobuf (or change your requirements.txt if you're using that instead). If you did that and it's still not working, I'm not totally sure.
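In practice this means the dependency list that setup.py passes to `install_requires` should contain `protobuf==3.4.0` rather than a bare `protobuf`. Here is a small sketch of that idea, assuming the list is called `REQUIRED_PACKAGES` (the name is an assumption; only the protobuf pin comes from this thread). The `unpinned` helper is hypothetical and simply flags entries that lack an exact `==` pin.

```python
import re

# Hypothetical dependency list for the trainer package's setup.py; it would be
# passed as install_requires=REQUIRED_PACKAGES in the setuptools.setup() call.
REQUIRED_PACKAGES = [
    "protobuf==3.4.0",  # pinned, instead of a bare "protobuf"
]

# Matches "name==version" with nothing else attached.
_EXACT_PIN = re.compile(r"^[A-Za-z0-9._-]+==[^=\s]+$")


def unpinned(requirements):
    """Return the requirement strings that do not use an exact '==' pin."""
    return [req for req in requirements if not _EXACT_PIN.match(req.strip())]


if __name__ == "__main__":
    # An empty list means every dependency is pinned to an exact version.
    print(unpinned(REQUIRED_PACKAGES))
```

The same `name==version` form works in requirements.txt if that is what the job uses instead.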

luckyapplehead commented 6 years ago

It works for me. Thanks Parthmishra:)

puneith commented 6 years ago

Thanks @luckyapplehead