aws-samples / eks-kubeflow-workshop

Kubeflow workshop on EKS, focused mainly on AWS integration examples. See the Kubeflow website, http://kubeflow.org, for other examples.
Apache License 2.0

02_01_fairing_introduction ModuleNotFoundError: No module named 'tensorflow.python.util.module_wrapper' #72

Open dalbhanj opened 3 years ago

dalbhanj commented 3 years ago

Notebook: 02_01_fairing_introduction

Describe the bug: I get an error when I run remote_train() with the latest notebook server image (527798164940.dkr.ecr.us-west-2.amazonaws.com/tensorflow-1.15.2-notebook-cpu:1.0.0).

Here are the logs from the notebook:

[I 200820 20:51:08 config:127] Using builder: <kubeflow.fairing.builders.append.append.AppendBuilder object at 0x7f678c30c940>
[I 200820 20:51:08 config:129] Using deployer: <kubeflow.fairing.deployers.job.job.Job object at 0x7f678c30c898>
[W 200820 20:51:08 append:50] Building image using Append builder...
[I 200820 20:51:08 base:107] Creating docker context: /tmp/fairing_context_tclglgok
[W 200820 20:51:08 base:94] /usr/local/lib/python3.6/dist-packages/kubeflow/fairing/__init__.py already exists in Fairing context, skipping...
[I 200820 20:51:08 docker_creds_:234] Loading Docker credentials for repository 'tensorflow/tensorflow:1.14.0-py3'
[W 200820 20:51:09 append:54] Image successfully built in 0.9497657120227814s.
[W 200820 20:51:09 append:94] Pushing image 896501016854.dkr.ecr.us-east-2.amazonaws.com/fairing-job:FF1A9744...
[I 200820 20:51:09 docker_creds_:234] Loading Docker credentials for repository '896501016854.dkr.ecr.us-east-2.amazonaws.com/fairing-job:FF1A9744'
[W 200820 20:51:09 append:81] Uploading 896501016854.dkr.ecr.us-east-2.amazonaws.com/fairing-job:FF1A9744
[I 200820 20:51:09 docker_session_:284] Layer sha256:9713d805412120307d9ed876c844ed5316225d921a0a73147fcd0fac3b15fbed pushed.
[I 200820 20:51:10 docker_session_:284] Layer sha256:5bd1cb59702536c10e96bb14e54846922c9b257580d4e2c733076a922525240b pushed.
[I 200820 20:51:10 docker_session_:284] Layer sha256:2b940936f9933b7737cf407f2149dd7393998d7a0bee5acf1c4a57b0487cef79 pushed.
[I 200820 20:51:10 docker_session_:284] Layer sha256:a31c3b1caad473a474d574283741f880e37c708cc06ee620d3e93fa602125ee0 pushed.
[I 200820 20:51:10 docker_session_:284] Layer sha256:b054a26005b7f3b032577f811421fab5ec3b42ce45a4012dfa00cf6ed6191b0f pushed.
[I 200820 20:51:10 docker_session_:284] Layer sha256:14ca88e9f6723ce82bc14b241cda8634f6d19677184691d086662641ab96fe68 pushed.
[I 200820 20:51:10 docker_session_:284] Layer sha256:04d910722b343d5d640ec44c90766d5eb6f0b03482927b88c84a6bb7820618aa pushed.
[I 200820 20:51:10 docker_session_:284] Layer sha256:68543864d6442a851eaff0500161b92e4a151051cf7ed2649b3790a3f876bada pushed.
[I 200820 20:51:10 docker_session_:284] Layer sha256:5e671b828b2af02924968841e5d12084fa78e8722e9510402aaee80dc5d7a6db pushed.
[I 200820 20:51:12 docker_session_:284] Layer sha256:5b7339215d1d5f8e68622d584a224f60339f5bef41dbd74330d081e912f0cddd pushed.
[I 200820 20:51:23 docker_session_:284] Layer sha256:8832e37735788665026956430021c6d1919980288c66c4526502965aeb5ac006 pushed.
[I 200820 20:51:30 docker_session_:284] Layer sha256:016724bbd2c9643f24eff7c1e86d9202d7c04caddd7fdd4375a77e3998ce8203 pushed.
[I 200820 20:51:31 docker_session_:334] Finished upload of: 896501016854.dkr.ecr.us-east-2.amazonaws.com/fairing-job:FF1A9744
[W 200820 20:51:31 append:99] Pushed image 896501016854.dkr.ecr.us-east-2.amazonaws.com/fairing-job:FF1A9744 in 21.633319832151756s.
[W 200820 20:51:31 job:90] The job fairing-job-zxns4 launched.
[W 200820 20:51:31 manager:255] Waiting for fairing-job-zxns4-vphfz to start...
[W 200820 20:51:31 manager:255] Waiting for fairing-job-zxns4-vphfz to start...
[W 200820 20:51:31 manager:255] Waiting for fairing-job-zxns4-vphfz to start...
[I 200820 20:51:54 manager:261] Pod started running True
Traceback (most recent call last):
  File "/app/function_shim.py", line 79, in <module>
    call(args.serialized_fn_file)
  File "/app/function_shim.py", line 61, in call
    obj = cloudpickle.load(f)
ModuleNotFoundError: No module named 'tensorflow.python.util.module_wrapper'
[W 200820 20:51:56 job:162] Cleaning up job fairing-job-zxns4...

Expected behavior: The notebook should run without issues.

dalbhanj commented 3 years ago

After changing the TensorFlow base image to 1.15, the remote Fairing job completed successfully:

fairing.config.set_builder('append', base_image='tensorflow/tensorflow:1.15.0-py3', registry=DOCKER_REGISTRY, push=True)

I'll send a PR shortly.
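
A likely explanation, given the logs: the function is pickled in a TensorFlow 1.15.2 notebook but unpickled in a job image built on tensorflow/tensorflow:1.14.0-py3, where the module named in the traceback cannot be imported. A small guard in the notebook could catch this kind of mismatch before the job is launched. The following is only a sketch: the helper names `tf_minor_version` and `base_image_matches` are hypothetical (not part of Fairing), and it assumes the image tag starts with the TensorFlow version and that matching major.minor versions is sufficient, which is what the fix in this thread suggests but does not prove in general.

```python
import re


def tf_minor_version(version: str) -> tuple:
    """Return (major, minor) parsed from the start of a string like '1.15.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse a version from {version!r}")
    return int(match.group(1)), int(match.group(2))


def base_image_matches(notebook_tf_version: str, base_image: str) -> bool:
    """Check that the base image tag has the same major.minor as the notebook.

    Assumption: the tag (text after the last ':') begins with the
    TensorFlow version, as in 'tensorflow/tensorflow:1.15.0-py3'.
    """
    _, _, tag = base_image.rpartition(":")
    return tf_minor_version(notebook_tf_version) == tf_minor_version(tag)


# In the notebook this would be called with tf.__version__ as the first argument.
print(base_image_matches("1.15.2", "tensorflow/tensorflow:1.14.0-py3"))
print(base_image_matches("1.15.2", "tensorflow/tensorflow:1.15.0-py3"))
```

With the values from this issue, the first call reports a mismatch for the 1.14.0 base image and the second reports a match for the 1.15.0 image used in the fix.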