Open gogogwwb opened 3 years ago
The network should be normal.
I don’t know why "[kubeflow][worker-0] -----> Running code..." does not appear.
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
yann.lecun.com is blocked in China.
[kubeflow] Building the Docker image...
[kubeflow] Image built successfully
[kubeflow] Getting tensorflow Job jupyter-kernel-nxeof
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
Tried 10 times but cannot get the pods
Job jupyter-kernel-nxeof is created.
The pods are running normally, but the kubeflow log does not seem to be shown. I don't know why.
Do I need to install kubeflow locally when using Dockerized Kernel?
I think so. You need to install at least tf-operator.
But if this is the case, what are the advantages of CIAO? If Kubeflow is installed, why not use the built-in Jupyter Notebook?
I'm a little confused.
With CIAO you can write code in the notebook and run it as a distributed job from there.
If you do not need to run distributed training jobs from the notebook, you can use Kubeflow's Jupyter or https://github.com/tkestack/elastic-jupyter-operator
OK, thanks!
[kubeflow] Building the Docker image...
[kubeflow] Image built successfully
[kubeflow] Getting tensorflow Job jupyter-kernel-nxeof
[kubeflow] Waiting for all replicas (0, 1, 1)
........
Tried 10 times but cannot get the pods
Job jupyter-kernel-nxeof is created.
Is there a version issue?
I am not sure. Can you get the TFJob in the cluster?
Yes. It seems that the log cannot be obtained. Is it a label problem?
Maybe, I am not sure about it. Is there any log from the kernel?
Maybe it is related to labels. The TFJob labels were changed in the latest version.
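To check the label theory, one option is to look for the job's pods under both the older and the newer label conventions. The sketch below is only an illustration: the label keys (`tf-job-name`, `job-name`, `training.kubeflow.org/job-name`) are assumptions based on what different tf-operator / training-operator releases have used, and the helper names are hypothetical. Verify the keys against the operator version installed in your cluster.

```python
from typing import Dict, List

# Assumed label keys; confirm them against the operator version you run.
NEW_JOB_NAME_KEY = "training.kubeflow.org/job-name"
OLD_JOB_NAME_KEYS = ("tf-job-name", "job-name")


def candidate_selectors(job_name: str) -> List[str]:
    """Return label selectors to try, newest convention first."""
    keys = [NEW_JOB_NAME_KEY, *OLD_JOB_NAME_KEYS]
    return [f"{key}={job_name}" for key in keys]


def pods_for_job(pods: List[Dict], job_name: str) -> List[str]:
    """Return names of pods whose labels match the job under any convention.

    `pods` is a list of plain dicts with "name" and "labels" keys, e.g.
    built from the metadata of the pods returned by the cluster.
    """
    keys = [NEW_JOB_NAME_KEY, *OLD_JOB_NAME_KEYS]
    return [
        pod["name"]
        for pod in pods
        if any(pod.get("labels", {}).get(key) == job_name for key in keys)
    ]
```

Each selector string can be passed to `kubectl get pods -l <selector>`. If the pods only show up under a key the kernel does not query, that would explain "cannot get the pods" even though the TFJob and its pods exist.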
It seems that there is a network connection problem.
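If the worker hangs because the MNIST download from yann.lecun.com is unreachable, one workaround is to try several sources in order and use the first that responds. This is a minimal sketch, not part of CIAO or Kubeflow: `fetch_first_available` is a hypothetical helper, and any mirror URL you pass in is your own assumption and has to be one that is reachable from your cluster.

```python
import urllib.request
from typing import Callable, Iterable, Optional


def fetch_first_available(
    urls: Iterable[str],
    fetch: Optional[Callable[[str], bytes]] = None,
    timeout: float = 10.0,
) -> bytes:
    """Return the content of the first URL that can be downloaded."""
    if fetch is None:
        def fetch(url: str) -> bytes:
            # Default fetcher: plain HTTP(S) GET with a timeout.
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()

    errors = []
    for url in urls:
        try:
            return fetch(url)
        except OSError as exc:
            # urllib's URLError and socket timeouts are OSError subclasses.
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all sources failed:\n" + "\n".join(errors))
```

For example, pass the original http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz URL first, followed by a mirror of your choice. The `fetch` parameter exists so the logic can be exercised without network access.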