caicloud / ciao

Kernel for Kubeflow in Jupyter Notebook
Apache License 2.0

A Problem #85

Open gogogwwb opened 3 years ago

gaocegege commented 3 years ago

[kubeflow][worker-0] urllib.error.URLError: <urlopen error [Errno 104] Connection reset by peer>

It seems there is a problem with the network connection.

gogogwwb commented 3 years ago

The network should be working normally.

gogogwwb commented 3 years ago

I don’t know why "[kubeflow][worker-0] -----> Running code..." does not appear.

gaocegege commented 3 years ago

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz

yann.lecun.com is blocked in China.
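When a dataset host is unreachable from the cluster, one workaround is to try a list of URLs in order and keep the first one that succeeds. The following is a minimal sketch of that idea, not part of ciao: the function name `download_with_fallback` and the mirror URL in the usage comment are hypothetical placeholders, and only the `yann.lecun.com` URL comes from the log above.

```python
import shutil
import urllib.request


def download_with_fallback(urls, dest, opener=urllib.request.urlopen, timeout=30):
    """Try each URL in order and save the first successful response to dest.

    Returns the URL that worked. Raises RuntimeError if every URL fails.
    `opener` is injectable so the function can be exercised without a network.
    """
    errors = []
    for url in urls:
        try:
            # urllib.error.URLError and ConnectionResetError are both OSError
            # subclasses, so "Connection reset by peer" is caught here.
            with opener(url, timeout=timeout) as resp, open(dest, "wb") as out:
                shutil.copyfileobj(resp, out)
            return url
        except OSError as exc:
            errors.append((url, exc))
    raise RuntimeError(f"all downloads failed: {errors}")


# Example usage (the second URL is a hypothetical mirror, replace it with one
# that is reachable from your cluster):
#
# download_with_fallback(
#     [
#         "http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz",
#         "http://mirror.example.com/mnist/train-images-idx3-ubyte.gz",
#     ],
#     "train-images-idx3-ubyte.gz",
# )
```

Because the worker pods run inside the cluster, the download has to succeed from there, not only from the machine running the notebook.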

gogogwwb commented 3 years ago

[kubeflow] Building the Docker image...
[kubeflow] Image built successfully
[kubeflow] Getting tensorflow Job jupyter-kernel-nxeof
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
[kubeflow] Waiting for all replicas (0, 1, 1)
Tried 10 times but cannot get the pods
Job jupyter-kernel-nxeof is created.

The pods are running normally, but the kubeflow log does not seem to be shown. I don't know why.

gogogwwb commented 3 years ago

Do I need to install kubeflow locally when using Dockerized Kernel?

gaocegege commented 3 years ago

I think so. You need to install at least tf-operator.

gogogwwb commented 3 years ago

But if this is the case, what are the advantages of CIAO? If Kubeflow is installed, why not use the built-in Jupyter Notebook?

gogogwwb commented 3 years ago

I'm a little confused.

gaocegege commented 3 years ago

With ciao, you can write code in the notebook and run it there as a distributed job.

gaocegege commented 3 years ago

If you do not want to run distributed training jobs from the notebook, you can use Kubeflow's Jupyter notebooks or https://github.com/tkestack/elastic-jupyter-operator

gogogwwb commented 3 years ago

OK, thanks!

[kubeflow] Building the Docker image...
[kubeflow] Image built successfully
[kubeflow] Getting tensorflow Job jupyter-kernel-nxeof
[kubeflow] Waiting for all replicas (0, 1, 1)
........
Tried 10 times but cannot get the pods
Job jupyter-kernel-nxeof is created.

Is there a version issue?

gaocegege commented 3 years ago

I am not sure. Can you get the TFJob in the cluster?

gogogwwb commented 3 years ago

Yes. It seems that the log cannot be obtained. Is it a label problem?

gaocegege commented 3 years ago

Maybe, I am not sure about it. Is there any log from the kernel?

gaocegege commented 3 years ago

Maybe it is related to labels. TFJob's labels were changed in the latest version.
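If the pods exist but cannot be found, the label key used to look them up may no longer match what the operator sets. The following is a rough sketch of a lookup that tolerates several label keys. It is not ciao's actual code: the function `pods_for_job`, the plain-dict pod representation, the example pod name, and the list of candidate keys are all assumptions for illustration. Check the labels your operator version really sets (for example with `kubectl get pods --show-labels`) before relying on any of them.

```python
# Candidate label keys that different tf-operator / training-operator
# releases are assumed to have used for the owning job's name.
# This list is an assumption; verify it against your cluster.
CANDIDATE_JOB_NAME_KEYS = (
    "training.kubeflow.org/job-name",
    "job-name",
    "tf-job-name",
    "tf_job_name",
)


def pods_for_job(pods, job_name, keys=CANDIDATE_JOB_NAME_KEYS):
    """Return the names of pods whose labels tie them to job_name.

    `pods` is a list of dicts shaped like {"name": str, "labels": dict},
    a simplified stand-in for the pod objects returned by the Kubernetes API.
    A pod matches if any candidate key carries the job name as its value.
    """
    matched = []
    for pod in pods:
        labels = pod.get("labels") or {}
        if any(labels.get(key) == job_name for key in keys):
            matched.append(pod["name"])
    return matched


# Example with made-up pod data (the job name is taken from the log above):
example_pods = [
    {
        "name": "jupyter-kernel-nxeof-worker-0",
        "labels": {"training.kubeflow.org/job-name": "jupyter-kernel-nxeof"},
    },
    {"name": "unrelated", "labels": {"app": "something-else"}},
]
print(pods_for_job(example_pods, "jupyter-kernel-nxeof"))
```

A lookup restricted to a single outdated key would return an empty list for the same pods, which would be consistent with the "Tried 10 times but cannot get the pods" message even though the TFJob and its pods exist.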