SeldonIO / seldon-server

Machine Learning Platform and Recommendation Engine built on Kubernetes
https://www.seldon.io/
Apache License 2.0

Error with seldon-cli: is kubectl 1.6.2 supported? #52

Closed parkerzf closed 7 years ago

parkerzf commented 7 years ago

Has anyone seen the following error when running seldon-cli?

error: expected 'exec POD_NAME COMMAND [ARG1] [ARG2] ... [ARGN]'.
POD_NAME and COMMAND are required arguments for the exec command
See 'kubectl exec -h' for help and examples.

Is 1.6.2 supported, or should I use an older version?

My kubectl version is:

Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:33:11Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:22:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

ukclivecox commented 7 years ago

Can you provide the seldon-cli command you are executing, and the surrounding context, when you get the error? Yes, 1.6.2 should be supported.

parkerzf commented 7 years ago

Thanks for the quick response.

I am trying to follow http://docs.seldon.io/ml10m.html on AWS.

Seldon is running, and kubectl create -f ml10m-import-item-similarity.json has finished.

However, when I run seldon-cli api --client-name ml10m --endpoint /js/recommendations --item 50 --limit 4 --user 625, this error message appears.

BTW, Spark cannot start because it is out of memory. Could that be the reason?

kubectl describe po spark-worker-controller-3381690000-pwfrd

No nodes are available that match all of the following predicates:: Insufficient memory (1), PodToleratesNodeTaints (1)

ukclivecox commented 7 years ago

Yes, the ml10m example needs Spark to create the model. Can you give yourself more memory? You will probably need 10G.
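
For anyone hitting the same "Insufficient memory" scheduling message, here is a minimal sketch (not part of Seldon) that checks which nodes report at least 10G of allocatable memory. It assumes you feed it the output of kubectl get nodes -o json; the function names are made up for illustration, and only the most common quantity suffixes are handled. Allocatable memory is only a rough upper bound, since the scheduler also subtracts the requests of pods already running on the node.

```python
import json

# Common binary and decimal suffixes used in Kubernetes memory quantities.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '3882148Ki' or '10G' to bytes."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[:-len(suffix)]) * factor)
    return int(float(quantity))  # no suffix: plain byte count


def nodes_with_enough_memory(nodes_json: str, required: str = "10G") -> list:
    """Return names of nodes whose allocatable memory is at least `required`."""
    need = parse_memory(required)
    names = []
    for node in json.loads(nodes_json)["items"]:
        allocatable = node["status"]["allocatable"]["memory"]
        if parse_memory(allocatable) >= need:
            names.append(node["metadata"]["name"])
    return names
```

If the returned list is empty, no single node can hold a 10G pod and a larger instance type is the likely fix.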

parkerzf commented 7 years ago

Cool, I will try a larger AWS instance and let you know.

parkerzf commented 7 years ago

I upgraded to a larger machine and seldon-cli now runs fine. However, the output of seldon-cli api --client-name ml10m --endpoint /js/recommendations --item 50 --limit 4 --user 625

becomes response code 200 {"error_id":37,"error_msg":"Invalid or Null Strategy. The default strategy or a per client strategy needs to be set correctly.","http_response":500}
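
Note that the transport-level response code here is 200 while the JSON body carries the real failure (http_response 500). A minimal sketch of a client-side check, assuming the error body always has the error_id and error_msg fields shown above (the function name is hypothetical and not part of seldon-cli):

```python
import json


def check_api_response(http_status: int, body: str) -> dict:
    """Return the parsed body, or raise if it describes an error.

    The status code alone is not trusted, because an error payload
    can arrive with response code 200 as in the output above.
    """
    payload = json.loads(body)
    if isinstance(payload, dict) and "error_id" in payload:
        raise RuntimeError(
            "API error {}: {}".format(payload["error_id"], payload.get("error_msg"))
        )
    if http_status != 200:
        raise RuntimeError("unexpected HTTP status {}".format(http_status))
    return payload
```

With the body above, this raises a RuntimeError that includes the "Invalid or Null Strategy" message rather than treating the call as a success.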

Do you know what could be the reason? @cliveseldon

ukclivecox commented 7 years ago

Have you run the job that trains the model and sets things up?

cd seldon-server/kubernetes/conf/examples/ml10m
kubectl create -f ml10m-import-item-similarity.json

Also, did it complete successfully as shown by: kubectl get jobs -l job-name=ml10m-import

You need to wait for this to complete, which may take 10 minutes or more depending on your compute capacity.
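
If you would rather not re-run the command by hand, the wait can be scripted. This is a rough sketch, assuming kubectl is on your PATH and that the job reports a condition of type Complete when it succeeds; the function names, polling interval, and timeout are illustrative choices, not anything Seldon ships.

```python
import json
import subprocess
import time


def jobs_complete(jobs_json: str) -> bool:
    """True if at least one job is listed and every listed job is Complete."""
    items = json.loads(jobs_json)["items"]
    if not items:
        return False  # nothing matched the selector yet
    for job in items:
        conditions = job.get("status", {}).get("conditions") or []
        done = any(
            c.get("type") == "Complete" and c.get("status") == "True"
            for c in conditions
        )
        if not done:
            return False
    return True


def wait_for_jobs(selector="job-name=ml10m-import",
                  poll_seconds=30, timeout_seconds=1800) -> bool:
    """Poll kubectl until the selected jobs complete or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["kubectl", "get", "jobs", "-l", selector, "-o", "json"],
            check=True, capture_output=True, text=True,
        )
        if jobs_complete(result.stdout):
            return True
        time.sleep(poll_seconds)
    return False
```

Calling wait_for_jobs() returns True once the import job has completed, or False if it has not finished within the timeout.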

parkerzf commented 7 years ago

Thanks! I ran kubectl create -f ml10m-import-item-similarity.json, but the job didn't finish. I think this time something is wrong with GlusterFS.

kubectl describe pods ml10m-import-spwgv

Evicted  The node was low on resource: nodefs
Killing  Killing container with id docker://f1ccb62af6ee4c30cd55b76175bf301adddf0f02d1062130eb5da64aaf13ab12:Need to kill pod

parkerzf commented 7 years ago

I tested the ml100k job, and it works fine. Thanks again!

ukclivecox commented 7 years ago

Yes, the ml10m will take up more memory so that could be the issue.