altruistcoder closed this issue 2 years ago
You may want to look at updating the resources for the controller: see https://github.com/SeldonIO/seldon-core/blob/7a94dbfe354cef2ada1b6a563f7acf66328463df/helm-charts/seldon-core-operator/values.yaml#L81-L85
Hello @cliveseldon ,
Updating the resources does solve the problem, but I am having difficulty making the change permanent.
The controller was installed through the OpenShift Seldon Operator, not Helm, so I am not sure how to increase its resources permanently. When I edit the seldon-controller-manager Deployment to increase its resources, it shows me this error:
Can you please tell me how to make this change permanent by changing it in the operator itself?
Can you make a change to the resources in the OpenShift operator for Seldon? @RafalSkolasinski
@altruistcoder On OpenShift you should be able to edit the CSV of the Seldon Operator directly. I believe this would be the section you are after https://github.com/redhat-openshift-ecosystem/certified-operators/blob/main/operators/seldon-operator-certified/1.14.1/manifests/seldon-operator-certified.clusterserviceversion.yaml#L544-L550
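As a rough illustration of the edit described above: in an OLM ClusterServiceVersion, the operator's Deployments live under spec.install.spec.deployments, and each container there carries its own resources block. The sketch below patches that block on a CSV that has already been loaded into a Python dict. The container name "manager", the trimmed example CSV, and all CPU/memory values are assumptions for illustration only; check the names and limits in your own CSV before changing anything.

```python
import copy


def set_container_resources(csv, deployment_name, container_name, resources):
    """Return a copy of a CSV dict with new resources for one container.

    The input dict is left untouched so the original can be kept for comparison.
    """
    patched = copy.deepcopy(csv)
    deployments = patched["spec"]["install"]["spec"]["deployments"]
    for deployment in deployments:
        if deployment["name"] != deployment_name:
            continue
        containers = deployment["spec"]["template"]["spec"]["containers"]
        for container in containers:
            if container["name"] == container_name:
                container["resources"] = resources
                return patched
    raise KeyError(f"{deployment_name}/{container_name} not found in CSV")


# Heavily trimmed, hypothetical CSV: only the fields this sketch touches.
example_csv = {
    "kind": "ClusterServiceVersion",
    "spec": {"install": {"spec": {"deployments": [{
        "name": "seldon-controller-manager",
        "spec": {"template": {"spec": {"containers": [{
            "name": "manager",  # assumed container name
            "resources": {
                "limits": {"cpu": "500m", "memory": "300Mi"},
                "requests": {"cpu": "100m", "memory": "200Mi"},
            },
        }]}}},
    }]}}},
}

# Hypothetical new values; size them from the memory usage you observe.
new_resources = {
    "limits": {"cpu": "500m", "memory": "1Gi"},
    "requests": {"cpu": "100m", "memory": "256Mi"},
}

patched_csv = set_container_resources(
    example_csv, "seldon-controller-manager", "manager", new_resources
)
```

The patched dict would still need to be written back to the cluster (for example by editing the CSV by hand with the same values); that step is not shown here.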
@cliveseldon @RafalSkolasinski Yes, I can see this configuration in the CSV of the Seldon Operator in my cluster, at lines 681-687.
However, we already have many models deployed through the Seldon Operator, and they are used frequently across multiple namespaces in our cluster. Will this change affect any existing or running model deployments? Will it affect any webhooks or change any other default setting in a way that might cause an issue?
If yes, what can I do to avoid that? If no, can you please confirm that, since the Operator was installed cluster-wide, I should make the change only in the CSV in the "seldon-operator" namespace?
This should not affect the models, but as mentioned, you should test on your dev cluster to confirm.
Describe the bug
Hello,
I have been working with Seldon for quite some time and have been able to deploy multiple models using the different prepackaged inference servers provided by Seldon. For the past two days, however, I have been facing a problem with Seldon deployments on my OpenShift cluster. I am trying to create a Seldon Deployment instance through the Seldon Operator to deploy an XGBoost model using the XGBoost prepackaged inference server, but I get one of the two errors below every time I try to create the instance object:
I have also observed that when I try to create this object, the seldon-controller-manager pod goes into the OOMKilled state and then restarts by itself, although I cannot see any identifiable errors in the pod's logs. I am still able to deploy models using the TensorFlow and SKLearn prepackaged inference servers.
Could you please help me resolve this issue as soon as possible? It is preventing me from deploying the models I need.
To reproduce
Deploy any XGBoost model using the XGBoost prepackaged inference server.
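For readers who want something concrete to try, here is a minimal sketch of the kind of SeldonDeployment manifest this step refers to, built as a Python dict. It is not the manifest from the original report: the deployment name, namespace, predictor name, and model URI are all placeholders and must be replaced with your own values.

```python
import json


def build_xgboost_deployment(name, namespace, model_uri):
    """Build a minimal SeldonDeployment dict using the XGBoost prepackaged server."""
    return {
        "apiVersion": "machinelearning.seldon.io/v1",
        "kind": "SeldonDeployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictors": [{
                "name": "default",  # placeholder predictor name
                "replicas": 1,
                "graph": {
                    "name": "classifier",  # placeholder graph node name
                    "implementation": "XGBOOST_SERVER",
                    "modelUri": model_uri,
                },
            }],
        },
    }


# All three arguments are hypothetical; substitute your own.
manifest = build_xgboost_deployment(
    name="xgboost-example",
    namespace="my-models",
    model_uri="gs://<your-bucket>/<path-to-xgboost-model>",
)

# JSON is accepted wherever a YAML manifest is, so this can be saved and applied.
manifest_json = json.dumps(manifest, indent=2)
```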
Expected behaviour
The XGBoost model should deploy successfully in the OpenShift cluster.
Environment