Open yixu34 opened 4 years ago
Hi, @yixu34! Thanks for reaching out.
According to the logs, modeldb-postgresql does not appear to be present as a service. Could you run kubectl get svc --namespace <your namespace> to check? For modeldb, you should see something like the output below (this is from a fresh install I did this morning with the charts):
$ k get svc
NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
kubernetes                    ClusterIP   10.96.0.1       <none>        443/TCP                      10h
modeldb-backend               ClusterIP   10.96.36.191    <none>        8085/TCP,8086/TCP,3000/TCP   10h
modeldb-graphql               ClusterIP   10.96.212.125   <none>        3000/TCP                     10h
modeldb-postgresql            ClusterIP   10.96.122.16    <none>        5432/TCP                     10h
modeldb-postgresql-headless   ClusterIP   None            <none>        5432/TCP                     10h
modeldb-webapp                ClusterIP   10.96.65.187    <none>        3000/TCP                     10h
I imagine maybe something was off during the installation. If you could share the services you have, I can help you debug what happened.
Here are my services:
$ kubectl get svc | grep modeldb
modeldb-staging-f8780e-backend               ClusterIP   100.71.14.248    <none>   8085/TCP,8086/TCP,3000/TCP   150m
modeldb-staging-f8780e-graphql               ClusterIP   100.67.143.195   <none>   3000/TCP                     150m
modeldb-staging-f8780e-postgresql            ClusterIP   100.68.84.251    <none>   5432/TCP                     150m
modeldb-staging-f8780e-postgresql-headless   ClusterIP   None             <none>   5432/TCP                     150m
modeldb-staging-f8780e-webapp                ClusterIP   100.68.70.21     <none>   3000/TCP                     150m
I think I might see what the problem is, then: it looks like modeldb-postgresql is a hardcoded value, by way of the backend values.yaml, on lines 67 and 81. I suppose I can either install the helm chart with --name modeldb, or change those values. Ideally, one would have a dependency on the other, or read from some common place, right?
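To illustrate the "read from some common place" idea: a minimal sketch of deriving the database host from the release name inside a template, rather than hardcoding it in values.yaml (which Helm does not template). The file location and the DATABASE_HOST variable name are hypothetical, not taken from the actual chart; only .Release.Name is a Helm built-in.

```yaml
# Hypothetical fragment of a backend template, e.g. templates/deployment.yaml.
# DATABASE_HOST is an illustrative name; the real chart may configure the
# database host through a different key or a config file.
env:
  - name: DATABASE_HOST
    # .Release.Name is the release name given at install time.
    value: "{{ .Release.Name }}-postgresql"
```

With a release named modeldb-staging-f8780e, this would render to modeldb-staging-f8780e-postgresql, which matches the service name listed above.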
Ok, changing to --name modeldb for the release did the trick. I've port forwarded the webapp to my localhost:3000, and I think the only remaining problem is that on the 'Repositories' page, I see a 504 error for http://localhost:3000/api/v1/graphql/query. My guess is that there's a typo here with the double dash. It seems like it should be value: "modeldb-backend:8085" instead of value: "modeldb--backend:8085". I can contribute a PR if that's the case.
Nice catch. That does look like a typo and you are right that some of the names appear to be hardcoded (both in the DB reference and the graphql config). It should be based off the name of the release everywhere to avoid this situation. I'd definitely appreciate a PR with fixes!
Ok cool, but let me make sure I have everything working first 😅 In addition to removing the double dash, I had to move the {{- if .Values.env }} on line 36 to below line 41. I noticed that this was preventing the MDB_ADDRESS and QUERY_PATH environment variables from even being set. I then went back to the 'Repositories' page, which fires off a request to http://localhost:3000/api/v1/graphql/query. I still see a 504, with the error being Error occured while trying to proxy to: modeldb-backend:3000/query. I'm not sure why this is happening, because the webapp redirects all api/v1/graphql/ routes to the graphQL service. The graphQL service then uses MDB_ADDRESS, which I've now (correctly?) set to modeldb-backend:8085. So I'm not sure why it's trying to forward the request to port 3000 instead. Here are the environment variables when I describe the graphQL pod, by the way:
Environment:
MDB_ADDRESS: modeldb-backend:8085
QUERY_PATH: /api/v1/graphql/query
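For reference, a rough sketch of the guard placement described above, assuming .Values.env is an optional list of extra name/value entries. The container name and indentation are illustrative and not copied from the actual chart. The point is that the required variables sit outside the if block, so they are rendered even when no extra env values are supplied.

```yaml
      containers:
        - name: graphql   # illustrative container name
          env:
            # Always rendered, regardless of .Values.env.
            - name: MDB_ADDRESS
              value: "{{ .Release.Name }}-backend:8085"
            - name: QUERY_PATH
              value: "/api/v1/graphql/query"
            # Optional extras, appended only when the user sets them.
            {{- if .Values.env }}
            {{- toYaml .Values.env | nindent 12 }}
            {{- end }}
```

If the guard instead wraps the whole list, an install that leaves env unset ends up with neither MDB_ADDRESS nor QUERY_PATH in the pod.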
Ok, I think I was able to narrow down what happened.
First, the webapp logs were a bit misleading because BACKEND_API_DOMAIN was misconfigured. https://github.com/VertaAI/modeldb/pull/853 is fixing that. It doesn't affect correctness, but it does affect the logs in the OSS component.
After that, I noticed that the graphql service was serving on port 4000, but the whole setup assumed it was on port 3000. The reason for this mismatch is that internally our services default to port 3000 for the exposed layer, but we had to move graphql to 4000 to avoid a port collision under docker compose and keep things simple for users. So the deployment template for graphql should have
- name: QUERY_PATH
value: "/api/v1/graphql/query"
+ - name: SERVER_HTTP_PORT
+ value: "3000"
which will set the correct port. This should resolve the issue you're seeing. I'm not sure how I missed that earlier. Could you double check?
I appreciate the help to debug while we open more of our platform! Our SaaS runs with a very specific configuration, so we need to reconsolidate progressively as we keep moving new parts to the open world. Our end to end CI is not fully compatible with the open version, but it's coming!
I've just pulled master and I'm on f8780ecace643dcba180d670d2f7dc5d68451a7f (this was after the helm chart split, which I noticed in some of the 2.x tagged versions previously). I tried installing on our k8s cluster via helm, and the backend container of the backend service is giving an error:
Is there something that's not working out of the box with the helm charts? Or did I not configure the secrets correctly? All I did was helm install modeldb-staging-f8780e . --namespace <our namespace>. Thanks!