GoogleCloudPlatform / elixir-runtime

The community-supported runtime for Elixir on Google App Engine.
https://cloud.google.com/elixir
Apache License 2.0
180 stars 14 forks source link

Setting up Ecto with App Engine and Google Cloud SQL #11

Closed jameskerr closed 6 years ago

jameskerr commented 6 years ago

Hello! So happy to hear about Elixir being supported on App Engine.

I'm having trouble connecting to my Cloud SQL instance using ecto on App Engine. I'm running into similar problems as these folks. (https://github.com/elixir-ecto/postgrex/issues/355)

The error I'm getting is:

21:03:19.742 [error] Postgrex.Protocol (#PID<0.1369.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (/cloudsql/hn-alerts:us-west1:hn-alerts-database:5432): non-existing domain - :nxdomain

My prod ecto config is

config :outlet, Outlet.Repo,
  adapter: Ecto.Adapters.Postgres,
  username: "${DB_USER}",
  password: "${DB_PASS}",
  database: "${DB_NAME}",
  hostname: "${DB_HOST}",
  pool_size: 15

And my app.yaml is

service: outlet

env: flex

runtime: gs://elixir-runtime/elixir.yaml
runtime_config:
    release_app: outlet

env_variables:
  DB_USER: user
  DB_PASS: secretpassword
  DB_NAME: database_name
  DB_HOST: /cloudsql/hn-alerts:us-west1:hn-alerts-database

beta_settings:
  cloud_sql_instances: hn-alerts:us-west1:hn-alerts-database

A tutorial on Ecto + App Engine + Cloud SQL Postgres would be 🙌

dazuma commented 6 years ago

Thanks for poking at the issue on postgrex! We were holding off on including cloud sql in the tutorials for precisely this reason, but now that the postgrex update is finally released, I'll see about writing one. I'll close this for now, assuming you're not still having issues.

dmin7b5 commented 6 years ago

I thought I would give you an update on this because I have it working. You just need the following, using the latest postgrex which supports "socket":

# mix.exs
{:postgrex, "~> 0.13.5"},

# prod.secret.exs

config :api, Api.Repo,
  adapter: Ecto.Adapters.Postgres,
  username: "some-user-name",
  password: "some-pwd",
  database: "some-db-name",
  socket: "/path-to-cloudsql/project-name:region-name:db-instance-name/.s.PGSQL.5432",
  timeout: 7200_000,
  pool_size: 20
pdgonzalez872 commented 6 years ago

@dmin7b5: I'm trying to follow your setup. I have a question: When you say /path-to-cloudsql/, where do you find that?

dazuma commented 6 years ago

@pdgonzalez872 If this is for app engine deployment, use "/cloudsql/project-name:region-name:db-instance-name/.s.PGSQL.5432"

pdgonzalez872 commented 6 years ago

thanks @dazuma

paymog commented 6 years ago

Anybody know why I might get 15:21:54.703 [error] Postgrex.Protocol (#PID<0.1473.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (/cloudsql/crypto-identity:us-central1:crypto-identity-db/.s.PGSQL.5432): no such file or directory - :enoent when I try to follow these instructions? I get that error when I run

$ MIX_ENV=prod mix release --env=prod
$ PORT=8080 _build/prod/rel/server/bin/server foreground

My prod.secrets.exs looks like:

config :server, Server.Repo,
  adapter: Ecto.Adapters.Postgres,
  username: System.get_env("DATABASE_USERNAME") || "postgres",
  password: System.get_env("DATABASE_PASSWORD") || "my-awesome-password",
  database: System.get_env("DATABASE") || "server_prod",
  socket: "/cloudsql/crypto-identity:us-central1:crypto-identity-db/.s.PGSQL.5432",
  timeout: 7200_000,
  pool_size: 15

The database I have exists:

image

And I've tried connecting with gcloud sql connect crypto-identity-db to ensure my IP is whitelisted for local connections.

dazuma commented 6 years ago

@paymog A couple of things.

First, the socket path in your config file ("/cloudsql/crypto-identity:us-central1:crypto-identity-db/.s.PGSQL.5432") is intended for the App Engine runtime environment. That is, when your application starts up on App Engine, App Engine will connect to your database and open a socket at that location, so that your application can use it to connect to your cloud sql database. But this means if you try to run your application locally on your workstation, you don't have App Engine running locally to open that socket for you on your local file system. You will need to do so yourself. You can use the the Cloud SQL Proxy for that purpose.

Second, remember that config files are evaluated at compile time. This means, among other things, it will evaluate environment variables (DATABASE_USERNAME, DATABASE_PASSWORD, and DATABASE) at compile time, not at runtime. The Elixir runtime tries to account for this—for example, if you set any env_variables in your app.yaml, it will set them in the build environment when it builds your application release. But it is something to be aware of, depending on how you manage and set your environment variables.

svoynow commented 6 years ago

I'm having the same issue as @paymog above. I've set up my database configuration according to all the tutorials / comments, but my GAE app can't connect to my Cloud SQL instance. It's in the same project, so my understanding is that it should just work. But the app can't find the socket file on start up. If I SSH to the VM it looks like the socket directory has been created but there is no actual socket file.

my app.yaml looks like this

env: flex
runtime: gs://elixir-runtime/elixir.yaml
runtime_config:
  release_app: adpq
env_variables:
  POSTGRES_USER: postgres
  POSTGRES_PASSWORD: IjpdBOc6jaq86h7u
  POSTGRES_DATABASE: adpq_production
  POSTGRES_SOCKET_PATH: /cloudsql/my-project:us-central1:my-project-db/.s.PGSQL.5432
beta_settings:
  cloud_sql_instances: my-project:us-central1:my-project-db

and my prod.exs looks like this:

config :adpq, Adpq.Repo,
  adapter: Ecto.Adapters.Postgres,
  username: System.get_env("POSTGRES_USER"),
  password: System.get_env("POSTGRES_PASSWORD"),
  database: System.get_env("POSTGRES_DATABASE"),
  socket: System.get_env("POSTGRES_SOCKET_PATH"),
  timeout: 7200_000,
  pool_size: 10    

I've tried it with and without the /.s.PGSQL.5432 because examples seem to vary on that. But in either case I get

[error] Postgrex.Protocol (#PID<0.1451.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect ( /cloudsql/my-project:us-central1:my-project-db/.s.PGSQL.5432): no such file or directory - :enoent
svoynow commented 6 years ago

Ah, a (possibly) relevant discovery. It looks like the socket is being created on one of the two VM instances, but not the other...

dazuma commented 6 years ago

That very much should not be happening. Are you certain you're looking inside the application container (not just the VM) and the socket is present in one VM but not the other?

dazuma commented 6 years ago

@svoynow ^

svoynow commented 6 years ago

Ok, yeah, I wasn't in the app instance. So disregard my last comment For my next experiment I set max_instances to 1 just to keep it simple. When I get a shell in the app container I see my app and the cloud proxy containers both running. I see the socket directory again, but again no socket file.

I recognize this may be a general App Engine / Cloud SQL issue and not specific to this runtime at all, but I'm stumped.

dazuma commented 6 years ago

@svoynow Can you show logs from the sql proxy container? I wonder if there's some permissions issue.

Another thing to try is enable the sqladmin API if it has not been enabled yet. (It has never been completely clear to me whether this API is needed for cloudsql to function.)

dmin7b5 commented 6 years ago

@paymog and @svoynow These are the steps that got everything working for me.

cloudproxy.txt

svoynow commented 6 years ago

@dmin7b5 I'm not totally clear how your instructions translate to the GAE environment. I can connect to my Cloud SQL instance via the cloud proxy from my local environment fine. My GAE VMs are transient and are supposed to do all the proxy set up for me.

svoynow commented 6 years ago

@dazuma I have the sqladmin API enabled. The cloud sql proxy logs look happy.

2018/03/07 18:59:55 Ready for new connections
2018/03/07 18:59:56 Listening on /cloudsql/elixir-futzing:us-central1:elixir-futzing-db/.s.PGSQL.5432 for elixir-fu
tzing:us-central1:elixir-futzing-db
2018/03/07 19:00:00 New connection for "elixir-futzing:us-central1:elixir-futzing-db"
2018/03/07 19:00:00 New connection for "elixir-futzing:us-central1:elixir-futzing-db"
2018/03/07 19:00:00 New connection for "elixir-futzing:us-central1:elixir-futzing-db"
2018/03/07 19:00:00 New connection for "elixir-futzing:us-central1:elixir-futzing-db"
2018/03/07 19:00:00 New connection for "elixir-futzing:us-central1:elixir-futzing-db"
2018/03/07 19:00:00 New connection for "elixir-futzing:us-central1:elixir-futzing-db"
2018/03/07 19:00:00 New connection for "elixir-futzing:us-central1:elixir-futzing-db"
2018/03/07 19:00:00 New connection for "elixir-futzing:us-central1:elixir-futzing-db"
2018/03/07 19:00:00 New connection for "elixir-futzing:us-central1:elixir-futzing-db"
2018/03/07 19:00:00 New connection for "elixir-futzing:us-central1:elixir-futzing-db"

so it appears the proxy can connect to the database instance.

dmin7b5 commented 6 years ago

@dazuma That's great news! Sorry for my confusion, my instructions were for GCE. I'm using a custom image for my transient VMs.

svoynow commented 6 years ago

@dazuma:

I can now see a socket file in my application container, but i'm still getting the same error. Should I expect to see the same output from gcloud app logs tail from my local machine as from docker logs [app container id] on the VM? The former is spewing the errors, the latter is totally silent.

(BTW, thanks for the help/responses. Very appreciated!)

dazuma commented 6 years ago

@svoynow App Engine intercepts the application's output streams so you should use gcloud for its logs, although you need to use docker logs for any of the other containers.

Just to double-check the obvious, does your app.yaml literally say:

  POSTGRES_SOCKET_PATH: /cloudsql/my-project:us-central1:my-project-db/.s.PGSQL.5432

or does it actually use your real project and database name?

svoynow commented 6 years ago

No, it uses my actual project name. Sorry for any confusion. Always good to re-check the obvious..

OK, that's what I was assuming about the logs. I'm pretty comfortable that the application config is correct, as I've used it to connect to the database instance via a local cloud proxy.

dazuma commented 6 years ago

@svoynow Okay, good to know. So what I'm hearing is:

Is that all correct? If so, I'm running out of ideas...

svoynow commented 6 years ago

Yeah. I think the only reasonable assumption to make is that at least one of those things isn't actually true. So probably time to retrace my steps and make sure.

FWIW, the behavior seems to be that the app tries to establish the db connections some number of times, then some sort of healthcheck kicks in and it kills the process (container?) and spins up another one. So the errors are continuous whether or not the app is receiving any requests. In fact I think the requests aren't getting past nginx

dmin7b5 commented 6 years ago

@svoynow since you are retracing your steps...I know you mentioned you have the SQL API enabled. Just in case you have separate projects, one that hosts the DB and one that hosts the application, you actually have to enable the SQL API not in the project that hosts the DB, but in the project that hosts the application. You might also confirm that the cloud sql instance is deployed in the same region as the app.

svoynow commented 6 years ago

I've double-checked everything and it all seems right. So I'm stumped. One additional perplexing piece of data is that if I change the pool_size in my Ecto config, that changes the number of

2018/03/08 17:36:52 New connection for "elixir-futzing:us-central1:elixir-futzing-db"
2018/03/08 17:36:52 New connection for "elixir-futzing:us-central1:elixir-futzing-db"
2018/03/08 17:36:52 New connection for "elixir-futzing:us-central1:elixir-futzing-db"

lines I see in the logs for the cloud sql proxy container (i.e. if the pool_size is 3, then there are three New Connection lines, if 10, then 10 etc.). Which makes it look like the app is connecting. Yet I get nothing but the

[error] Postgrex.Protocol (#PID<0.1451.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect ( /cloudsql/my-project:us-central1:my-project-db/.s.PGSQL.5432): no such file or directory - :enoent

errors in the application logs (and nothing but 502s from nginx). I'm trying to think of some more troubleshooting/experiments I can do on the VM or app container, but there aren't all that many tools there.

paymog commented 6 years ago

Looks like I'm a bit behind on all this. Thanks for the response @dazuma! Makes sense that the config doesn't work locally without the proxy enabled. I followed @dmin7b5 instructions and they worked! BUT, I had to change prod.secret.exs to have socket: "/run/cloudsql/crypto-identity:us-central1:crypto-identity-db/.s.PGSQL.5432", instead of socket: "/cloudsql/crypto-identity:us-central1:crypto-identity-db/.s.PGSQL.5432", - had to prefix the socket path with /run.

After quite a bit of debugging I found out that I'm an idiot. I was looking at my GAE logs with gcloud app logs tail -s default. I found out yesterday that those logs aren't near-realtime. I was looking at logs from days ago. I think when I originally posted everything was working correctly but I was looking at stale logs from interim deployments. I ended up looking at my logs from the console and saw no connection problems. Thanks for the help everyone!

svoynow commented 6 years ago

:facepalm Yep, same here. I am also an idiot. My connectivity error messages were also out of date. So now I have to figure out what else might be causing my 502s.

AzimLord commented 6 years ago

Hi guys, any update regarding this? Any tutorial that works for Ecto + App Engine + Cloud SQL Postgres 😥

peterromfeldhk commented 6 years ago

https://elixirforum.com/t/ecto-migrations-on-google-cloud/12858/5 seems to have figured out connection.. but anyone figured out migrations yet?

you can run the migrations from your CI runner directly before the deployment as workaround

Yuandong-Chen commented 6 years ago

@peterromfeldhk Check out the distillery hex package for elixir, it has pre start hook function to combine with ecto migrator module (like this: https://hexdocs.pm/distillery/running-migrations.html#content). We could do something extremely dangerous like having one GenServer accepting some code from external and execute it. Since we have migrator module, we could do migration updating without downtime of server. (Theoretically but not practically)

Yuandong-Chen commented 6 years ago

@svoynow I encounter the same problem before. I fix it by enabling Google SQL API. LOL! It may work for you as well.

Will-W commented 6 years ago

I agree that updating the tutorial would be extremely helpful. The steps I took to get things working were:

Enable the CloudSQL Admin API

Add the following to app.yaml:

beta_settings:
    cloud_sql_instances: "my-project:my-zone:my-db"

Update the postgrex dependency in mix.exs to be: {:postgrex, "~> 0.13.5"}

Use the socketdir parameter for the connection (and not hostname or socket or port), so my config in prod.secret.exs looks like:

config :doorstep, MyApp.Repo,
  adapter: Ecto.Adapters.Postgres,
  username: "postgres",
  password: "123",
  database: "myapp_prod",
  socket_dir: "/cloudsql/my-project:my-zone:my-db",
  pool_size: 15
kjrocker commented 5 years ago

I followed the recommendation of @Will-W, as well as some permutations of the advice listed here (including adding /run to the socket directory), and it's still not connecting, says the socket file cannot be found.

I'm running into this error when, as part of the runtime build steps, I run mix ecto.setup. I've also confirmed that my currently running app, which was a basically unmodified runtime config, is also spitting out the same error to the logs every few seconds.

peterromfeldhk commented 5 years ago

@kjrocker its a while when i tried it out but it looks like thats what i did to make it work:

config :example, Example.Repo,
  adapter: Ecto.Adapters.Postgres,
  username: "postgres",
  password: "${DB_PASS}",
  database: "example",
  socket: "/cloudsql/${GC_PROJECT_ID}:${GC_REGION}:${GC_SQL_ID}/.s.PGSQL.5432",
  timeout: 7200_000,
  pool_size: 15
Will-W commented 5 years ago

@kjrocker The config I've written has been working for me in production from the date of that comment until now. Two suggestions:

1) Have you enabled the SQL admin console mentioned above? I did that step earlier in the process, it may be important.

2) If you SSH into the running instance (which you can do through the 'instances' page of App Engine on cloud console), have the appropriate socket files been created? The SSH connection should put you into the base machine with a series of docker containers doing the various bits (one will have you application in). You should be able to find the socket files and see them be mapped into your app container. If they're there then it's your ecto config, if they're not then it's something in the app engine and/or cloud sql config.