fly-apps / terraform-provider-fly

Terraform provider for the Fly.io API
BSD 3-Clause "New" or "Revised" License

Terraform Cloud #42

Closed: neilmock closed this issue 2 years ago

neilmock commented 2 years ago

This seems to assume an open WG (WireGuard) tunnel during provisioning. I can't think of a way this could work in TF Cloud, but I might be missing something? Thanks!

mochja commented 2 years ago

@neilmock you could theoretically proxy it if you really have to, and then set https://registry.terraform.io/providers/fly-apps/fly/latest/docs#fly_http_endpoin. Maybe you could even proxy it with another Fly app?

I think #31 will solve this in the future.

DAlperin commented 2 years ago

@neilmock yeah, this is a known problem. Like @mochja said, there is some work going on that should make this easier soon.

DAlperin commented 2 years ago

@mochja @neilmock alrighty, we finally have a beta internal tunnel that removes the need for any proxy stuff. Here is an example of how to enable it:

provider "fly" {
  useinternaltunnel    = true
  internaltunnelorg    = "personal"
  internaltunnelregion = "ewr"
}

This is available in v0.0.17, which is being released now.

hb9cwp commented 2 years ago

@DAlperin @jsiebens I tested v0.0.18 with fly_app, fly_ip, and fly_machine stanzas and Terraform CLI v1.2.9 from ChromeOS (Debian amd64), and it works great so far! Next, I will try a remote Terraform Cloud workspace as well. Thank you very much for this amazing work.

hb9cwp commented 2 years ago

@DAlperin I just noticed that the same .tf file always times out while creating fly_machine resources with the just-released Terraform CLI v1.3.0, whereas it applies fine with Terraform CLI v1.2.9:

fly_app.flyMachines: Creating...
fly_app.flyMachines: Creation complete after 0s [id=fly-machine-hello04]
fly_ip.IPv4: Creating...
fly_ip.IPv6: Creating...
fly_machine.exampleMachine["ams"]: Creating...
fly_machine.exampleMachine["fra"]: Creating...
fly_ip.IPv6: Creation complete after 2s [id=ip_qr702pxnm6v136zd]
fly_ip.IPv4: Creation complete after 3s [id=ip_3jwv94n3kd81p506]
fly_machine.exampleMachine["ams"]: Still creating... [10s elapsed]
fly_machine.exampleMachine["fra"]: Still creating... [10s elapsed]
fly_machine.exampleMachine["ams"]: Still creating... [20s elapsed]
fly_machine.exampleMachine["fra"]: Still creating... [20s elapsed]
^C
Interrupt received.
Please wait for Terraform to exit or data loss may occur.
Gracefully shutting down...

compared to this run with v1.2.9:

fly_app.flyMachines: Creating...
fly_app.flyMachines: Creation complete after 0s [id=fly-machine-hello04]
fly_ip.IPv6: Creating...
fly_ip.IPv4: Creating...
fly_machine.exampleMachine["ams"]: Creating...
fly_machine.exampleMachine["fra"]: Creating...
fly_ip.IPv6: Creation complete after 1s [id=ip_lm6k9x4v8kz1qp7r]
fly_machine.exampleMachine["ams"]: Creation complete after 5s [id=6e82929b37d987]
fly_ip.IPv4: Creation complete after 5s [id=ip_degn1rkq86d935om]
fly_machine.exampleMachine["fra"]: Creation complete after 6s [id=4d89044b492287]

Apply complete! Resources: 5 added, 0 changed, 0 destroyed.

Outputs:

IPv4_address = "137.66.14.187"
IPv6_address = "2a09:8280:1::1:7663"

DAlperin commented 2 years ago

I literally can't imagine what the Terraform CLI could possibly be breaking, sigh. Thanks for finding that. Do you mind opening a new issue for it?

In the meantime, let me look into it and see what I can find.

hb9cwp commented 2 years ago

Today, I tried both versions of the Terraform CLI again. Actually, with both of them, terraform apply fails in maybe one out of three or four attempts. Creating the app and the IPv4 and IPv6 addresses always succeeds. Sometimes creating one of the two machines fails, sometimes both fail, and then Terraform aborts with an error message. Other times, the apply stalls while creating the machines, until both are finally created after retrying for about 2 minutes:

$ terraform version
Terraform v1.3.0
on linux_amd64
+ provider registry.terraform.io/fly-apps/fly v0.0.18
$ terraform apply
...
Plan: 5 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + IPv4_address = (known after apply)
  + IPv4_region  = (known after apply)
  + IPv6_address = (known after apply)
  + IPv6_region  = (known after apply)

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

fly_app.flyMachines: Creating...
fly_app.flyMachines: Creation complete after 0s [id=fly-machine-hello02]
fly_ip.IPv6: Creating...
fly_ip.IPv4: Creating...
fly_machine.exampleMachine["fra"]: Creating...
fly_machine.exampleMachine["ams"]: Creating...
fly_ip.IPv6: Creation complete after 1s [id=ip_qr702pxnpjy136zd]
fly_ip.IPv4: Creation complete after 3s [id=ip_3jwv94n38531p506]
fly_machine.exampleMachine["fra"]: Still creating... [10s elapsed]
fly_machine.exampleMachine["ams"]: Still creating... [10s elapsed]
fly_machine.exampleMachine["fra"]: Still creating... [20s elapsed]
fly_machine.exampleMachine["ams"]: Still creating... [20s elapsed]
fly_machine.exampleMachine["fra"]: Still creating... [30s elapsed]
fly_machine.exampleMachine["ams"]: Still creating... [30s elapsed]
fly_machine.exampleMachine["fra"]: Still creating... [40s elapsed]
fly_machine.exampleMachine["ams"]: Still creating... [40s elapsed]
fly_machine.exampleMachine["ams"]: Still creating... [50s elapsed]
fly_machine.exampleMachine["fra"]: Still creating... [50s elapsed]
fly_machine.exampleMachine["fra"]: Still creating... [1m0s elapsed]
fly_machine.exampleMachine["ams"]: Still creating... [1m0s elapsed]
fly_machine.exampleMachine["ams"]: Still creating... [1m10s elapsed]
fly_machine.exampleMachine["fra"]: Still creating... [1m10s elapsed]
fly_machine.exampleMachine["ams"]: Still creating... [1m20s elapsed]
fly_machine.exampleMachine["fra"]: Still creating... [1m20s elapsed]
fly_machine.exampleMachine["ams"]: Still creating... [1m30s elapsed]
fly_machine.exampleMachine["fra"]: Still creating... [1m30s elapsed]
fly_machine.exampleMachine["fra"]: Still creating... [1m40s elapsed]
fly_machine.exampleMachine["ams"]: Still creating... [1m40s elapsed]
fly_machine.exampleMachine["ams"]: Still creating... [1m50s elapsed]
fly_machine.exampleMachine["fra"]: Still creating... [1m50s elapsed]
fly_machine.exampleMachine["ams"]: Still creating... [2m0s elapsed]
fly_machine.exampleMachine["fra"]: Still creating... [2m0s elapsed]
fly_machine.exampleMachine["fra"]: Creation complete after 2m1s [id=9080542a1e0787]
fly_machine.exampleMachine["ams"]: Creation complete after 2m1s [id=1781900c57d489]

Apply complete! Resources: 5 added, 0 changed, 0 destroyed.

Outputs:

IPv4_address = "168.220.90.171"
IPv6_address = "2a09:8280:1::3:785f"
$

Re-running apply after a partial failure creates the missing machine(s), usually on the second or third attempt. Fortunately, terraform destroy always removes the resources cleanly, whatever state they are in.

Next, I will try another image to rule out that this behavior is related to the image I have been testing with so far.

DAlperin commented 2 years ago

@hb9cwp can you drop your config so I can try out a modified version?

hb9cwp commented 2 years ago

@DAlperin I have just sent you my build steps with the sources via PM. Thank you.

hb9cwp commented 2 years ago

Terraform Cloud appears to choke when it tries to open a WG tunnel, even though I have added the FLY_API_TOKEN environment variable to the TF Cloud workspace.

Here is the detailed output from "Download raw log":

Terraform v1.3.0
on linux_amd64
Initializing plugins and modules...
{"@level":"info","@message":"Terraform 1.3.0","@module":"terraform.ui","@timestamp":"2022-09-25T12:54:21.841161Z","terraform":"1.3.0","type":"version","ui":"1.0"}
{"@level":"error","@message":"Error: failed to open internal tunnel","@module":"terraform.ui","@timestamp":"2022-09-25T12:54:23.111115Z","diagnostic":{"severity":"error","summary":"failed to open internal tunnel","detail":"tunnel error (doConnect): tunnel error (wgDev.Up()): permission denied","address":"provider[\"registry.terraform.io/fly-apps/fly\"]","range":{"filename":"flyGoHTTPS.tf","start":{"line":34,"column":16,"byte":835},"end":{"line":34,"column":17,"byte":836}},"snippet":{"context":"provider \"fly\"","code":"provider \"fly\" {","start_line":34,"highlight_start_offset":15,"highlight_end_offset":16,"values":[]}},"type":"diagnostic"}
Operation failed: failed running terraform plan (exit 1)

DAlperin commented 2 years ago

@hb9cwp thanks for all the materials. I'll take a look. As for the permission denied error in TF Cloud, I've reached out to some folks at HashiCorp, and we have a working theory about why their environment isn't playing along. Stand by, we'll see what we can do.

jbarnette commented 2 years ago

@DAlperin any word from the HashiCorp folks?

mattste commented 2 years ago

I'm also running into the permission denied issue when running remotely in Terraform Cloud.

DAlperin commented 2 years ago

@mattste @jbarnette @hb9cwp I have a present for you. The Machines API is now public, meaning it does not require any WireGuard. If you set the env var FLY_HTTP_ENDPOINT or the provider setting fly_http_endpoint to https://api.machines.dev, it should work :)

hb9cwp commented 2 years ago

@DAlperin It looks like Xmas is early this year! Do you still need to make other changes before cutting a new release? The latest v0.0.20 complains: Error: fly wireguard tunnel must be open ... can't connect to the api, is the tunnel open? :)

Actually, only resource "fly_machine" complains; resource "fly_app" and resource "fly_ip" are fine! :-)

hb9cwp commented 2 years ago

@DAlperin I can confirm that your provider now also works with remote Terraform using a workspace on HashiCorp Cloud Platform (HCP), as long as I don't try to use resource fly_machine! Thank you very much.

mattste commented 2 years ago

I decided to go ahead and publish a fork of this provider that removes the WireGuard tunnel code until @DAlperin can resolve this issue as they see fit. You can access the fork here.

To use the provider, configure it like this (if you're using cdktf):

new FlyProvider(this, "fly", {
    flyApiToken: "your-api-token-value",
    flyHttpEndpoint: "https://api.machines.dev"
})

DAlperin commented 2 years ago

@mattste thanks for this. I'll get a new version released today or tomorrow

DAlperin commented 2 years ago

@mattste @hb9cwp and co, I made a silly mistake. The provider automatically prepends http(s), so just set FLY_HTTP_ENDPOINT or the provider setting fly_http_endpoint to api.machines.dev, without the https:// prefix. I just tested it and it works.
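For cdktf users working from the FlyProvider snippet earlier in this thread, one way to avoid the doubled scheme is to strip it before passing the value to flyHttpEndpoint. This is a minimal sketch, assuming (as described above) that the provider prepends http(s) itself; toBareHost is a hypothetical helper, not part of the provider or of cdktf.

```typescript
// Hypothetical helper (not part of the provider or cdktf).
// Assumes the provider prepends the scheme itself, so the configured
// endpoint must be a bare hostname such as "api.machines.dev".
function toBareHost(endpoint: string): string {
  return endpoint
    .trim()                          // drop stray whitespace from env vars
    .replace(/^https?:\/\//i, "")    // remove a leading http:// or https://
    .replace(/\/+$/, "");            // remove any trailing slashes
}

// Usage with the cdktf snippet from this thread (FlyProvider comes from your
// generated provider bindings; the import path depends on your project):
//
//   new FlyProvider(this, "fly", {
//     flyApiToken: "your-api-token-value",
//     flyHttpEndpoint: toBareHost("https://api.machines.dev"), // "api.machines.dev"
//   });
```

A bare hostname passes through unchanged, so the helper is safe to apply whether or not the value already includes a scheme.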

mattste commented 2 years ago

Thanks for looking into this! I can confirm changing the endpoint to just the hostname fixes it.

I did not look carefully enough at this code to see that it's just a basic connection test, not something specific to WG.

Also, is there a security concern with using http as the code does now?

DAlperin commented 2 years ago

I'll check to be sure, but I'm fairly sure it gets upgraded to https automatically.

mattste commented 2 years ago

I created a post on the HashiCorp forums asking that my provider be unpublished to avoid potential confusion with the official provider. If you happen to have a support contact at HashiCorp, feel free to forward the request.

DAlperin commented 2 years ago

I think we can close this :)