fly-apps / terraform-provider-fly

Terraform provider for the Fly.io API
BSD 3-Clause "New" or "Revised" License
114 stars 38 forks source link

Fork #257

Open andrewbaxter opened 8 months ago

andrewbaxter commented 8 months ago

I made a fork to merge a bunch of fixes and make other quality of life changes: https://github.com/andrewbaxter/terraform-provider-fly

I've re-published it here: https://registry.terraform.io/providers/andrewbaxter/fly

Note that there may be bugs (and skipped version numbers as I get the build working). I'm happy to review, merge, and release, but I can't test all the features. I don't think this is such a dynamic target that stability will be lacking.

As for what's included atm:

I tweaked the code a bit unnecessarily and fixed a few issues I saw locally, deleted stuff I didn't want to maintain for now (makefiles, dockerfiles, examples, etc)

I cherry-picked/merged:

Affected issues:

andrewbaxter commented 8 months ago

Shared ipv4 address (enabled on app) is working now AFAICT

andrewbaxter commented 8 months ago

More fixes for ip addresses, certs, etc. I renamed a bunch of fields to make them more consistent (no units, underscores between words etc). I think I had probably already made breaking changes merging/updating all of the above, and better to break things now than later.

The fixes to the state file should be straightforward, but let me know if you have issues.

jesse-savary commented 8 months ago

@andrewbaxter Having some issues with the shared IP while creating a fly_app resource:

│ Error: Provider returned invalid result object after apply
│
│ After the apply operation, the provider still indicated an unknown value for fly_app.foobar.shared_ip_address. All values must be known after
│ apply, so this is always a bug in the provider and should be reported in the provider's own repository. OpenTofu will still save the other known object
│ values in the state.

Possibly an OpenTofu issue?

andrewbaxter commented 8 months ago

That sounds like a normal error... I assume this is a new resource? Let me see if I missed reading that from an api response somewhere - when you create a new resource all the fields are "unknown" I think, and you need to read them from the api response. Maybe opentofu is more strict in this regard.

The other possibility is if it's an existing resource maybe the old value is sharedipaddress and it's looking at shared_ip_address in the state file and doesn't find it, so it thinks it's unknown?

andrewbaxter commented 8 months ago

https://github.com/andrewbaxter/terraform-provider-fly/commit/b5e59caa1992b565125af53e434ff1d890243609 should do it I think? Tagged as 0.1.9, build will take maybe 10 minutes.

jesse-savary commented 8 months ago

andrewbaxter@b5e59ca should do it I think? Tagged as 0.1.9, build will take maybe 10 minutes.

Solves the shared IP issue but now I'm getting a 400 Bad Request on volume creation:

terraform {
  required_providers {
    fly = {
      source  = "registry.terraform.io/andrewbaxter/fly"
      version = "0.1.9"
    }
  }
}

provider "fly" {}

resource "fly_app" "redacted" {
  name = "redacted"
  org  = "..."
  assign_shared_ip_address = false
}

resource "fly_volume" "redacted-volume" {
  name   = "redacted-volume"
  app    = fly_app.redacted.name
  size   = 10
  region = "yyz"
}
│ Error: Failed to create volume
│
│   with fly_volume.redacted_volume,
│   on main.tf line 21, in resource "fly_volume" "redacted_volume":
│   21: resource "fly_volume" "redacted_volume" {
│
│ got error response from POST https://api.machines.dev/v1/apps/redacted/volumes: 400 Bad Request
│ body:
│
andrewbaxter commented 8 months ago

Great! And not great.

Uhh, I'm not sure. That looks like the right endpoint and I didn't touch the volume create part of the code, and it's working for me (just for the record).

  1. Can you try it with flyctl volume create and see if it works (and delete again after)? Maybe with additional logging... FWIW I had to modify it to log the requests since flyctl doesn't have any real useful verbose logging:

    diff --git a/flaps/flaps.go b/flaps/flaps.go
    index 94cccfaa..cd44fa3d 100644
    --- a/flaps/flaps.go
    +++ b/flaps/flaps.go
    @@ -7,6 +7,7 @@ import (
            "errors"
            "fmt"
            "io"
    +       "log"
            "net"
            "net/http"
            "net/url"
    @@ -178,6 +179,7 @@ func (f *Client) _sendRequest(ctx context.Context, method, endpoint string, in,
            }
            req.Header.Set("User-Agent", f.userAgent)
    
    +       log.Printf("DEBUG sendRequest %s %s\nHeaders: %#v\nReq body: %#v", method, endpoint, headers, in)
            resp, err := f.httpClient.Do(req)
            if err != nil {
                    return 0, err
  2. Can you try terraform again with TF_LOG=DEBUG and compare what it's POSTing? The TF_LOG=DEBUG output is verbose but I added full http logging (labeled HTTP REQUEST and HTTP RESPONSE).

If there's an issue with the request it should be apparent as some difference between the requests.

jesse-savary commented 8 months ago

After attempting to create the volume manually, the issue appears to be the volume name. flyctl won't accept dashes (e.g. redacted-volume) but will accept underscores (redacted_volume works). However, attempting to fix this in the terraform file doesn't work as there's some kind of validation regex that doesn't match up with Fly's requirements.

Also, while I was renaming the resources and re-applying the changes, I ran into another issue with volume deletion:

2023-12-13T13:23:55.237-0400 [WARN]  unexpected data: registry.terraform.io/andrewbaxter/fly:stdout="{\"error\":\"Could not find App \\"vol_zrekqgqpexw75x1r\\"\"}"
2023-12-13T13:23:55.237-0400 [ERROR] provider.terraform-provider-fly_v0.1.9: Response contains error diagnostic:
  diagnostic_detail=
  | got error response from DELETE https://api.machines.dev/v1/apps/vol_zrekqgqpexw75x1r/volumes/redacted: 404 Not Found
  | body:
   tf_provider_addr=registry.terraform.io/fly-apps/fly tf_req_id=73bebaa0-f7f5-9022-1b9e-68f132dc1894 tf_resource_type=fly_volume tf_rpc=ApplyResourceChange @caller=github.com/hashicorp/terraform-plugin-go@v0.19.1/tfprotov6/internal/diag/diagnostics.go:58 @module=sdk.proto diagnostic_severity=ERROR diagnostic_summary="Delete volume failed" tf_proto_version=6.4 timestamp=2023-12-13T13:23:55.237-0400
2023-12-13T13:23:55.243-0400 [DEBUG] State storage *statemgr.Filesystem declined to persist a state snapshot
2023-12-13T13:23:55.243-0400 [ERROR] vertex "fly_volume.redacted_volume (destroy)" error: Delete volume failed
╷
│ Warning: Debug mode enabled
│
│   with provider["registry.terraform.io/andrewbaxter/fly"],
│   on main.tf line 10, in provider "fly":
│   10: provider "fly" {
│
│ Debug mode enabled, this will add the Fly-Force-Trace header to all graphql requests
╵
╷
│ Error: Delete volume failed
│
│ got error response from DELETE https://api.machines.dev/v1/apps/vol_zrekqgqpexw75x1r/volumes/redacted: 404 Not Found
│ body:
│
╵
2023-12-13T13:23:55.250-0400 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2023-12-13T13:23:55.251-0400 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/andrewbaxter/fly/0.1.9/darwin_arm64/terraform-provider-fly_v0.1.9 pid=74632
2023-12-13T13:23:55.251-0400 [DEBUG] provider: plugin exited

Seems like the API call has a couple fields swapped around? (volume name/app name)

Edit: found another issue; providing an empty list of ports inside the services block causes a failure:

resource "fly_machine" "server" {
  ...
  services = [
    {
      ports                = []
      internal_port        = 9000
      protocol             = "tcp"
    }
  ]
}
│ Error: Provider produced inconsistent result after apply
│
│ When applying changes to fly_machine.woodpecker_server, provider "provider[\"registry.terraform.io/andrewbaxter/fly\"]" produced an unexpected new
│ value: .services[0].ports: was cty.ListValEmpty(cty.Object(map[string]cty.Type{"force_https":cty.Bool, "handlers":cty.List(cty.String),
│ "port":cty.Number})), but now null.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.

Thank you for all your work on this fork @andrewbaxter, I was working on updating https://github.com/dirien/pulumi-fly and was worried I'd have to give up before I found this issue.

andrewbaxter commented 8 months ago

Fast! I'll reply to the ports thing in a second.

Hmm strange, #213 says the api itself doesn't accept underscores if I read it correctly. I'll remove the local validation, I think that's the best way to do it (at worst it just delays the error slightly, right?)

And you're right, it looks like the arguments were swapped. Fixed (or, will be in a second).

andrewbaxter commented 8 months ago

Oh no, I'm glad you're trying it out and helping me debug issues too! I can't test everything myself and I'm not sure if it'd have been worth it if I was the only one using it.

I'll take a look, the ports thing seems simple enough to fix.

jesse-savary commented 8 months ago

Fantastic, thank you. Any idea on how difficult it would be to add the auto_start, auto_stop, and min_machines_running to the services block? If it's non-trivial I'd be happy to contribute the changes myself.

andrewbaxter commented 8 months ago

I don't think you'd have a hard time of it, it should be pretty straight forward. The hard parts are looking up how that works in the api and testing it (I don't use those so I can't easily help with the latter). If you want to try it that would be awesome!

andrewbaxter commented 8 months ago

Let me just dump about the services thing quick.

I think the simple issue is

            Ports:        tfPorts,

This puts a normal go array in the terraform data, instead of a types.ListValue (terraform's data model list type). Terraform is super lenient, so it accepts this, but I guess it differentiates nil from []TfPort{} (an empty array) even though due to go design weirdness those are treated synonymously in most places. I think the simple solution is to initialize it to []TfPort{}.

But I think overall this should be changed to using the terraform wrappers everywhere... maybe? I haven't touched the machines code much so far, and there's a couple other things I've been trying to replace, like instead of doing wholesale replacement of the data when getting api responses just merging just the computed (upstream modifiable) values. There's places where you send the fly api a value and the api returns an empty string or whatever, and when it's replaced terraform freaks out.

Writing this out, I'll try the simple fix for now and maybe next time I have to touch the code I'll try doing a larger fix. I imagine there are probably other bugs with the machine code due to the above that will come out at some point....

andrewbaxter commented 8 months ago

Okay I pushed 0.1.10 with the two fixes (I hope).

Also I'd be happy to try adding those fields, if you don't mind doing some back and forth testing/bug fixing like we're doing now. But of course, if you want to make an PR that would be totally awesome.

(Edit: I'm off now)

andrewbaxter commented 8 months ago

Oh, I just looked at he pulimi provider! Does that import this then? Let me know if I can change something to make it more adaptable/importable or whatever.