AzBuilder / terrakube

Open source IaC Automation and Collaboration Software.
https://docs.terrakube.io
Apache License 2.0

Error when executing terraform with remote execution mode #1143

Closed: seboudry closed this issue 1 month ago

seboudry commented 1 month ago

Bug description 🐞

Hi!

With the embedded MinIO as state storage backend, when I try to run a terraform plan from my computer I get a 500 error from the Terrakube API.

Here's the API component logs:

INFO 1 --- [nio-8080-exec-5] o.t.api.plugin.state.RemoteTfeService : EntitlementData(data=Resource(type=entitlement-sets, id=org-qpc-config-terraform))
INFO 1 --- [nio-8080-exec-1] o.t.a.plugin.state.RemoteTfeController : Searching: qpc-config-terraform qpc-config-terraform-qpc-dev-minio-apps
INFO 1 --- [nio-8080-exec-1] o.t.api.plugin.state.RemoteTfeService : Found Workspace Id: 23988862-b69e-4896-ae39-cd769a55d45b Terraform: 1.9.2
INFO 1 --- [nio-8080-exec-7] o.t.a.plugin.state.RemoteTfeController : Creating Configuration Version for worspaceId 23988862-b69e-4896-ae39-cd769a55d45b
INFO 1 --- [nio-8080-exec-7] o.t.api.plugin.state.RemoteTfeService : Create Configuration Version ConfigurationData(data=ConfigurationModel(attributes={auto-queue-runs=false, speculative=true}))
INFO 1 --- [nio-8080-exec-7] o.t.api.plugin.state.RemoteTfeService : Speculative true
INFO 1 --- [nio-8080-exec-7] o.t.api.plugin.state.RemoteTfeService : Auto Queue Runs false
INFO 1 --- [nio-8080-exec-7] o.t.api.plugin.state.RemoteTfeService : New content with id bcc082ee-3ce9-42bb-b4ef-5e38764b65d9 saved
INFO 1 --- [nio-8080-exec-7] o.t.api.plugin.state.RemoteTfeService : upload-url https://terrakube-api.mycompany.com/remote/tfe/v2/configuration-versions/bcc082ee-3ce9-42bb-b4ef-5e38764b65d9
INFO 1 --- [nio-8080-exec-6] o.t.a.plugin.state.RemoteTfeController : Uploading Id bcc082ee-3ce9-42bb-b4ef-5e38764b65d9 file
INFO 1 --- [nio-8080-exec-6] o.t.a.p.s.aws.AwsStorageTypeServiceImpl : context file: content/bcc082ee-3ce9-42bb-b4ef-5e38764b65d9/terraformContent.tar.gz
WARN 1 --- [nio-8080-exec-6] c.amazonaws.services.s3.AmazonS3Client : No content length specified for stream data. Stream contents will be buffered in memory and could result in out of memory errors.
ERROR 1 --- [nio-8080-exec-6] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed: com.amazonaws.ResetException: Failed to reset the input stream; If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)] with root cause
java.io.IOException: Resetting to invalid mark
    at java.base/java.io.BufferedInputStream.reset(Unknown Source) ~[na:na]
    at com.amazonaws.internal.SdkBufferedInputStream.reset(SdkBufferedInputStream.java:106) ~[aws-java-sdk-core-1.12.761.jar:na]
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120) ~[aws-java-sdk-core-1.12.761.jar:na]
    at com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:168) ~[aws-java-sdk-core-1.12.761.jar:na]
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120) ~[aws-java-sdk-core-1.12.761.jar:na]
    at com.amazonaws.services.s3.internal.AWSS3V4Signer.getContentLength(AWSS3V4Signer.java:194) ~[aws-java-sdk-s3-1.12.761.jar:na]
    at com.amazonaws.services.s3.internal.AWSS3V4Signer.calculateContentHash(AWSS3V4Signer.java:103) ~[aws-java-sdk-s3-1.12.761.jar:na]
    at com.amazonaws.auth.AWS4Signer.sign(AWS4Signer.java:241) ~[aws-java-sdk-core-1.12.761.jar:na]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1320) ~[aws-java-sdk-core-1.12.761.jar:na]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157) ~[aws-java-sdk-core-1.12.761.jar:na]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814) ~[aws-java-sdk-core-1.12.761.jar:na]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781) ~[aws-java-sdk-core-1.12.761.jar:na]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755) ~[aws-java-sdk-core-1.12.761.jar:na]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715) ~[aws-java-sdk-core-1.12.761.jar:na]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697) ~[aws-java-sdk-core-1.12.761.jar:na]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561) ~[aws-java-sdk-core-1.12.761.jar:na]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541) ~[aws-java-sdk-core-1.12.761.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5558) ~[aws-java-sdk-s3-1.12.761.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5505) ~[aws-java-sdk-s3-1.12.761.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.access$300(AmazonS3Client.java:423) ~[aws-java-sdk-s3-1.12.761.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client$PutObjectStrategy.invokeServiceCall(AmazonS3Client.java:6639) ~[aws-java-sdk-s3-1.12.761.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.uploadObject(AmazonS3Client.java:1892) ~[aws-java-sdk-s3-1.12.761.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1852) ~[aws-java-sdk-s3-1.12.761.jar:na]
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1784) ~[aws-java-sdk-s3-1.12.761.jar:na]
    at org.terrakube.api.plugin.storage.aws.AwsStorageTypeServiceImpl.createContentFile(AwsStorageTypeServiceImpl.java:151) ~[classes/:2.22.0]
    at org.terrakube.api.plugin.state.RemoteTfeService.uploadFile(RemoteTfeService.java:699) ~[classes/:2.22.0]
    at org.terrakube.api.plugin.state.RemoteTfeController.uploadConfiguration(RemoteTfeController.java:191) ~[classes/:2.22.0]
  [...]
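The root cause named in the trace is the JDK's mark/reset contract: `BufferedInputStream` invalidates its mark once more bytes have been read than the mark's read limit, which is what the AWS SDK runs into when it buffers an unsized stream (note the "No content length specified" warning above) and later tries to rewind it while signing the request. A standalone sketch (not Terrakube code) that reproduces the same `Resetting to invalid mark` failure:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class MarkResetDemo {

    // Returns true when reset() fails, i.e. when the payload read after
    // mark() exceeds the read limit and the mark gets invalidated.
    public static boolean resetFails(int dataSize, int bufferSize, int readLimit) {
        byte[] data = new byte[dataSize];
        BufferedInputStream in =
                new BufferedInputStream(new ByteArrayInputStream(data), bufferSize);
        try {
            in.mark(readLimit);
            in.readNBytes(dataSize); // consume the whole payload
            in.reset();              // rewind to the mark, as the SDK signer does
            return false;
        } catch (IOException e) {
            return true;             // "Resetting to invalid mark"
        }
    }

    public static void main(String[] args) {
        // Small payload, generous read limit: rewinding works.
        System.out.println(resetFails(1024, 128 * 1024, 128 * 1024)); // false
        // 1 MiB payload, 8 KiB read limit: the mark is invalidated.
        System.out.println(resetFails(1 << 20, 8192, 8192));          // true
    }
}
```

This matches the exception message's hint about `request.getRequestClientOptions().setReadLimit(int)`: the SDK can only rewind as far back as the configured read limit, so an upload larger than that limit cannot be retried or re-signed from a bare stream.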

Steps to reproduce

@me:/workdir/minio/minio-apps$ terraform plan
Running plan in the remote backend. Output will stream here. Pressing Ctrl-C
will stop streaming the logs, but will not stop the plan running remotely.

Preparing the remote plan...

The remote workspace is configured to work with configuration at
/minio/minio-apps relative to the target repository.

Terraform will upload the contents of the following directory,
excluding files or directories as defined by a .terraformignore file
at /workdir/.terraformignore (if it is present),
in order to capture the filesystem context the remote workspace expects:
    /workdir

There was an error connecting to the remote backend. Please do not exit
Terraform to prevent data loss! Trying to restore the connection...

Still trying to restore the connection... (3s elapsed)
Still trying to restore the connection... (5s elapsed)

Expected behavior

No response

Example repository

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: terrakube
  namespace: terrakube
spec:
  releaseName: terrakube
  chart:
    spec:
      chart: terrakube
      sourceRef:
        kind: HelmRepository
        name: terrakube
      version: 3.17.6
  interval: 1h
  values:
    storage:
      defaultStorage: true
    api:
      defaultDatabase: true
      defaultRedis: true
      loadSampleData: false
      properties:
        databaseType: POSTGRESQL
      version: 2.22.0
    ui:
      version: 2.22.0
    registry:
      version: 2.22.0
    executor:
      version: 2.22.0
    dex:
      config:
        connectors:
          - type: oidc
            id: keycloak-sso
            name: MyCompany SSO
            config:
              issuer: https://sso.mycompany.com/auth/realms/master
              clientID: terrakube
              clientSecret: xxxxxxxxxxxxxxxxxxxxxxxxxx
              redirectURI: https://terrakube-api.mycompany.com/dex/callback
              insecureEnableGroups: true
        issuer: https://terrakube-api.mycompany.com/dex
        oauth2:
          skipApprovalScreen: true
        staticClients:
          - id: terrakube-ui
            redirectURIs:
              - https://terrakube-ui.mycompany.com
              - /device/callback
              # for local usage
              - http://localhost:10000/login
            name: Terrakube
            public: true
      volumes: []
      volumeMount: []
    security:
      adminGroup: mycompany-production
      dexClientId: terrakube-ui
      dexIssuerUri: https://terrakube-api.mycompany.com/dex
      useOpenLDAP: false
    ingress:
      useTls: true
      ui:
        domain: terrakube-ui.mycompany.com
        tlsSecretName: terrakube-ui-certs
        annotations:
          cert-manager.io/cluster-issuer: letsencrypt-prod
          kubernetes.io/tls-acme: "true"
          ingress.kubernetes.io/ssl-redirect: "true"
      api:
        domain: terrakube-api.mycompany.com
        tlsSecretName: terrakube-api-certs
        annotations:
          cert-manager.io/cluster-issuer: letsencrypt-prod
          kubernetes.io/tls-acme: "true"
          ingress.kubernetes.io/ssl-redirect: "true"
          nginx.ingress.kubernetes.io/client-body-buffer-size: 100M
      registry:
        # enabled: false
        domain: terrakube-registry.mycompany.com
        tlsSecretName: terrakube-registry-certs
        annotations:
          cert-manager.io/cluster-issuer: letsencrypt-prod
          kubernetes.io/tls-acme: "true"
          ingress.kubernetes.io/ssl-redirect: "true"

Anything else?

I also defined client-body-buffer-size (see Helm values) to be sure the Nginx ingress controller doesn't buffer anything.

But I don't know where the root cause lies: my laptop, the GCP load balancer, the Nginx ingress controller, the Terrakube API, or the MinIO storage.

alfespa17 commented 1 month ago

This is the first time I've seen that error; maybe you can try the following:

https://github.com/AzBuilder/terrakube/issues/496

The above is related to some Nginx configuration that is needed when uploading big files.

I think it is failing in this part of the code, when it tries to upload the Terraform code archive and create the file in MinIO:

https://github.com/AzBuilder/terrakube/blob/d3c4ceca98d316a99c2c40996116edb508fdd9ea/api/src/main/java/org/terrakube/api/plugin/storage/aws/AwsStorageTypeServiceImpl.java#L151

alfespa17 commented 1 month ago

According to this, maybe adding the following to your Helm chart values could work for you:

api:
  env:
  - name: JAVA_TOOL_OPTIONS
    value: "-Dcom.amazonaws.sdk.s3.defaultStreamBufferSize=YOUR_MAX_PUT_SIZE"
seboudry commented 1 month ago

I tried all of these without success...

    api:
      env:
        - name: JAVA_TOOL_OPTIONS
          value: >
            -Dcom.amazonaws.sdk.s3.defaultStreamBufferSize=100MB
            -Dspring.servlet.multipart.max-file-size=100MB
            -Dspring.servlet.multipart.max-request-size=100MB
    ingress:
      api:
        annotations:
          nginx.ingress.kubernetes.io/client-body-buffer-size: 100M
          nginx.ingress.kubernetes.io/proxy-body-size: 100M
alfespa17 commented 1 month ago

> I tried all of these without success...
>
>     api:
>       env:
>         - name: JAVA_TOOL_OPTIONS
>           value: >
>             -Dcom.amazonaws.sdk.s3.defaultStreamBufferSize=100MB
>             -Dspring.servlet.multipart.max-file-size=100MB
>             -Dspring.servlet.multipart.max-request-size=100MB
>     ingress:
>       api:
>         annotations:
>           nginx.ingress.kubernetes.io/client-body-buffer-size: 100M
>           nginx.ingress.kubernetes.io/proxy-body-size: 100M

Quick question: what is the size of the folder where you are running the Terraform CLI?

seboudry commented 1 month ago

Nice question!

I figured out that the tf.state file used to migrate to Terrakube was still in the directory...

After ignoring it in .terraformignore, it's working!

I will test the different options separately to find out which ones are required.
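For reference, .terraformignore at the upload root (/workdir here) uses .gitignore-style patterns. A minimal example that excludes local state files (the exact file names below are assumptions; adjust them to the actual state file name in the directory):

```
# .terraformignore — keep local state out of the remote upload
terraform.tfstate
terraform.tfstate.backup
.terraform/
```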

seboudry commented 1 month ago

OK, so no configuration is needed...

Sorry to disturb you @alfespa17, and thanks for helping diagnose the pain point.

But I don't know of a way to get the actual size of the uploaded archive...
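A rough way to check locally is to archive the upload root yourself. This is only an upper bound, since plain tar does not read .terraformignore, so the archive terraform actually sends may be smaller:

```shell
# Approximate size in bytes of what terraform would package from the
# upload root (/workdir in this setup). tar does not honor
# .terraformignore, so treat this as an upper bound.
tar -czf - -C /workdir . | wc -c
```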