digitalocean / terraform-provider-digitalocean

Terraform DigitalOcean provider
https://registry.terraform.io/providers/digitalocean/digitalocean/latest/docs
Mozilla Public License 2.0

DigitalOcean API errors handling #794

Open wojtekregis opened 2 years ago

wojtekregis commented 2 years ago

Is your feature request related to a problem? Please describe.

In recent weeks, api.digitalocean.com has been rather unstable, often returning Cloudflare's HTML error page with an HTTP 504 status code. The day before yesterday it was out of service for several hours.

I have opened multiple tickets, only to be thanked time and again for my patience and understanding and asked to provide more logs. This is despite the fact that the problem can be reproduced from DigitalOcean's own virtual machines by sending requests to DigitalOcean's API, which is proxied by Cloudflare. I have no reason to doubt DO's support (Team Lead) when they blame Cloudflare for this problem, but weeks have passed since that statement and the errors are still very much present, if not more frequent. The Terraform provider does not handle the HTML pages returned by api.digitalocean.com well, and in most cases such errors leave the state broken and in need of manual repair. In extreme cases, state stored with the "local" backend disappeared entirely from the ext4 filesystem.

Describe the solution you'd like

The provider should be able to handle API errors and HTML responses in such a way that the Terraform state stays consistent with the resources that have already been deployed.
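One way to frame this request is to separate two kinds of failure. A JSON error body (like the HTTP 500 logged below) means the DigitalOcean service itself answered. An HTML page from the proxy (like the Cloudflare 504) means the client cannot tell whether the operation ran, because a gateway timeout on a create request does not say whether the resource was created. The minimal Go sketch below draws that line using only the status code and `Content-Type` header. The names `responseKind` and `classifyResponse`, and the rule "non-JSON error means unknown outcome", are illustrative assumptions, not the provider's or godo's actual API.

```go
package main

import "mime"

// responseKind is a hypothetical classification of an API response.
type responseKind int

const (
	kindSuccess        responseKind = iota // 2xx: the result can be recorded in state
	kindAPIError                           // JSON error body: the service itself answered
	kindOutcomeUnknown                     // non-JSON error (e.g. a proxy's HTML page): the request may or may not have been applied
)

// classifyResponse decides how a caller should treat a response, based only on
// its status code and Content-Type header (an assumption made for this sketch).
func classifyResponse(status int, contentType string) responseKind {
	if status >= 200 && status < 300 {
		return kindSuccess
	}
	// ParseMediaType strips parameters such as "; charset=UTF-8".
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err == nil && mediaType == "application/json" {
		return kindAPIError
	}
	// Missing, unparsable, or non-JSON content type on an error status.
	return kindOutcomeUnknown
}
```

With this split, a create call that ends in `kindOutcomeUnknown` could, for example, look the resource up before reporting failure, instead of leaving an already-created resource out of state. Whether such a lookup is practical depends on the resource type.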

Describe alternatives you've considered

I have opened multiple tickets with DigitalOcean regarding API instability and waited close to 4 weeks for a solution.

Additional context

When Cloudflare is able to connect to DigitalOcean and the API responds with HTTP 500, the state is saved:

2022-02-07T11:04:59.525Z [INFO]  provider.terraform-provider-digitalocean_v2.16.0: 2022/02/07 11:04:59 [DEBUG] DigitalOcean API Response Details:
---[ RESPONSE ]--------------------------------------
HTTP/2.0 500 Internal Server Error
Content-Length: 59
Cf-Cache-Status: DYNAMIC
Cf-Ray: xxx-yyy
Content-Type: application/json
Date: Mon, 07 Feb 2022 11:04:59 GMT
Expect-Ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Ratelimit-Limit: 5000
Ratelimit-Remaining: 4412
Ratelimit-Reset: xxx
Server: cloudflare
Set-Cookie: __cf_bm=xxx
X-Gateway: Edge-Gateway
X-Request-Id: xxx
X-Response-From: service

{
 "id": "Internal Server Error",
 "message": "Server Error"
}
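The JSON body in this response has two fields, `id` and `message`. The hedged sketch below decodes a body of that shape into a Go error and returns `nil` when the body is not JSON (for example an HTML page), so the caller can handle that case separately. The two-field shape is inferred from the log above only. `apiError` and `parseAPIError` are hypothetical names, not godo types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError mirrors the two fields visible in the logged HTTP 500 response.
type apiError struct {
	Status  int    `json:"-"` // filled from the HTTP status, not from the body
	ID      string `json:"id"`
	Message string `json:"message"`
}

func (e *apiError) Error() string {
	return fmt.Sprintf("HTTP %d: %s (%s)", e.Status, e.Message, e.ID)
}

// parseAPIError decodes a JSON error body. It returns nil if the body is not
// valid JSON for this struct, e.g. an HTML page served by a proxy.
func parseAPIError(status int, body []byte) *apiError {
	e := &apiError{Status: status}
	if err := json.Unmarshal(body, e); err != nil {
		return nil
	}
	return e
}
```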
gammons commented 1 year ago

hi @wojtekregis - apologies, I only just saw this issue. Do you know if this is still a problem? Is it tied to a specific request that we could try to replicate on our end? 🙇

cnunciato commented 1 year ago

FWIW, I've hit 5xx errors multiple times today while trying to create DB clusters:

  digitalocean:index:DatabaseCluster (cluster):
    error: 1 error occurred:
        * Error creating database cluster: POST https://api.digitalocean.com/v2/databases: 504 <!DOCTYPE html>
    <!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
    <!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
    <!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
    <!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
    <head>
    ...
keithgg commented 1 year ago

Adding a :+1: to this issue. It's been worse recently.

Usually you can get past the issue by waiting 10 minutes or so, but lately it is more persistent and lasts longer as well.

I experience the issue mostly with the Databases API (MongoDB in particular).

dscain commented 1 year ago

I have been seeing the same issue since yesterday when trying to deploy a new App. Could it be related to the following? https://github.com/digitalocean/terraform-provider-digitalocean/issues/808

The state also gets updated normally.

-----------------------------------------------------: timestamp=2023-02-01T12:51:28.254Z
2023-02-01T12:51:28.980Z [INFO] provider.terraform-provider-digitalocean_v2.26.0: 2023/02/01 12:51:28 [WARN] Invalid log level: "1". Defaulting to level: TRACE. Valid levels are: [TRACE DEBUG INFO WARN ERROR]: timestamp=2023-02-01T12:51:28.980Z
2023-02-01T12:51:28.981Z [INFO] provider.terraform-provider-digitalocean_v2.26.0: 2023/02/01 12:51:28 [DEBUG] DigitalOcean API Response Details:
---[ RESPONSE ]--------------------------------------
HTTP/2.0 500 Internal Server Error
Content-Length: 59
Cf-Cache-Status: DYNAMIC
Cf-Ray: xxx
Content-Type: application/json
Date: Wed, 01 Feb 2023 12:51:28 GMT
Ratelimit-Limit: 5000
Ratelimit-Remaining: 4990
Ratelimit-Reset: xxx
Server: cloudflare
Set-Cookie: xxx
X-Gateway: Edge-Gateway
X-Request-Id: xxx
X-Response-From: service

{
 "id": "Internal Server Error",
 "message": "Server Error"
}

dscain commented 1 year ago

I just found that, in my case, the issue was that I was trying to deploy an App whose spec had a "service" where the "repository" of the "image" property was set to a repository for which no image existed. After fixing this in the .tf file, I was able to deploy without a 500 error. I will log this issue in a separate ticket. Thanks!