DevelopingSpace / starchart

A self-serve tool for managing custom domains and certificates
MIT License

Unable to create certificate order on production #651

Closed (humphd closed this 1 year ago)

humphd commented 1 year ago

I've fixed the issue with AWS on production, and now it looks like we have a new problem with Let's Encrypt:

{"level":"info","message":"Creating certificate order for davidhumphrey.mystudentproject.ca","timestamp":"2023-04-19T19:26:02.292Z"}
{"level":"debug","message":"Initializing Let's encrypt","timestamp":"2023-04-19T19:26:02.292Z"}
{"level":"debug","message":"Creating ACME order for davidhumphrey.mystudentproject.ca","timestamp":"2023-04-19T19:26:03.119Z"}
{"level":"info","message":"Order created successfully","timestamp":"2023-04-19T19:26:03.735Z"}
{"clientVersion":"4.13.0","code":"P2002","level":"error","message":"Failed to update certificate 9 in db \nInvalid `prisma.certificate.update()` invocation:\n\n\nUnique constraint failed on the constraint: `Certificate_orderUrl_key`","meta":{"target":"Certificate_orderUrl_key"},"stack":"Error: \nInvalid `prisma.certificate.update()` invocation:\n\n\nUnique constraint failed on the constraint: `Certificate_orderUrl_key`\n    at pn.handleRequestError (/app/node_modules/@prisma/client/runtime/library.js:176:6477)\n    at pn.handleAndLogRequestError (/app/node_modules/@prisma/client/runtime/library.js:176:5907)\n    at pn.request (/app/node_modules/@prisma/client/runtime/library.js:176:5786)\n    at t._request (/app/node_modules/@prisma/client/runtime/library.js:179:10484)\n    at Worker.import_bullmq.Worker.connection [as processFn] (/app/build/index.js:1965:7)\n    at Worker.processJob (/app/node_modules/bullmq/src/classes/worker.ts:667:22)\n    at Worker.retryIfFailed (/app/node_modules/bullmq/src/classes/worker.ts:858:16)","timestamp":"2023-04-19T19:26:03.747Z"}

I think we're failing here:

https://github.com/DevelopingSpace/starchart/blob/main/app/queues/certificate/order-creator-worker.server.ts#L118-L127

In the database, the orderUrl is NULL:

[Screenshot, 2023-04-19 3:34:05 PM: database row with orderUrl set to NULL]

I've tried starting and stopping the containers a few times, and retried after the process fails. The same error happens every time.

cc @dadolhay.

humphd commented 1 year ago

I tried switching to the Let's Encrypt Staging URL, and it made no difference: same problem.

humphd commented 1 year ago

I tried enabling debugging in the node-acme client, and it looks like the Let's Encrypt part is working:

{"level":"info","message":"Creating certificate order for sfrunza.mystudentproject.ca","timestamp":"2023-04-19T22:01:04.294Z"}
{"level":"debug","message":"Initializing Let's encrypt","timestamp":"2023-04-19T22:01:04.295Z"}
19T22:01:04.298Z acme-client HTTP request: get https://acme-v02.api.letsencrypt.org/directory
19T22:01:04.497Z acme-client RESP 200 get https://acme-v02.api.letsencrypt.org/directory
19T22:01:04.497Z acme-client HTTP request: head https://acme-v02.api.letsencrypt.org/acme/new-nonce
19T22:01:04.618Z acme-client RESP 200 head https://acme-v02.api.letsencrypt.org/acme/new-nonce
19T22:01:04.620Z acme-client Using nonce: C878nb4O4dXzDJ1Re7W4Q6QijIxW0kxNDGvuVq0qtEwhkBY
19T22:01:04.624Z acme-client HTTP request: post https://acme-v02.api.letsencrypt.org/acme/new-acct
19T22:01:04.806Z acme-client RESP 200 post https://acme-v02.api.letsencrypt.org/acme/new-acct
19T22:01:04.807Z acme-client Account already exists (HTTP 200), returning updateAccount()
19T22:01:04.807Z acme-client HTTP request: head https://acme-v02.api.letsencrypt.org/acme/new-nonce
19T22:01:04.926Z acme-client RESP 200 head https://acme-v02.api.letsencrypt.org/acme/new-nonce
19T22:01:04.926Z acme-client Using nonce: C878w0-Y6wItr7hS8BGOp-zQQz0WkovPrZ1CueQARoaInfk
19T22:01:04.929Z acme-client HTTP request: post https://acme-v02.api.letsencrypt.org/acme/acct/1069544847
19T22:01:05.057Z acme-client RESP 200 post https://acme-v02.api.letsencrypt.org/acme/acct/1069544847
{"level":"debug","message":"Creating ACME order for sfrunza.mystudentproject.ca","timestamp":"2023-04-19T22:01:05.057Z"}
19T22:01:05.058Z acme-client HTTP request: head https://acme-v02.api.letsencrypt.org/acme/new-nonce
19T22:01:05.204Z acme-client RESP 200 head https://acme-v02.api.letsencrypt.org/acme/new-nonce
19T22:01:05.204Z acme-client Using nonce: C878R2NyLxCPLj99Q3_yTPRIBR8fUFZ0dqQlHAul2OulwaA
19T22:01:05.208Z acme-client HTTP request: post https://acme-v02.api.letsencrypt.org/acme/new-order
19T22:01:05.384Z acme-client RESP 201 post https://acme-v02.api.letsencrypt.org/acme/new-order
19T22:01:05.384Z acme-client HTTP request: head https://acme-v02.api.letsencrypt.org/acme/new-nonce
19T22:01:05.387Z acme-client HTTP request: head https://acme-v02.api.letsencrypt.org/acme/new-nonce
19T22:01:05.513Z acme-client RESP 200 head https://acme-v02.api.letsencrypt.org/acme/new-nonce
19T22:01:05.513Z acme-client Using nonce: 1DFAWA9rrwDHAvvVogQG2hCuxVaqf6GM_M-PDpvQeCuTZWU
19T22:01:05.516Z acme-client HTTP request: post https://acme-v02.api.letsencrypt.org/acme/authz-v3/220802596627
19T22:01:05.552Z acme-client RESP 200 head https://acme-v02.api.letsencrypt.org/acme/new-nonce
19T22:01:05.552Z acme-client Using nonce: F977rpXQOn-H799MNzG-9VEgdvpEAQAHSiCcwfk_nLb832E
19T22:01:05.554Z acme-client HTTP request: post https://acme-v02.api.letsencrypt.org/acme/authz-v3/220802596637
19T22:01:05.687Z acme-client RESP 200 post https://acme-v02.api.letsencrypt.org/acme/authz-v3/220802596627
19T22:01:05.708Z acme-client RESP 200 post https://acme-v02.api.letsencrypt.org/acme/authz-v3/220802596637
{"level":"info","message":"Order created successfully","timestamp":"2023-04-19T22:01:05.708Z"}
{"clientVersion":"4.13.0","code":"P2002","level":"error","message":"Failed to update certificate 13 in db \nInvalid `prisma.certificate.update()` invocation:\n\n\nUnique constraint failed on the constraint: `Certificate_orderUrl_key`","meta":{"target":"Certificate_orderUrl_key"},"stack":"Error: \nInvalid `prisma.certificate.update()` invocation:\n\n\nUnique constraint failed on the constraint: `Certificate_orderUrl_key`\n    at pn.handleRequestError (/app/node_modules/@prisma/client/runtime/library.js:176:6477)\n    at pn.handleAndLogRequestError (/app/node_modules/@prisma/client/runtime/library.js:176:5907)\n    at pn.request (/app/node_modules/@prisma/client/runtime/library.js:176:5786)\n    at t._request (/app/node_modules/@prisma/client/runtime/library.js:179:10484)\n    at Worker.import_bullmq.Worker.connection [as processFn] (/app/build/index.js:1965:7)\n    at Worker.processJob (/app/node_modules/bullmq/src/classes/worker.ts:667:22)\n    at Worker.retryIfFailed (/app/node_modules/bullmq/src/classes/worker.ts:858:16)","timestamp":"2023-04-19T22:01:05.821Z"}

It shows 201 post https://acme-v02.api.letsencrypt.org/acme/new-order, which is what I'd expect if the order went through. The client should then pick the Location header off that response and give it to us as the orderUrl. Maybe our issue is database related?
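For reference, here is a minimal sketch of the step described above: taking the order URL from the Location header of a 201 response to new-order. The AcmeResponse type, the orderUrlFromNewOrder helper, and the example URL are hypothetical stand-ins for illustration, not the acme-client's actual types or code.

```typescript
// Hypothetical stand-in for an HTTP response; not the acme-client's real type.
interface AcmeResponse {
  status: number;
  headers: Record<string, string | undefined>;
}

// Returns the order URL from a new-order response, or throws if the server
// did not answer 201 Created with a Location header.
function orderUrlFromNewOrder(resp: AcmeResponse): string {
  if (resp.status !== 201) {
    throw new Error(`Expected 201 Created, got ${resp.status}`);
  }
  // Header names are case-insensitive, so compare in lower case.
  for (const headerName of Object.keys(resp.headers)) {
    const value = resp.headers[headerName];
    if (headerName.toLowerCase() === "location" && value) {
      return value;
    }
  }
  throw new Error("new-order response did not include a Location header");
}

// Made-up response, for illustration only.
const exampleResponse: AcmeResponse = {
  status: 201,
  headers: { Location: "https://acme.example/acme/order/1/2" },
};
const exampleOrderUrl = orderUrlFromNewOrder(exampleResponse);
```

If this step works, as the 201 in the log suggests, the value handed to the database update is a real URL rather than an empty one, which points the investigation at the db write instead.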

humphd commented 1 year ago

cc @cychu42, in case there's some db thing I'm missing.

ghost commented 1 year ago

@humphd Yes, it seems to be a db issue: Unique constraint failed on the constraint: Certificate_orderUrl_key. So the certificate orderUrl is not unique... not sure how that can be.
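One possible reading, sketched below as a toy in-memory model and not taken from the starchart code: a unique index on a nullable column lets any number of rows hold NULL (as in MySQL, for example), so the row being updated can look harmless while a different row already stores the same orderUrl. The CertificateTable class, the ids, and the URL are invented for illustration; only the P2002 code comes from the log above.

```typescript
type CertRow = { id: number; orderUrl: string | null };

// Toy table with a unique constraint on the nullable orderUrl column.
class CertificateTable {
  private rows: CertRow[] = [];

  // New rows start with orderUrl = NULL; many NULLs may coexist.
  insert(id: number): void {
    this.rows.push({ id, orderUrl: null });
  }

  get(id: number): CertRow {
    for (const row of this.rows) {
      if (row.id === id) return row;
    }
    throw new Error(`No certificate with id ${id}`);
  }

  // Rejects a non-null orderUrl that another row already holds.
  setOrderUrl(id: number, orderUrl: string | null): void {
    const target = this.get(id);
    if (orderUrl !== null) {
      for (const other of this.rows) {
        if (other.id !== id && other.orderUrl === orderUrl) {
          const err = new Error("Unique constraint failed on orderUrl") as Error & { code: string };
          err.code = "P2002"; // same error code as in the Prisma log
          throw err;
        }
      }
    }
    target.orderUrl = orderUrl;
  }
}

const certTable = new CertificateTable();
certTable.insert(1); // earlier attempt
certTable.insert(2); // current attempt, orderUrl still NULL
certTable.setOrderUrl(1, "https://acme.example/order/abc");

let sawConflict = false;
try {
  // Assumption: the same order URL comes back for the second attempt.
  certTable.setOrderUrl(2, "https://acme.example/order/abc");
} catch (err) {
  sawConflict = (err as { code?: string }).code === "P2002";
}
```

In this model the failing row still shows NULL after the error, which would match what the database showed, while the conflicting value sits on another row.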

ghost commented 1 year ago

I think the following has happened:

ghost commented 1 year ago

PR added. You will also have to go in and manually null out the orderUrl of certificates that are in a failed state.
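A rough sketch of that cleanup, run here against an in-memory array so it stays self-contained. The status field, the "failed" and "issued" values, the ids, and the URLs are assumptions for illustration; the real schema is not shown in this thread.

```typescript
// Hypothetical row shape; the real Certificate model may differ.
type CertificateRow = { id: number; status: string; orderUrl: string | null };

// Sets orderUrl to null on every failed row that still has one,
// and returns how many rows were changed.
function clearFailedOrderUrls(rows: CertificateRow[]): number {
  let changed = 0;
  for (const row of rows) {
    if (row.status === "failed" && row.orderUrl !== null) {
      row.orderUrl = null;
      changed += 1;
    }
  }
  return changed;
}

// Made-up sample data.
const certRows: CertificateRow[] = [
  { id: 1, status: "failed", orderUrl: "https://acme.example/order/abc" },
  { id: 2, status: "issued", orderUrl: "https://acme.example/order/def" },
  { id: 3, status: "failed", orderUrl: null },
];
const clearedCount = clearFailedOrderUrls(certRows);
```

Against the real database, the same effect would presumably come from a single UPDATE statement, or from a Prisma updateMany call with a where clause on the failed status and data of { orderUrl: null }, adjusted to whatever the schema actually calls those fields.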

humphd commented 1 year ago

Great detective work, @dadolhay! Thanks for jumping on this. I'll fix the db now in preparation for landing this.