denoland / deploy_feedback

For reporting issues with Deno Deploy
https://deno.com/deploy

[KV Feedback]: "Queues not available on preview environment" error in production environment #516

Closed yorinasub17 closed 1 year ago

yorinasub17 commented 1 year ago

Type of feedback

Bug report

Description

I have gotten into a state where a deployed app is promoted to production, but I get an error message that Queues is not available on a preview environment.

Steps to reproduce (if applicable)

I don't have concrete steps for reproducing this since I haven't had the time to explore various scenarios. However, I have a sneaking suspicion that there is some kind of race condition between the swapping of environments to the new version, and existing requests.

The reason I think that is because this only started happening once I introduced a release testing workflow, where an automated test script waits for the deployment to finish and, as soon as it detects that Deno Deploy has finished its routine, starts hitting it with requests to perform a smoke test. Some of those initial requests pass (which suggests they are hitting the old environment), before they start to fail after a few seconds (which suggests they are hitting the new environment).

I did not see this happen at all when I was testing manually after releasing, which is why it makes me think that this has something to do with a live request being handled by the old environment at the boundary.
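For illustration, a smoke test of this kind can be made less sensitive to the switchover boundary by requiring several consecutive passes instead of trusting the first response. This is only a minimal sketch, not the script used in this project: `smokeTest`, `SmokeOptions`, and the thresholds are hypothetical names and values chosen for the example.

```typescript
// Hypothetical smoke-test helper; all names and thresholds are illustrative assumptions.
type Probe = () => Promise<boolean>;

interface SmokeOptions {
  requiredSuccesses: number; // consecutive passes needed before declaring success
  maxAttempts: number; // give up after this many probes
  delayMs: number; // pause between probes
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Probes repeatedly and returns true only after `requiredSuccesses` passes in a row.
// A failure (or a thrown error) resets the streak, so a few early passes served by
// the old environment are not enough on their own.
async function smokeTest(probe: Probe, opts: SmokeOptions): Promise<boolean> {
  let streak = 0;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    let ok = false;
    try {
      ok = await probe();
    } catch {
      ok = false; // treat network or handler errors as a failed probe
    }
    streak = ok ? streak + 1 : 0;
    if (streak >= opts.requiredSuccesses) return true;
    await sleep(opts.delayMs);
  }
  return false;
}
```

A probe could be something like `async () => (await fetch(healthUrl)).ok`, where `healthUrl` is whatever endpoint the deployment exposes (a placeholder here, not a real URL from this project).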

Expected behavior (if applicable)

Queues should be available on production environments.

Possible solution (if applicable)

No response

Additional context

Logs from live prod environment:

[Screenshot: Screenshot_2023-10-02_at_4_06_26 PM]

yorinasub17 commented 1 year ago

It appears that this might be the same issue as https://github.com/denoland/deploy_feedback/issues/517, in that the KV is just not available for the first X minutes of the service. I'm observing that the problem magically goes away after waiting 5-8 minutes after a deploy.

igorzi commented 1 year ago

@yorinasub17 A fix for #517 rolled out earlier today. Are you still seeing this issue?

yorinasub17 commented 1 year ago

Apologies for the delay. Yes that appears to have addressed this issue! I haven't run into this today across multiple deployments. Closing as solved!

yorinasub17 commented 1 year ago

@igorzi Reopening as it looks like this issue is back. This was ok for a while, but I have run into this a few times today now. I am again observing that if the service is hit soon after deployment, queues are not available because it isn't considered production.

[Screenshot: fensak-stage_-_Project_-_Deploy]

igorzi commented 1 year ago

@yorinasub17 Do you only see this with deployments that were first preview deployments and then promoted to production?

Have you seen this issue on deployments that were created as production deployments (e.g. from the main branch)?

yorinasub17 commented 1 year ago

I'm actually using the GitHub Actions based deployment, and only to deploy from the main branch, which, as I understand it, is a straight production deployment (no preview environments).

As a side note, it almost always resolves when I do a re-run on the GitHub Actions job to trigger a redeployment (in fact, this is the primary reason I'm doing it this way so I can trigger redeployments with ease).

Here's a snippet of the workflow file:

  deployprod:
    runs-on: ubuntu-latest
    permissions:
      id-token: write # Needed to auth to Deno Deploy
      checks: write # Needed for GHA to write the checks for the job
      contents: read
    steps:
      - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.0.0
        with:
          fetch-depth: 0

      - uses: denoland/deployctl@b841621a76eae438b09e1bce5e74549678c24e7f # v1.8.2
        with:
          project: fensak
          entrypoint: main.ts

igorzi commented 1 year ago

Just to double confirm, you never use the "Promote to production" button in your workflow?

yorinasub17 commented 1 year ago

Ah nope I've never used that feature. I pretty much ignore preview environments since I can't use them (my app is a GitHub App, so switching webhook endpoints is a pain).

yorinasub17 commented 1 year ago

If it helps with the debugging, my workflows are open source at https://github.com/fensak-io/fensak. I saw this most recently on the fensak-stage-xaweadnqvqhj.deno.dev environment build in the fensak-stage project.

igorzi commented 1 year ago

It turns out that this warning is benign in this case. The message does get enqueued despite the warning.

We will work on a fix so that the warning is not displayed in this case.

yorinasub17 commented 1 year ago

Oh hmm that's not exactly what I'm observing. I have a health check endpoint that enqueues a message and then waits for a kv object to populate, which indicates the worker processed it. When I see the message pop up, this endpoint is failing.

That said, if you are seeing evidence of the message queueing, then it must be something with what I'm doing that's the problem.

Let me see if I can create a minimal reproducible example to see if I can repro in a smaller app.

Thanks for the help so far by the way!

yorinasub17 commented 1 year ago

Ok I did some more digging, and I think you are right that the messages are getting queued. It was just that there is a delay (about 2-3 minutes) before the worker starts processing the messages, which is why the health check was failing (it only tries for 30 seconds). So this was indeed user error from a misunderstanding on my part.

Will close this and the other related issue I opened as well. Again, thanks for helping and digging into this!
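For anyone following along, the health check described above (enqueue a message, then poll KV until the worker writes a marker) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual code from the app: `KvLike`, `queueHealthCheck`, and the `["healthcheck", id]` key layout are hypothetical, and `KvLike` is a hand-written slice of the interface that a `Deno.Kv` instance is assumed to satisfy structurally.

```typescript
// Minimal slice of the KV surface this sketch relies on (an assumption, written
// by hand so that an in-memory fake can stand in for a real KV handle).
interface KvLike {
  enqueue(value: unknown): Promise<unknown>;
  get(key: string[]): Promise<{ value: unknown }>;
}

const delay = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Enqueues a health-check message, then polls for the marker the queue worker is
// expected to write. Returns false if the marker does not appear before the deadline.
async function queueHealthCheck(
  kv: KvLike,
  id: string,
  timeoutMs: number,
  pollMs: number,
): Promise<boolean> {
  await kv.enqueue({ type: "healthcheck", id });
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const entry = await kv.get(["healthcheck", id]);
    if (entry.value != null) return true; // marker present: the worker processed the message
    await delay(pollMs);
  }
  return false;
}
```

On the worker side, the assumption is a `kv.listenQueue(...)` handler that calls `kv.set(["healthcheck", msg.id], true)` for these messages. With a worker start-up delay of 2-3 minutes after a deploy, a `timeoutMs` of 30 seconds returns `false` even though the message is eventually processed, which matches the failure seen here.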

igorzi commented 1 year ago

@yorinasub17 - Sorry, I take that back. After some more investigation, it's possible that the message is dropped when this condition hits. We are working on a fix.

igorzi commented 1 year ago

@yorinasub17 - a fix was rolled out last week. Have you observed this in the last few days?

yorinasub17 commented 1 year ago

Nope it's been working fine! I haven't run into any issues on my deployments last week. I think this issue can be closed.

igorzi commented 1 year ago

Thanks!