Closed yorinasub17 closed 1 year ago
It appears that this might be the same issue as https://github.com/denoland/deploy_feedback/issues/517, in that the KV is just not available for the first X minutes of the service. I'm observing that the problem magically goes away after waiting 5-8 minutes after a deploy.
@yorinasub17 fix for #517 rolled out earlier today. are you still seeing this issue?
Apologies for the delay. Yes that appears to have addressed this issue! I haven't run into this today across multiple deployments. Closing as solved!
@igorzi Reopening as it looks like this issue is back. This was ok for a while, but I have run into this a few times today now. I am again observing that if the service is hit soon after deployment, queues are not available because it isn't considered production.
@yorinasub17 do you only see this with deployments that were first preview and then promoted to production?
Have you seen this issue on deployments that were created as production deployments (e.g. from the main branch)?
I'm actually using the GitHub Actions based deployment, and only to deploy from the main branch, which, as I understand it, is a straight production deployment (no preview environments).
As a side note, it almost always resolves when I do a re-run on the GitHub Actions job to trigger a redeployment (in fact, this is the primary reason I'm doing it this way so I can trigger redeployments with ease).
Here's a snippet of the workflow file:
deployprod:
  runs-on: ubuntu-latest
  permissions:
    id-token: write # Needed to auth to Deno Deploy
    checks: write # Needed for GHA to write the checks for the job
    contents: read
  steps:
    - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.0.0
      with:
        fetch-depth: 0
    - uses: denoland/deployctl@b841621a76eae438b09e1bce5e74549678c24e7f # v1.8.2
      with:
        project: fensak
        entrypoint: main.ts
Just to double confirm, you never use "Promote to production" button in your workflow?
Ah nope I've never used that feature. I pretty much ignore preview environments since I can't use them (my app is a GitHub App, so switching webhook endpoints is a pain).
If it helps with the debugging, my workflows are open source at https://github.com/fensak-io/fensak. I saw this most recently on the fensak-stage-xaweadnqvqhj.deno.dev environment build in the fensak-stage project.
It turns out that this warning is benign in this case. The message does get enqueued despite the warning.
We will work on a fix so that the warning is not displayed in this case.
Oh hmm that's not exactly what I'm observing. I have a health check endpoint that enqueues a message and then waits for a kv object to populate, which indicates the worker processed it. When I see the message pop up, this endpoint is failing.
That said, if you are seeing evidence of the message queueing, then it must be something with what I'm doing that's the problem.
Let me see if I can create a minimal reproducible example to see if I can repro in a smaller app.
Thanks for the help so far by the way!
Ok I did some more digging, and I think you are right that the messages are getting queued. There is just a delay (about 2-3 minutes) before the worker starts processing the messages, which is why the health check fails (it only tries for 30 seconds). So this was indeed user error from a misunderstanding on my part.
Will close this and the other related issue I opened as well. Again, thanks for helping and digging into this!
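For reference, the health check described above (enqueue a message, then wait for a KV entry that the worker writes) could be sketched roughly as follows, with the wait made configurable so it can outlast a worker start-up delay of a few minutes. This is not the actual fensak code: `HealthKv`, `queueHealthCheck`, the message shape, and the `["healthcheck", probeId]` key are all hypothetical names chosen for illustration. The `HealthKv` interface is meant to be structurally compatible with `Deno.Kv`, but that is an assumption, not something verified against a specific Deno version.

```typescript
// Subset of the KV surface the health check needs (hypothetical interface,
// assumed to be structurally compatible with Deno.Kv).
interface HealthKv {
  enqueue(value: unknown): Promise<unknown>;
  get(key: string[]): Promise<{ value: unknown }>;
}

interface HealthCheckOptions {
  timeoutMs: number; // total time to wait for the worker to process the probe
  intervalMs: number; // delay between KV polls
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Enqueue a probe message, then poll KV until the worker has written the
// marker entry or the attempts implied by timeoutMs / intervalMs run out.
async function queueHealthCheck(
  kv: HealthKv,
  opts: HealthCheckOptions,
): Promise<boolean> {
  const sleep = opts.sleep ?? defaultSleep;
  const probeId = `${Date.now()}-${Math.random().toString(36).slice(2)}`;

  // Hypothetical message shape; the queue worker is assumed to write a
  // value under ["healthcheck", probeId] once it handles this message.
  await kv.enqueue({ type: "healthcheck", probeId });

  const attempts = Math.max(1, Math.ceil(opts.timeoutMs / opts.intervalMs));
  for (let i = 0; i < attempts; i++) {
    const entry = await kv.get(["healthcheck", probeId]);
    if (entry.value !== null && entry.value !== undefined) return true;
    if (i < attempts - 1) await sleep(opts.intervalMs);
  }
  return false;
}
```

With a 30 second `timeoutMs`, a worker that only starts processing after 2-3 minutes makes this return `false` even though the message was enqueued; a `timeoutMs` of around 5 minutes would cover the delay reported in this thread. On the worker side, a `kv.listenQueue` handler that calls `kv.set` on the marker key is presumably all that is needed.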
@yorinasub17 - Sorry, I take that back. After some more investigation, it's possible that the message is dropped when this condition hits. We are working on a fix.
@yorinasub17 - a fix was rolled out last week. Have you observed this in the last few days?
Nope it's been working fine! I haven't run into any issues on my deployments last week. I think this issue can be closed.
Thanks!
Type of feedback
Bug report
Description
I have gotten into a state where a deployed app is promoted to production, but I get an error message that Queues is not available on a preview environment.
Steps to reproduce (if applicable)
I don't have concrete steps for reproducing this since I haven't had the time to explore various scenarios. However, I have a sneaking suspicion that there is some kind of race condition between the swapping of environments to the new version, and existing requests.
The reason I think that is because this only started happening once I introduced a release testing workflow, where an automated test script waits for the deployment to finish and, as soon as it detects that Deno Deploy has finished its routine, starts hitting it with requests to perform a smoke test. Some of those initial requests pass (which suggests they are hitting the old environment) before they start to fail after a few seconds (which suggests they are hitting the new environment).
I did not see this happen at all when I was testing manually after releasing, which is why it makes me think that this has something to do with a live request being handled by the old environment at the boundary.
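To illustrate the boundary I suspect, here is a rough sketch of how a smoke test could wait until responses consistently come from the new deployment before it starts asserting anything. It assumes the app exposes some way to report which deployment served a request (for example a hypothetical endpoint that returns the deployment ID); `waitForDeployment`, `getDeploymentId`, and the option names are made up for this sketch and are not part of my actual workflow.

```typescript
interface WaitOptions {
  requiredMatches: number; // consecutive responses that must report the new ID
  maxAttempts: number; // give up after this many probes
  intervalMs: number; // delay between probes
  pause?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultPause = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Probe the service until `requiredMatches` consecutive responses report
// `expectedId`. A response from the old deployment, or a failed probe,
// resets the streak, so a mix of old and new responses does not count.
async function waitForDeployment(
  getDeploymentId: () => Promise<string>, // hypothetical probe, e.g. a fetch
  expectedId: string,
  opts: WaitOptions,
): Promise<boolean> {
  const pause = opts.pause ?? defaultPause;
  let streak = 0;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    let seen: string | null = null;
    try {
      seen = await getDeploymentId();
    } catch {
      seen = null; // treat network errors as "not the new deployment yet"
    }
    streak = seen === expectedId ? streak + 1 : 0;
    if (streak >= opts.requiredMatches) return true;
    if (attempt < opts.maxAttempts - 1) await pause(opts.intervalMs);
  }
  return false;
}
```

Requiring a streak rather than a single match is the point here: if requests alternate between the old and new environment for a few seconds, a single matching response would not be enough to conclude the swap has completed.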
Expected behavior (if applicable)
Queues should be available on production environments.
Possible solution (if applicable)
No response
Additional context
Logs from live prod environment: