-
## Environment
Prod (eu-central-1)
## Steps to reproduce
Unknown
## Expected result
The node metrics reported by the scheduler should always match its internal state.
## Actual resul…
-
## Steps to reproduce
Have a running compute, restart pageserver
## Expected result
No errors
## Actual result
`ERROR XX000 (internal_error) [NEON_SMGR] failed to flush page requests:`
…
-
It's bad if we log a lot of these; this might be the reason staging got rate-limited on logging and vmauth.
Instead of fixing this with yet another quick patch, we should just do the rewrite in #5733.
…
-
Let's add a build with sanitizers (ASan, UBSan) to the CI pipeline and run tests on it. To avoid blowing up the CI workflow time, let's focus on Postgres 16 first.
Things to keep in mind:
- Enable sani…
-
Rough roll-out plan:
1. Switch `--enable-offload` in staging regions, observe for ~1 week
2. Switch `--enable-offload` in prod regions one by one, observe for ~1 week
3. Switch `--delete-offloaded-wal`…
-
Context: this message and subsequent ones in the thread https://neondb.slack.com/archives/C06K38EB05D/p1718188056338629?thread_ts=1718184799.253779&cid=C06K38EB05D
# Problem
During Pageserver sh…
-
A `/location_config` operation got stuck while transitioning from attached to secondary (i.e. `Tenant::shutdown`), which is clearly a bug in pageserver. We mitigated by restarting pageserver, but we should debug this.
…
-
This is an umbrella ticket for all the places that may not be properly respecting cancellation in long-running tasks
```[tasklist]
### Tasks
- [ ] Drop out of waiting for semaphore for remote storage
- […
-
If a getpage@lsn request runs long (i.e. takes more than X seconds), we should log a warning describing why it took so long (where the time was spent). Breakdown could be high level as total…
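
A rough sketch of what such a warning could look like, using only the Rust standard library. Everything here is hypothetical and not taken from the pageserver code: the `PhaseTimer` type, the `slow_warning_message` helper, the message format, and the phase names used in the usage note are all illustrative assumptions. The idea is to time each phase of the request separately and only build the warning when the total exceeds the threshold.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not a pageserver type): records how long each
/// named phase of a single request took.
struct PhaseTimer {
    started: Instant,
    phases: Vec<(&'static str, Duration)>,
}

impl PhaseTimer {
    fn new() -> Self {
        Self { started: Instant::now(), phases: Vec::new() }
    }

    /// Run `f` and record its wall-clock duration under `name`.
    fn time<T>(&mut self, name: &'static str, f: impl FnOnce() -> T) -> T {
        let t0 = Instant::now();
        let out = f();
        self.phases.push((name, t0.elapsed()));
        out
    }

    /// Warning text if the request has run longer than `threshold`, else `None`.
    fn slow_warning(&self, threshold: Duration) -> Option<String> {
        slow_warning_message(self.started.elapsed(), &self.phases, threshold)
    }
}

/// Format the per-phase breakdown. Kept separate from the clock so the
/// formatting can be checked with fixed durations.
fn slow_warning_message(
    total: Duration,
    phases: &[(&'static str, Duration)],
    threshold: Duration,
) -> Option<String> {
    if total <= threshold {
        return None; // fast enough: log nothing
    }
    let breakdown: Vec<String> = phases
        .iter()
        .map(|(name, d)| format!("{name}={}ms", d.as_millis()))
        .collect();
    Some(format!(
        "slow getpage request: total={}ms ({})",
        total.as_millis(),
        breakdown.join(", ")
    ))
}
```

In a request handler, each step would be wrapped as `timer.time("wait_lsn", || ...)` (the phase name is made up), and `timer.slow_warning(threshold)` would be checked once at the end, so the fast path pays only for a few `Instant::now()` calls.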
-
# Problem
We run `initdb` without specifying `--username`:
https://github.com/neondatabase/neon/blob/e823b9294714d0c5048942907c06b678c4a6c4a0/control_plane/src/storage_controller.rs#L244-L254
…