-
Let's add a build with sanitizers (ASan, UBSan) to the CI pipeline and run the tests on it. To avoid blowing up CI workflow time, let's focus on Postgres 16 first.
Things to keep in mind:
- Enable sani…
-
# Problem
Some tests leave stray processes behind after they exit.
This is a potential root cause of failed coverage-report generation, as well as of other flakiness.
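
As a rough illustration of what "no stray processes" could mean in a test harness, here is a minimal sketch. It assumes the harness starts every process through one helper; `ProcessTracker`, `spawn`, and `reap_strays` are hypothetical names made up for this example, not part of the existing test suite. It only sees direct children started through the tracker, so daemonized or grandchild processes would need a different approach.

```python
import subprocess
import sys


class ProcessTracker:
    """Hypothetical helper: remembers processes started through it."""

    def __init__(self):
        self.procs = []

    def spawn(self, args):
        # Start the process and keep a handle so it can be checked later.
        proc = subprocess.Popen(args)
        self.procs.append(proc)
        return proc

    def reap_strays(self, kill_timeout_secs=5.0):
        """Kill every tracked process that is still running; return their PIDs."""
        strays = [p for p in self.procs if p.poll() is None]
        for proc in strays:
            proc.kill()
            # wait() collects the exit status so no zombie is left behind.
            proc.wait(timeout=kill_timeout_secs)
        self.procs.clear()
        return [p.pid for p in strays]


if __name__ == "__main__":
    tracker = ProcessTracker()
    tracker.spawn([sys.executable, "-c", "import time; time.sleep(60)"])
    print("stray PIDs:", tracker.reap_strays())
```

A test teardown could call `reap_strays()` and fail the test when the returned list is non-empty, which would turn silent leftovers into visible failures.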
# DoD
The Python test …
-
# Summary
The original issue we hit was
```
page server returned error: tried to request a page version that was garbage collected. requested at C/1E923DE0 gc cutoff C/23B3DF00
```
but then the scope gre…
-
Long test runs:
- https://github.com/neondatabase/neon/actions/runs/4283157311/jobs/7458489003
- https://github.com/neondatabase/neon/actions/runs/4353333163/jobs/7607334995
Meta slack thread: ht…
-
Detected while doing manual deploys on #3636: we updated the pageserver to an invalid configuration, which of course disabled the pageserver on staging. Because we convert from YAML to TOML, these kinds of surpri…
-
This is an umbrella ticket for all the places that may not properly respect cancellation in long-running tasks.
```[tasklist]
### Tasks
- [ ] Drop out of waiting for semaphore for remote storage
- […
jcsp updated
5 months ago
-
## Motivation
According to anecdotal evidence, production pageservers are missing the 10s shutdown timeout during pageserver restarts, causing systemd to SIGKILL the pageserver processes.
This …
-
If a getpage@lsn request runs long, i.e. takes more than X seconds, we should log a warning that describes why it took so long (where the time was spent). The breakdown could be high level as total…
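
One way to picture the proposal is a per-request timer that accumulates time per phase and only logs when the total exceeds the threshold. The sketch below is an illustration under assumptions, not the pageserver's implementation: `RequestTimer`, the phase names `wait_lsn` and `read_layers`, and the 3-second threshold are all hypothetical stand-ins (the issue leaves the threshold as "X seconds").

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("slow_request")


class RequestTimer:
    """Hypothetical helper: accumulates wall-clock time per named phase."""

    def __init__(self, name, warn_after_secs, clock=time.monotonic):
        self.name = name
        self.warn_after_secs = warn_after_secs
        self.clock = clock
        self.started = clock()
        self.phases = {}

    @contextmanager
    def phase(self, label):
        # Time the enclosed block and add it to the phase's running total.
        begin = self.clock()
        try:
            yield
        finally:
            elapsed = self.clock() - begin
            self.phases[label] = self.phases.get(label, 0.0) + elapsed

    def finish(self):
        """Log a warning with the breakdown if the request ran too long."""
        total = self.clock() - self.started
        if total <= self.warn_after_secs:
            return None
        breakdown = ", ".join(f"{k}={v:.3f}s" for k, v in self.phases.items())
        message = (
            f"{self.name} took {total:.3f}s "
            f"(threshold {self.warn_after_secs}s): {breakdown}"
        )
        logger.warning(message)
        return message


if __name__ == "__main__":
    timer = RequestTimer("getpage@lsn", warn_after_secs=3.0)  # assumed threshold
    with timer.phase("wait_lsn"):       # hypothetical phase name
        pass
    with timer.phase("read_layers"):    # hypothetical phase name
        pass
    timer.finish()
```

Keeping the per-phase bookkeeping cheap and emitting only on the slow path means fast requests pay just a few clock reads.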
-
Rough roll-out plan:
1. Switch `--enable-offload` in staging regions, observe for ~1 week
2. Switch `--enable-offload` in prod regions one by one, observe for ~1 week
3. Switch `--delete-offloaded-wal`…
-
A problem uncovered by starting to do graceful shutdowns (#8655) in tests and benches: the symptom looks like "infinite layer flushes" even after the test has ended.
Most likely this is fallout from …