Open steve-chavez opened 2 years ago
nullableValue
Code path: nullableValue -> encodingBytes -> builderBytes -> unsafeCreate
unsafeCreate :: Int -> (Ptr Word8 -> IO ()) -> ByteString
A way of creating ByteStrings outside the IO monad. The Int argument gives the final size of the ByteString. Unlike createAndTrim the ByteString is not reallocated if the final size is less than the estimated size.
So clearly unsafeCreate is doing the allocation here.
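To make the allocation concrete, here is a minimal sketch of what a copy through unsafeCreate looks like. It is not the actual hasql encoder code; copyViaUnsafeCreate is a hypothetical name, and the sketch only shows the pattern of allocating a fresh buffer of the final size and filling it from an existing ByteString.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Internal as BSI
import qualified Data.ByteString.Unsafe as BSU
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (castPtr)

-- | Hypothetical helper: duplicate a ByteString the way a builder would.
-- unsafeCreate allocates a new buffer of the requested size, and the
-- callback fills it, so the payload exists twice once this returns.
copyViaUnsafeCreate :: BS.ByteString -> BS.ByteString
copyViaUnsafeCreate src =
  BSI.unsafeCreate (BS.length src) $ \dst ->
    -- Borrow the source buffer without copying it, then memcpy into dst.
    BSU.unsafeUseAsCStringLen src $ \(srcPtr, len) ->
      copyBytes dst (castPtr srcPtr) len
```

The result compares equal to the input but lives in separate memory, which is the extra copy this code path pays for.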
sendQueryPrepared
Code path: sendQueryPrepared -> useAsCString
useAsCString: O(n) construction. Use a ByteString with a function requiring a null-terminated CString. The CString is a copy and will be freed automatically; it must not be stored or used after the subcomputation finishes.
Ditto: useAsCString is doing the allocation here.
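The difference between the copying and non-copying variants can be observed by comparing pointers. The following is a small sketch using only the bytestring package; copiedByUseAsCString and sharedByUnsafe are hypothetical names chosen for illustration.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Unsafe as BSU

-- | True when useAsCString hands the callback a different address than the
-- ByteString's own buffer, i.e. when a temporary copy was made.
copiedByUseAsCString :: BS.ByteString -> IO Bool
copiedByUseAsCString bs =
  BSU.unsafeUseAsCString bs $ \original ->
    BS.useAsCString bs $ \temporary ->
      pure (temporary /= original)

-- | True when two unsafeUseAsCString calls see the same address, i.e. when
-- the callback works directly on the ByteString's buffer.
-- Caveat: that buffer is not guaranteed to be null-terminated.
sharedByUnsafe :: BS.ByteString -> IO Bool
sharedByUnsafe bs =
  BSU.unsafeUseAsCString bs $ \first ->
    BSU.unsafeUseAsCString bs $ \second ->
      pure (first == second)
```

Because the unsafe variant gives no null terminator, it only fits callers that pass an explicit length alongside the pointer.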
It's a bit of a mess, this one. I think LinearTypes would ensure this doesn't happen in libraries.
There are some options for making progress on this. The current best state is https://github.com/PostgREST/postgrest/pull/2349, which
jsonBytes
encoding (so we have non-zero-terminated binary encoding) and modify postgresql-libpq to use unsafeUseAsCStringjsonBytes
The postgresql-libpq change seems like a reasonable change to get merged upstream. Regarding hasql, it's a bit less clear how to fit our needs into the API nicely -- currently exploring this.
For the postgresql-libpq change in particular we could move ahead and build against a patched fork since the API doesn't change.
Also let's note some options that go beyond these "easy" wins:
Does "streaming the body" mean the database connection is taken from the pool and blocked once a http request starts and is still ongoing?
I think this could, while improving memory usage, decrease performance overall, because it would keep the database connections blocked for longer, saturating the pool faster.
I think there is value in making sure to receive a full request first and then sending it to the database at once. And that includes making a single copy of the request body, I assume.
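The pool concern can be put in rough numbers with Little's law: the average number of connections in use equals the request rate times the time each request holds a connection. The sketch below is a back-of-the-envelope model, not a measurement; all function names are hypothetical, and it pessimistically assumes streaming holds the connection for the whole upload plus the query.

```haskell
-- | Average number of pooled connections in use (Little's law):
-- requests per second times seconds each request holds a connection.
connectionsInUse :: Double -> Double -> Double
connectionsInUse requestsPerSecond holdSeconds = requestsPerSecond * holdSeconds

-- | Hold time when the full body is received before a connection is taken:
-- only the query itself occupies the connection.
holdBuffered :: Double -> Double -> Double
holdBuffered _uploadSeconds querySeconds = querySeconds

-- | Hold time when the body is streamed while the connection is held
-- (assumption: upload and query time simply add up).
holdStreaming :: Double -> Double -> Double
holdStreaming uploadSeconds querySeconds = uploadSeconds + querySeconds
```

With slow clients the upload time dominates, so under this model the streaming variant needs proportionally more connections for the same request rate.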
> Does "streaming the body" mean the database connection is taken from the pool and blocked once a http request starts and is still ongoing?
Indeed. It's a good point, thanks, easy to get carried away when optimizing a single parameter :).
> For the postgresql-libpq change in particular we could move ahead and build against a patched fork since the API doesn't change.
Sounds good, we can use the fork for now.
We might be able to read the body into a strict bytestring directly. If we get a content-length header, we should be able to allocate the body buffer directly. That could tie in with the changes above, and avoid the copy from lazy bytestring to bytestring. Using libpq (the C library) is what requires us to copy the full body to memory at all.
Just to clarify. The above means that even with the optimizations in the current libraries, we'll always need one extra copy of the request body right?
Regarding the content-length header, it should work well if we can fall back to doing one copy when the header is not present.
The postgresql-pure idea looks even better. If it takes a while to do, perhaps we can do the content-length one (if it's simple) and work on postgresql-pure as a next step?
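The content-length idea with its fallback could look roughly like the sketch below. It assumes the body arrives through an action that returns one chunk per call and an empty ByteString at the end (similar in spirit to how WAI exposes request bodies); readBody and chunkSource are hypothetical names, and this is not PostgREST's code.

```haskell
import Control.Monad (when)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Internal as BSI
import qualified Data.ByteString.Lazy as LBS
import qualified Data.ByteString.Unsafe as BSU
import Data.IORef (atomicModifyIORef', newIORef)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (castPtr, plusPtr)

-- | Read a whole body into one strict ByteString.
-- With a known length: allocate the final buffer once and copy each chunk
-- into place as it arrives. Without it: collect chunks, then copy once.
readBody :: Maybe Int -> IO BS.ByteString -> IO BS.ByteString
readBody Nothing nextChunk = LBS.toStrict . LBS.fromChunks <$> collect
  where
    collect = do
      chunk <- nextChunk
      if BS.null chunk then pure [] else (chunk :) <$> collect
readBody (Just len) nextChunk = BSI.create len (fill 0)
  where
    fill offset dst = do
      chunk <- nextChunk
      let n = BS.length chunk
      if n == 0
        then when (offset /= len) $
               ioError (userError "body shorter than Content-Length")
        else do
          when (offset + n > len) $
            ioError (userError "body longer than Content-Length")
          BSU.unsafeUseAsCStringLen chunk $ \(src, _) ->
            copyBytes (dst `plusPtr` offset) (castPtr src) n
          fill (offset + n) dst

-- | Hypothetical stand-in for a socket: yields the chunks, then empty.
chunkSource :: [BS.ByteString] -> IO (IO BS.ByteString)
chunkSource chunks = do
  ref <- newIORef chunks
  pure $ atomicModifyIORef' ref $ \cs -> case cs of
    [] -> ([], BS.empty)
    (c : rest) -> (rest, c)
```

In the known-length branch each chunk can be dropped as soon as it has been copied, so the chunked and the contiguous form never need to be fully resident together.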
> Just to clarify. The above means that even with the optimizations in the current libraries, we'll always need one extra copy of the request body right?
Yes, with the current changes, we always read the body to memory in chunks (a lazy bytestring), and need to copy that to a single chunk to pass on to libpq. (Which is why I'm confused by the numbers in the memory tests -- I'd expect that we need to keep both the lazy and the strict bytestring in memory at the same time, so a peak memory use of 2x body size.)
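The expected 2x peak follows directly from holding both representations at once. A tiny sketch of that arithmetic, with peakBytes as a hypothetical helper name:

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LBS

-- | Bytes resident if the chunked (lazy) body and its single-chunk strict
-- copy are both alive at the same moment.
peakBytes :: LBS.ByteString -> Int
peakBytes body =
  fromIntegral (LBS.length body) + BS.length (LBS.toStrict body)

-- | Number of chunks the lazy body is split into.
chunkCount :: LBS.ByteString -> Int
chunkCount = length . LBS.toChunks
```

For a body made of several chunks, peakBytes is twice the body size, which is the figure the memory tests would be expected to show if nothing is freed early.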
Just tested v10 against the latest pre-release (v10.0.0.20221011, includes https://github.com/PostgREST/postgrest/pull/2349) using this benchmark setup with this k6 script (a bulk POST) and varying the number of virtual users (10, 50, 100).
v10, 10 VUs:
data_received..............: 6.9 MB 216 kB/s
data_sent..................: 309 MB 9.7 MB/s
✓ failed requests............: 0.00% ✓ 0 ✗ 43073
http_req_blocked...........: avg=6.98µs min=1.23µs med=2.27µs max=2.71ms p(90)=3.65µs p(95)=5.29µs
http_req_connecting........: avg=3.11µs min=0s med=0s max=2.29ms p(90)=0s p(95)=0s
✓ http_req_duration..........: avg=6.08ms min=2.45ms med=5.84ms max=1.77s p(90)=8.23ms p(95)=9.03ms
http_req_receiving.........: avg=58.51µs min=14.5µs med=44µs max=3.79ms p(90)=61.58µs p(95)=84.56µs
http_req_sending...........: avg=55.05µs min=25.81µs med=51.35µs max=2.28ms p(90)=68.14µs p(95)=83.53µs
http_req_tls_handshaking...: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...........: avg=5.97ms min=2.3ms med=5.73ms max=1.77s p(90)=8.11ms p(95)=8.92ms
http_reqs..................: 43074 1353.110185/s
iteration_duration.........: avg=6.99ms min=3.24ms med=6.73ms max=1.77s p(90)=9.25ms p(95)=10.1ms
iterations.................: 43073 1353.078771/s
vus........................: 0 min=0 max=10
vus_max....................: 10 min=10 max=10
v10, 50 VUs:
data_received..............: 8.1 MB 245 kB/s
data_sent..................: 350 MB 11 MB/s
✓ failed requests............: 1.52% ✓ 751 ✗ 48508
http_req_blocked...........: avg=6.73µs min=1.22µs med=2.21µs max=2.03ms p(90)=3.61µs p(95)=5.28µs
http_req_connecting........: avg=2.98µs min=0s med=0s max=680.92µs p(90)=0s p(95)=0s
✓ http_req_duration..........: avg=29.6ms min=3ms med=26.27ms max=2.95s p(90)=39.85ms p(95)=49.13ms
http_req_receiving.........: avg=54.57µs min=18.04µs med=44.07µs max=5.79ms p(90)=60.14µs p(95)=79.9µs
http_req_sending...........: avg=54.43µs min=28.01µs med=51.28µs max=4.87ms p(90)=66.9µs p(95)=80.96µs
http_req_tls_handshaking...: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...........: avg=29.49ms min=2.9ms med=26.15ms max=2.95s p(90)=39.74ms p(95)=49.02ms
http_reqs..................: 49260 1491.34639/s
iteration_duration.........: avg=30.51ms min=3.83ms med=27.15ms max=2.95s p(90)=40.79ms p(95)=50.16ms
iterations.................: 49259 1491.316115/s
vus........................: 0 min=0 max=50
vus_max....................: 50 min=50 max=50
v10, 100 VUs:
data_received..............: 8.5 MB 226 kB/s
data_sent..................: 367 MB 9.7 MB/s
✓ failed requests............: 2.15% ✓ 1114 ✗ 50648
http_req_blocked...........: avg=13.88µs min=1.22µs med=2.2µs max=9.44ms p(90)=3.54µs p(95)=5.02µs
http_req_connecting........: avg=10.12µs min=0s med=0s max=9.28ms p(90)=0s p(95)=0s
✓ http_req_duration..........: avg=57.24ms min=3.65ms med=42.55ms max=7.59s p(90)=76.13ms p(95)=88.87ms
http_req_receiving.........: avg=54.14µs min=15.75µs med=44.81µs max=8.56ms p(90)=60.7µs p(95)=75.48µs
http_req_sending...........: avg=52.55µs min=26.55µs med=50.99µs max=2.41ms p(90)=66.26µs p(95)=80.64µs
http_req_tls_handshaking...: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...........: avg=57.14ms min=3.56ms med=42.44ms max=7.59s p(90)=76.02ms p(95)=88.75ms
http_reqs..................: 51763 1373.36774/s
iteration_duration.........: avg=58.13ms min=4.5ms med=43.46ms max=7.59s p(90)=77.01ms p(95)=89.73ms
iterations.................: 51762 1373.341208/s
vus........................: 0 min=0 max=100
vus_max....................: 100 min=100 max=100
v10.0.0.20221011, 10 VUs:
data_received..............: 7.0 MB 217 kB/s
data_sent..................: 313 MB 9.7 MB/s
✓ failed requests............: 0.00% ✓ 0 ✗ 43681
http_req_blocked...........: avg=7.12µs min=1.23µs med=2.27µs max=3.24ms p(90)=3.66µs p(95)=5.49µs
http_req_connecting........: avg=3.2µs min=0s med=0s max=3.2ms p(90)=0s p(95)=0s
✓ http_req_duration..........: avg=6ms min=2.5ms med=5.78ms max=2.12s p(90)=7.91ms p(95)=8.67ms
http_req_receiving.........: avg=57.65µs min=16.92µs med=43.72µs max=4.34ms p(90)=60.55µs p(95)=83.78µs
http_req_sending...........: avg=55.41µs min=24.83µs med=51.44µs max=1.8ms p(90)=68.45µs p(95)=82.51µs
http_req_tls_handshaking...: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...........: avg=5.89ms min=2.32ms med=5.67ms max=2.12s p(90)=7.8ms p(95)=8.55ms
http_reqs..................: 43682 1356.964725/s
iteration_duration.........: avg=6.91ms min=3.25ms med=6.67ms max=2.12s p(90)=8.92ms p(95)=9.79ms
iterations.................: 43681 1356.933661/s
vus........................: 0 min=0 max=10
vus_max....................: 10 min=10 max=10
v10.0.0.20221011, 50 VUs:
data_received..............: 8.1 MB 240 kB/s
data_sent..................: 356 MB 11 MB/s
✓ failed requests............: 0.71% ✓ 357 ✗ 49690
http_req_blocked...........: avg=7.47µs min=1.13µs med=2.21µs max=4.56ms p(90)=3.63µs p(95)=5.61µs
http_req_connecting........: avg=3.17µs min=0s med=0s max=3.03ms p(90)=0s p(95)=0s
✓ http_req_duration..........: avg=29.11ms min=4.96ms med=26.57ms max=3.61s p(90)=38.4ms p(95)=46.64ms
http_req_receiving.........: avg=55.68µs min=16.28µs med=43.85µs max=9.68ms p(90)=59.98µs p(95)=81.03µs
http_req_sending...........: avg=54.21µs min=27.98µs med=51.28µs max=6.65ms p(90)=66.91µs p(95)=82.28µs
http_req_tls_handshaking...: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...........: avg=29ms min=4.86ms med=26.46ms max=3.61s p(90)=38.28ms p(95)=46.52ms
http_reqs..................: 50048 1485.597503/s
iteration_duration.........: avg=30.04ms min=5.73ms med=27.47ms max=3.61s p(90)=39.35ms p(95)=47.57ms
iterations.................: 50047 1485.567819/s
vus........................: 0 min=0 max=50
vus_max....................: 50 min=50 max=50
v10.0.0.20221011, 100 VUs:
data_received..............: 8.1 MB 233 kB/s
data_sent..................: 353 MB 10 MB/s
✓ failed requests............: 1.24% ✓ 620 ✗ 49134
http_req_blocked...........: avg=12.85µs min=1.2µs med=2.2µs max=8.09ms p(90)=3.61µs p(95)=5.19µs
http_req_connecting........: avg=9.09µs min=0s med=0s max=7.78ms p(90)=0s p(95)=0s
✓ http_req_duration..........: avg=59.51ms min=3.16ms med=44.94ms max=4.74s p(90)=74.99ms p(95)=90.09ms
http_req_receiving.........: avg=55.63µs min=19.82µs med=44.88µs max=29.96ms p(90)=60.25µs p(95)=77.31µs
http_req_sending...........: avg=53.33µs min=29.16µs med=51.37µs max=7.62ms p(90)=66.36µs p(95)=79.67µs
http_req_tls_handshaking...: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...........: avg=59.4ms min=2.76ms med=44.83ms max=4.74s p(90)=74.89ms p(95)=89.94ms
http_reqs..................: 49755 1427.870379/s
iteration_duration.........: avg=60.43ms min=6.29ms med=45.83ms max=4.74s p(90)=75.89ms p(95)=91.07ms
iterations.................: 49754 1427.84168/s
vus........................: 0 min=0 max=100
vus_max....................: 100 min=100 max=100
Looks like there's a slight improvement in the % of failed requests and in throughput.
Also saw less %MEM on top while doing the load tests: v10 reached %MEM 11.8 and v10.0.0.20221011 reached %MEM 9.8.
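The failed-request percentages can be recomputed from the two counts k6 prints next to each rate. A minimal sketch, with failurePercent as a hypothetical helper name and the counts taken from the two 50-VU runs above:

```haskell
-- | Share of failed requests in percent, given the failed and passed counts.
failurePercent :: Int -> Int -> Double
failurePercent failed passed =
  100 * fromIntegral failed / fromIntegral (failed + passed)

-- | First 50-VU run: 751 failed, 48508 passed.
firstRun50 :: Double
firstRun50 = failurePercent 751 48508

-- | Second 50-VU run: 357 failed, 49690 passed.
secondRun50 :: Double
secondRun50 = failurePercent 357 49690
```

These come out at roughly 1.52 and 0.71 percent, matching the reported figures, so the failure rate at 50 VUs is about halved between the two runs.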
> Using libpq (the C library) is what requires us to copy the full body to memory at all.
@robx Say if:
Would that allow us to have a single copy?
When running the memory tests:
The following profiling report is produced:
It seems the request body is copied 3 times: in sendQueryPrepared, normalizedBody, and nullableValue. Since we don't do any processing on the body, ideally it would just be copied once and then sent to PostgreSQL.