denoland / deno

A modern runtime for JavaScript and TypeScript.
https://deno.com
MIT License

Writing JS strings to resources, Deno.encode and Deno.encodeInto #15895

Open billywhizz opened 2 years ago

billywhizz commented 2 years ago

Issue

I was doing some investigation into network perf using Deno FFI, and stdio perf in general, and hit a performance roadblock that is relevant anywhere we write JS strings to op resources using send or write syscalls and the like.

If you look at the benchmark script here you can see the following scenarios benchmarked. All scenarios write to /dev/null using FFI fast call.

Benchmarks

Results


Flamegraphs

Flamegraphs comparing the three static-string scenarios are here. They clearly show the overhead of both Deno.core.encode and Deno.core.ops.op_encoding_encode_into relative to the baseline of writing a pre-encoded buffer, and relative to Deno.writeSync, which seems to be the fastest option currently exposed to userland.

const tests = {
  static_buffer_writesync: () => Deno.writeSync(rid, payload),
  static_buffer_ffi: () => fs.write(fd, payload, payload.length),
  encode_dynamic_string: () => {
    const payload = encode(`${TEXT}${byteLength(str)}${CRLF2}${str}`)
    fs.write(fd, payload, payload.length)
  },
  encode_static_string: () => {
    const payload = encode(resp)
    fs.write(fd, payload, payload.length)
  },
  encode_into_dynamic_string: () => {
    const { written } = ops.op_encoding_encode_into(`${TEXT}${byteLength(str)}${CRLF2}${str}`, buf)
    fs.write(fd, buf, written)
  },
  encode_into_static_string: () => {
    const { written } = ops.op_encoding_encode_into(resp, buf)
    fs.write(fd, buf, written)
  }
}

static_buffer_writesync

static_buffer_ffi

encode_static_string

encode_into_static_string

Discussion

From the benchmarks we can see:

From the flamegraphs we can see:

There are a few issues I think this raises:

  // Node-style binding: pull the UTF-8 bytes straight out of the V8 string
  // and hand them to write(2), with no intermediate JS-visible buffer.
  String::Utf8Value str(isolate, args[1]);
  args.GetReturnValue().Set(Integer::New(isolate,
    write(fd, *str, str.length())));

Planned Actions

The disadvantage of the second approach is that, afaik, it will not be optimised as a fast call, because the fast call API has no string support yet. But it would be worth knowing whether it is faster than the other methods above even without the fast call, so I propose doing it as a draft PR and seeing if it improves perf. It should be pretty easy to implement.

aapoalas commented 2 years ago

Don't you worry: Fast API can actually work with strings. If you give the parameter type as kValue then it'll accept any type (presumably), and it gets passed as the V8 wrapper Value.

billywhizz commented 2 years ago

nice. has that landed already @aapoalas? i'll give it a go if so.

aapoalas commented 2 years ago

Yeah, it's already supported by Fast API internally. I can't quite remember if the support for that in ops was already added.

You'll need to use the options bag parameter as well to fall back to the slow path when the expected string parameter turns out to be not-a-string.

littledivy commented 2 years ago

For the record, here's the link to the conversation on V8 supporting strings in Fast API calls. https://discord.com/channels/684898665143206084/778060818046386176/1008754326933618728

billywhizz commented 2 years ago

i've updated the OP with a new test using Deno.writeSync, which is the fastest "official" way I can find to write to a resource. Also uploaded nicer flamegraphs with full Deno/Rust stack traces.

littledivy commented 2 years ago

After applying the encodeInto optimization in #15922

encodeInto_ptr

littledivy commented 2 years ago

The Builtins_LoadIC is concerning. I think it's related to webidl?

billywhizz commented 2 years ago

I've implemented a minimal little runtime so i can test these low-level ops changes without building all of Deno. It only takes ~20 seconds to do a rebuild with this.

I implemented @littledivy's change from this PR for encode_into, as well as a similar change using the same logic as current Deno.

Divy's PR is ~3x the current implementation in throughput on a small HTTP response payload. I will benchmark again over the weekend using this in the text-writing benchmarks I did above.

Here are the flamegraphs

current encode_into

deno_encode_into

my change based on divy's PR

runjs_encode_into

divy's PR change

runjs_encode_into_fast

billywhizz commented 2 years ago

btw - i just noticed the fast implementation is still using GetBackingStore. i'll have a look at that tomorrow. I might not be picking up the latest rusty_v8.

littledivy commented 2 years ago

@billywhizz you need to use deno_ops and deno_core from my PR branch. The GetBackingStore changes are unreleased.

littledivy commented 2 years ago

great results though!