denoland / deno

A modern runtime for JavaScript and TypeScript.
https://deno.com
MIT License

Slow performance in a very specific situation on Mac OS X #15820

Closed · applmak closed this issue 2 years ago

applmak commented 2 years ago

Hi! I'm trying to download my whole Google Photos library using its API. I'm also trying to use the fancy new Readable/WritableStream classes to do so, because they look like fun. My code looks roughly like this:

function readLibrary() {
  const libraryContentsUrl = new URL(
    "https://photoslibrary.googleapis.com/v1/mediaItems",
  );
  libraryContentsUrl.searchParams.set("pageSize", "100");
  let nextPageToken = "";
  return new ReadableStream<LibraryItem>({
    async pull(controller) {
      if (nextPageToken) {
        libraryContentsUrl.searchParams.set("pageToken", nextPageToken);
      }

      do {
        const libraryContentsReq = await fetch(libraryContentsUrl);
        const items = await libraryContentsReq.json();
        nextPageToken = items.nextPageToken;

        for (const item of items.mediaItems || []) {
          controller.enqueue(item);
        }
      } while (nextPageToken);

      if (!nextPageToken) {
        controller.close();
      }
    },
  }, {
    highWaterMark: 500,
  });
}
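One thing worth noting about the snippet above: the do/while inside pull() drains every page in a single call, so the highWaterMark: 500 strategy never gets a chance to apply backpressure. A minimal sketch of a one-page-per-pull variant (Page, pagedStream, and fetchPage are names I'm inventing here, not from the real code):

```typescript
// Hypothetical sketch, not the actual code from this issue: fetching ONE
// page per pull() call lets the stream's queuing strategy throttle requests.
type Page<T> = { items: T[]; nextPageToken?: string };

function pagedStream<T>(
  fetchPage: (token?: string) => Promise<Page<T>>, // stand-in for the Photos API call
): ReadableStream<T> {
  let token: string | undefined;
  return new ReadableStream<T>({
    async pull(controller) {
      const page = await fetchPage(token); // exactly one request per pull()
      for (const item of page.items) controller.enqueue(item);
      token = page.nextPageToken;
      if (!token) controller.close(); // no more pages: end the stream
    },
  }, { highWaterMark: 500 });
}
```

With this shape, once the internal queue reaches the high-water mark the runtime simply stops calling pull() until the consumer catches up, instead of fetching the whole library eagerly.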

And then there's a WritableStream that actually downloads each LibraryItem as a 32×32 px thumbnail and passes the resulting ArrayBuffer to a pool of WebWorkers via postMessage, which do some post-processing on them.

Anyway, I noticed that the actual downloading of the images seemed to slow down over time. At first, I assumed some kind of rate-limiting from Google's image servers was the culprit, but a restart of the program resumes downloading the images at normal speed, which wouldn't make sense given a normal OAuth-token-based rate-limiting strategy. So, I took some data:

      console.time("image fetch");
      formattedBytes = await (await fetch(baseUrl)).arrayBuffer();
      console.timeEnd("image fetch");

And made a spreadsheet with the results. These data show that at the 385th fetch, the latency of the fetch operation increases significantly. I left the program running for a day, and the fetches continued to slow down, eventually taking well over a minute to return a small 32×32 thumbnail.
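For reference, the same measurement can be captured as numbers rather than console.time output, which makes building the spreadsheet easier. A sketch (timedFetch, the latencies array, and the injectable doFetch parameter are my own names, not from the program):

```typescript
// Sketch of a timing harness: record each fetch's latency with
// performance.now() so the numbers can be dumped to a CSV/spreadsheet later.
const latencies: number[] = [];

async function timedFetch(
  url: string,
  doFetch: (u: string) => Promise<Response> = fetch, // injectable for testing
): Promise<ArrayBuffer> {
  const t0 = performance.now();
  const bytes = await (await doFetch(url)).arrayBuffer();
  latencies.push(performance.now() - t0); // one entry per fetch
  return bytes;
}
```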

Obviously, this slow latency means that I don't think I'll ever be able to download my whole library!

To help figure out what's going on, I used the Chrome inspector to take a profile. The resulting flame graph showed small bursts of JS with almost no activity (except for `(program)`) in between. At the time I took the profile, these pauses were 6 seconds long, but as I said before, they continued to increase over time.

To see whether v8 was behaving badly, I built a recent-ish version of v8 in release mode, ran the program, and took a long profile. The results aren't particularly illuminating to me, but perhaps they are to you? Here are the C++ parts of the isolates (generated via for i in $(ls isolate-*.log); do echo $i; D8_PATH=v8/v8/out.gn/arm64.release ./v8/v8/tools/mac-tick-processor $i 2>/dev/null | grep -A 11 '\[C++\]'; done). They are totally dominated by the private symbol _open_dprotected_np, which I know nothing about.

Any hints from your end?

isolate-0x120018000-14970-v8.log
 [C++]:
   ticks  total  nonlib   name
  11890184  100.0%  100.0%  _open_dprotected_np
     14    0.0%    0.0%  v8::(anonymous namespace)::WebAssemblyModule(v8::FunctionCallbackInfo<v8::Value> const&)
     13    0.0%    0.0%  __os_nospin_lock_unlock_slow
     12    0.0%    0.0%  ___bsdthread_terminate
     12    0.0%    0.0%  _Builtins_CreateTypedArray
     11    0.0%    0.0%  _$LT$extern$u20$$u22$C$u22$$u20$fn$LP$A0$RP$$u20$.$GT$$u20$R$u20$as$u20$v8..support..CFnFrom$LT$F$GT$$GT$::mapping::c_fn::h86d2d7f5e5504cca
      9    0.0%    0.0%  _Builtins_TypedArrayPrototypeSubArray
      8    0.0%    0.0%  _sigprocmask
      8    0.0%    0.0%  _$LT$extern$u20$$u22$C$u22$$u20$fn$LP$A0$RP$$u20$.$GT$$u20$R$u20$as$u20$v8..support..CFnFrom$LT$F$GT$$GT$::mapping::c_fn::hfce26145d86bae17
      7    0.0%    0.0%  miniz_oxide::inflate::core::decompress::h54647326c002f7d6
isolate-0x140160000-14970-v8.log
 [C++]:
   ticks  total  nonlib   name
  11891858  100.0%  100.0%  _open_dprotected_np
     30    0.0%    0.0%  _mach_port_insert_member
     16    0.0%    0.0%  v8::(anonymous namespace)::WebAssemblyModule(v8::FunctionCallbackInfo<v8::Value> const&)
     14    0.0%    0.0%  __os_nospin_lock_unlock_slow
     13    0.0%    0.0%  _Builtins_CreateTypedArray
     13    0.0%    0.0%  _$LT$extern$u20$$u22$C$u22$$u20$fn$LP$A0$RP$$u20$.$GT$$u20$R$u20$as$u20$v8..support..CFnFrom$LT$F$GT$$GT$::mapping::c_fn::hfce26145d86bae17
     12    0.0%    0.0%  _sigprocmask
     12    0.0%    0.0%  _$LT$extern$u20$$u22$C$u22$$u20$fn$LP$A0$RP$$u20$.$GT$$u20$R$u20$as$u20$v8..support..CFnFrom$LT$F$GT$$GT$::mapping::c_fn::h1ec9347500c1840d
      9    0.0%    0.0%  tokio::time::driver::_$LT$impl$u20$tokio..time..driver..handle..Handle$GT$::process_at_time::h45cc5965bf5f48bd
      8    0.0%    0.0%  ___bsdthread_terminate
isolate-0x1401e0000-14970-v8.log
 [C++]:
   ticks  total  nonlib   name
  11895842  100.0%  100.0%  _open_dprotected_np
     30    0.0%    0.0%  _mach_port_insert_member
     15    0.0%    0.0%  _Builtins_CreateTypedArray
     13    0.0%    0.0%  v8::(anonymous namespace)::WebAssemblyModule(v8::FunctionCallbackInfo<v8::Value> const&)
     11    0.0%    0.0%  __os_nospin_lock_unlock_slow
     11    0.0%    0.0%  _$LT$extern$u20$$u22$C$u22$$u20$fn$LP$A0$RP$$u20$.$GT$$u20$R$u20$as$u20$v8..support..CFnFrom$LT$F$GT$$GT$::mapping::c_fn::hfce26145d86bae17
     10    0.0%    0.0%  _sigprocmask
      9    0.0%    0.0%  ___bsdthread_terminate
      8    0.0%    0.0%  _Builtins_LoadIC
      7    0.0%    0.0%  __kernelrpc_mach_port_move_member_trap
isolate-0x140248000-14970-v8.log
 [C++]:
   ticks  total  nonlib   name
  11890603  100.0%  100.0%  _open_dprotected_np
     15    0.0%    0.0%  _$LT$extern$u20$$u22$C$u22$$u20$fn$LP$A0$RP$$u20$.$GT$$u20$R$u20$as$u20$v8..support..CFnFrom$LT$F$GT$$GT$::mapping::c_fn::hfce26145d86bae17
     11    0.0%    0.0%  v8::internal::wasm::memory_copy_wrapper(unsigned long)
     11    0.0%    0.0%  _Builtins_CreateTypedArray
     10    0.0%    0.0%  v8::(anonymous namespace)::WebAssemblyModule(v8::FunctionCallbackInfo<v8::Value> const&)
     10    0.0%    0.0%  __os_nospin_lock_unlock_slow
      9    0.0%    0.0%  __os_once_gate_wait
      8    0.0%    0.0%  tokio::runtime::basic_scheduler::Context::park::h2795d876476853b2
      8    0.0%    0.0%  _Builtins_LoadIC
       7    0.0%    0.0%  tokio::io::driver::registration::Registration::poll_ready::h0e5a6541b1cbd311

littledivy commented 2 years ago

I cannot reproduce this on macOS with just a simple fetch burst:

let count = 0;
while(true) {
   const start = performance.now();
   const r = await fetch("http://localhost:9000/img.png");
   await r.arrayBuffer();
   const end = performance.now() - start;
   console.log(`fetch ${count} took ${end}ms`);
   count++;
}

So I think the slowdown is related to:

passes the resulting ArrayBuffer to a pool of WebWorkers using postMessage

littledivy commented 2 years ago

Hmm, I cannot reproduce with this either:

const source = `
let b = []
onmessage = function(e) {
  b.push(e.data);
}
`
const worker = new Worker(URL.createObjectURL(new Blob([source], {type: 'application/javascript'})), { type: 'module' });
let count = 0;
while(true) {
   const start = performance.now();
   const r = await fetch("http://localhost:9000/img.png");
   const buf = await r.arrayBuffer();

   worker.postMessage(buf, [buf]);
   const end = performance.now() - start;
   console.log(`fetch ${count} took ${end}ms`);
   count++;
}

applmak commented 2 years ago

Here's some more code. I'll work to create a tinier test case when I am able to:

const downloadImageTransformer = new WritableStream<LibraryItem>({
  async write(item) {
    const baseUrl = `${item.baseUrl}=w32-h32-d`;
    let formattedBytes;
    try {
      console.time("image fetch");
      formattedBytes = await (await fetch(baseUrl)).arrayBuffer();
      console.timeEnd("image fetch");
    } catch (e) {
      return;
    }
    const worker = workers[nextWorker = (nextWorker + 1) % workers.length];
    worker.postMessage({ imageData: formattedBytes, id: item.id }, [
      formattedBytes,
    ]);
  },
}, {
  highWaterMark: 500,
});
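The thread doesn't show how the two halves are connected, but presumably it's something like readLibrary().pipeTo(downloadImageTransformer). A self-contained sketch of that wiring with dummy streams (demoPipe and its contents are illustrative, not the real program) shows how pipeTo propagates backpressure from the writable side's write() back to the source:

```typescript
// Illustrative only: a tiny readable piped into a tiny writable. pipeTo
// resolves once the source is drained and every chunk has been written.
async function demoPipe(): Promise<string[]> {
  const processed: string[] = [];
  const source = new ReadableStream<string>({
    start(controller) {
      for (const id of ["a", "b", "c"]) controller.enqueue(id);
      controller.close();
    },
  });
  const sink = new WritableStream<string>({
    async write(item) {
      // stand-in for the per-item fetch + postMessage work
      processed.push(item);
    },
  }, { highWaterMark: 2 });
  await source.pipeTo(sink);
  return processed;
}
```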

And then the worker:

import { Image } from "https://deno.land/x/imagescript/mod.ts";

interface ImageMessage {
  id: string;
  imageData: ArrayBuffer;
}

async function reportHash(id: string, image: Image) {
  const hashBuffer = await crypto.subtle.digest("SHA-1", image.bitmap);
  const hash = Array.from(new Uint8Array(hashBuffer)).map((b) =>
    b.toString(16).padStart(2, "0")
  ).join("");
  self.postMessage({ id, hash });
}

self.onmessage = async (evt) => {
  const { imageData, id } = evt.data as ImageMessage;
  let image;
  try {
    image = await Image.decode(imageData);
  } catch (e) {
    console.error(e);
    return;
  }
  await reportHash(id, image);
  image.rotate(90);
  await reportHash(id, image);
  image.rotate(90);
  await reportHash(id, image);
  image.rotate(90);
  await reportHash(id, image);
};

self.postMessage("ready");

applmak commented 2 years ago

Okay, when I remove the Worker, I see the same slowdown.

applmak commented 2 years ago

When I remove the WritableStream and just do the fetch in the ReadableStream, I no longer see the slowdown.

applmak commented 2 years ago

And now, when I added the WritableStream back in, I no longer see the slowdown.

Given that this works now, it's probably not worth spending more time chasing this down. If I get this to repro again, I'll reopen the bug. Thanks for spending time on it!