dank074 / Discord-video-stream

Experiment for making video streaming work for discord selfbots.

Optimize `parseNal` #53

Closed longnguyen2004 closed 10 months ago

longnguyen2004 commented 10 months ago

Profiling the library showed that parseNal is very inefficient. It was performing an O(n^2) search for the NAL start prefix code, and also doing a lot of allocations in the for loop. All of that in a very hot code path leads to poor performance.

I rewrote the function to use indexOf, which gets rid of the O(n^2) loop along with the excessive allocation from the subarray() call.
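For readers unfamiliar with the approach, here is a minimal sketch of an `indexOf`-based start-code search. It is not the code from this PR: `splitNalUnits` and `START_CODE` are hypothetical names, the input is assumed to be a complete Annex B byte stream held in a single Node.js `Buffer`, and zero bytes are stripped only when they sit directly before a following start code (treating a 4-byte `00 00 00 01` prefix the same as the 3-byte one).

```typescript
// Hypothetical helper for illustration only; not the library's parseNal.
// The 3-byte Annex B start code prefix. A 4-byte prefix (00 00 00 01)
// contains it, so searching for these three bytes finds both forms.
const START_CODE = Buffer.from([0x00, 0x00, 0x01]);

function splitNalUnits(data: Buffer): Buffer[] {
  const units: Buffer[] = [];
  // Buffer.indexOf scans natively instead of comparing bytes in a JS loop.
  let pos = data.indexOf(START_CODE);
  while (pos !== -1) {
    const payloadStart = pos + START_CODE.length;
    // Resume the search after the current start code, so each byte is
    // visited by the search only once overall.
    const next = data.indexOf(START_CODE, payloadStart);
    let payloadEnd = next === -1 ? data.length : next;
    if (next !== -1) {
      // Zero bytes directly before the next start code belong to that
      // prefix (or are trailing padding), not to this unit's payload.
      while (payloadEnd > payloadStart && data[payloadEnd - 1] === 0x00) {
        payloadEnd--;
      }
    }
    // subarray() returns a view without copying; it is called once per
    // unit found, not once per scanned position.
    units.push(data.subarray(payloadStart, payloadEnd));
    pos = next;
  }
  return units;
}
```

Called on a buffer holding `00 00 00 01 67 AA 00 00 01 68 BB CC`, this returns two views, `67 AA` and `68 BB CC`. A streaming splitter would also have to carry over a partial unit between chunks, which this sketch leaves out.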

Before

```
Statistical profiling result from isolate-0x5c006f0-186514-v8.log, (47617 ticks, 29409 unaccounted, 0 excluded).

 [Shared libraries]:
   ticks  total  nonlib   name
    558    1.2%          /usr/lib/x86_64-linux-gnu/libc.so.6
     31    0.1%          /home/hp/.local/bin/node
     27    0.1%          /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30
      9    0.0%          [vdso]

 [JavaScript]:
   ticks  total  nonlib   name
   4265    9.0%    9.1%  JS: *parseNal /home/hp/DiscordFFmpegRelay/node_modules/.pnpm/github.com+longnguyen2004+discord-video-stream@dc8d74761cfdb6c3747f6b126375e502bb26a4f2_discord.js-selfbot-v13@3.0.2/node_modules/@dank074/discord-video-stream/dist/media/H264NalSplitter.js:60:13
    342    0.7%    0.7%  JS: *wasm-function[48]
    132    0.3%    0.3%  JS: *wasm-function[112]
    105    0.2%    0.2%  JS: *wasm-function[62]
     78    0.2%    0.2%  JS: *sendFrame /home/hp/DiscordFFmpegRelay/node_modules/.pnpm/github.com+longnguyen2004+discord-video-stream@dc8d74761cfdb6c3747f6b126375e502bb26a4f2_discord.js-selfbot-v13@3.0.2/node_modules/@dank074/discord-video-stream/dist/client/packet/VideoPacketizerH264.js:61:14
     72    0.2%    0.2%  JS: *Qr /home/hp/DiscordFFmpegRelay/node_modules/.pnpm/github.com+longnguyen2004+discord-video-stream@dc8d74761cfdb6c3747f6b126375e502bb26a4f2_discord.js-selfbot-v13@3.0.2/node_modules/@dank074/discord-video-stream/node_modules/.pnpm/libsodium-wrappers@0.7.13/node_modules/libsodium-wrappers/dist/modules/libsodium-wrappers.js:1:79348
     70    0.1%    0.1%  JS: *concat node:buffer:576:32
     44    0.1%    0.1%  JS: *Socket.send node:dgram:576:33
     35    0.1%    0.1%  JS: *doSend node:dgram:677:16

--- lots of lines omitted for brevity ---

 [Bottom up (heavy) profile]:
  Note: percentage shows a share of a particular caller in the total
  amount of its parent calls.
  Callers occupying less than 1.0% are not shown.

   ticks parent  name
  29409   61.8%  UNKNOWN
  19959   67.9%    JS: *parseNal /home/hp/DiscordFFmpegRelay/node_modules/.pnpm/github.com+longnguyen2004+discord-video-stream@dc8d74761cfdb6c3747f6b126375e502bb26a4f2_discord.js-selfbot-v13@3.0.2/node_modules/@dank074/discord-video-stream/dist/media/H264NalSplitter.js:60:13
  18250   91.4%      JS: *_transform /home/hp/DiscordFFmpegRelay/node_modules/.pnpm/github.com+longnguyen2004+discord-video-stream@dc8d74761cfdb6c3747f6b126375e502bb26a4f2_discord.js-selfbot-v13@3.0.2/node_modules/@dank074/discord-video-stream/dist/media/H264NalSplitter.js:103:15
  17508   95.9%        JS: ^Transform._write node:internal/streams/transform:170:38
  17031   97.3%          JS: *ondata node:internal/streams/readable:783:18
  15735   92.4%            JS: *Readable.read node:internal/streams/readable:421:35
```
After

```
Statistical profiling result from isolate-0x6f926f0-187287-v8.log, (56686 ticks, 10312 unaccounted, 0 excluded).

 [Shared libraries]:
   ticks  total  nonlib   name
    889    1.6%          /usr/lib/x86_64-linux-gnu/libc.so.6
     33    0.1%          /home/hp/.local/bin/node
     22    0.0%          /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30
     19    0.0%          [vdso]

 [JavaScript]:
   ticks  total  nonlib   name
    927    1.6%    1.7%  JS: *wasm-function[48]
    381    0.7%    0.7%  JS: *wasm-function[112]
    249    0.4%    0.4%  JS: *wasm-function[62]
    183    0.3%    0.3%  JS: *sendFrame /home/hp/DiscordFFmpegRelay/node_modules/.pnpm/github.com+longnguyen2004+discord-video-stream@dc8d74761cfdb6c3747f6b126375e502bb26a4f2_discord.js-selfbot-v13@3.0.2/node_modules/@dank074/discord-video-stream/dist/client/packet/VideoPacketizerH264.js:61:14
    167    0.3%    0.3%  JS: *concat node:buffer:576:32
    166    0.3%    0.3%  JS: *Qr /home/hp/DiscordFFmpegRelay/node_modules/.pnpm/github.com+longnguyen2004+discord-video-stream@dc8d74761cfdb6c3747f6b126375e502bb26a4f2_discord.js-selfbot-v13@3.0.2/node_modules/@dank074/discord-video-stream/node_modules/.pnpm/libsodium-wrappers@0.7.13/node_modules/libsodium-wrappers/dist/modules/libsodium-wrappers.js:1:79348
    106    0.2%    0.2%  JS: *Socket.send node:dgram:576:33

--- lots of lines omitted for brevity ---

 [Bottom up (heavy) profile]:
  Note: percentage shows a share of a particular caller in the total
  amount of its parent calls.
  Callers occupying less than 1.0% are not shown.

   ticks parent  name
  38846   68.5%  epoll_pwait@@GLIBC_2.6

  10312   18.2%  UNKNOWN
   1519   14.7%    JS: *doSend node:dgram:677:16
   1519  100.0%      JS: *afterDns node:dgram:662:20
   1519  100.0%        JS: *processTicksAndRejections node:internal/process/task_queues:67:35
     27    1.8%          JS: ^runNextTicks node:internal/process/task_queues:58:22
     27  100.0%            JS: *processTimers node:internal/timers:499:25
    754    7.3%    JS: *lookup node:dns:140:16
    754  100.0%      JS: *Socket.send node:dgram:576:33
```

cc @aiko-chan-ai

longnguyen2004 commented 10 months ago

rbsp() also looks like a prime candidate for optimization, but it's not that big of a problem compared to parseNal()

dank074 commented 10 months ago

Good work