Example outputs at double speed

robinovitch61 commented 2 years ago

When I run the demo tape:

# Where should we write the GIF?
Output demo.gif

# Set up a 1200x600 terminal with 46px font.
Set FontSize 46
Set Width 1200
Set Height 600

# Type a command in the terminal.
Type "echo 'Welcome to VHS!'"

# Pause for dramatic effect...
Sleep 500ms

# Run the command by pressing enter.
Enter

# Admire the output for a bit.
Sleep 5s

I get the following, which seems to be at 2x speed compared to the example in the README: [demo GIF]

maaslalani commented 2 years ago

I can't reproduce this on my machine, but I think it has something to do with the number of frames that the VHS instance is able to capture. Can you try setting the framerate to 24 (the default is 60)?

Set Framerate 24

maaslalani commented 2 years ago

To explain this bug further: VHS tries to capture a frame every 1/framerate seconds (every 16.7 milliseconds by default). However, if the capturing process takes longer, say 30 milliseconds per frame, then VHS won't have enough frames to render the GIF at 60 FPS, but it will still try to. It assumes we are using 60 FPS when we really only have about 30 FPS, which results in a sped-up GIF.

One solution is to time how long the frame capture takes (on average) and then use that as the actual frame rate so that the GIF isn't sped up.

robinovitch61 commented 2 years ago

That looks better!

# Where should we write the GIF?
Output demo.gif

# Set up a 1200x600 terminal with 46px font.
Set FontSize 46
Set Width 1200
Set Height 600
Set Framerate 24

# Type a command in the terminal.
Type "echo 'Welcome to VHS!'"

# Pause for dramatic effect...
Sleep 500ms

# Run the command by pressing enter.
Enter

# Admire the output for a bit.
Sleep 5s

Results in

[demo GIF]

Any ideas why it might be slower to capture a frame? Here are some system/environment stats

❯ ffmpeg -version
ffmpeg version 5.1.2 Copyright (c) 2000-2022 the FFmpeg developers
built with Apple clang version 14.0.0 (clang-1400.0.29.102)
configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/5.1.2 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-neon
libavutil      57. 28.100 / 57. 28.100
libavcodec     59. 37.100 / 59. 37.100
libavformat    59. 27.100 / 59. 27.100
libavdevice    59.  7.100 / 59.  7.100
libavfilter     8. 44.100 /  8. 44.100
libswscale      6.  7.100 /  6.  7.100
libswresample   4.  7.100 /  4.  7.100
libpostproc    56.  6.100 / 56.  6.100

❯ ttyd --version
ttyd version 1.7.2-e8728bb

❯ vhs -v
vhs version v0.1.1

> One solution is to time how long the frame capture takes (on average) and then use that as the actual frame rate so that the GIF isn't sped up.

Would I do this by counting the number of frames and dividing by the gif length or something?

maaslalani commented 2 years ago

I have no idea why frames would be capturing slowly on that machine, it seems to be very fast. Do you have a lot of applications open like chrome, slack, VS Code etc... while recording the GIF?

> Would I do this by counting the number of frames and dividing by the gif length or something?

We have some logic to see how long a frame capture took so that we can sleep for the rest of the time:

https://github.com/charmbracelet/vhs/blob/1508f495c2016d7364244924fc4123b70c13f55b/vhs.go#L199

So we would probably measure each frame and average the number of frames we captured each second.

robinovitch61 commented 2 years ago

> I have no idea why frames would be capturing slowly on that machine, it seems to be very fast. Do you have a lot of applications open like chrome, slack, VS Code etc... while recording the GIF?

I quit everything, so Activity Monitor looked like this: [screenshot]

While `vhs < demo.tape` was running, it got up to this: [screenshot]

But still doesn't look overloaded! Very strange.

FWIW the dockerized run works for me, so that can be my workaround for now :)

docker run --rm -v $PWD:/vhs ghcr.io/charmbracelet/vhs demo.tape

maaslalani commented 2 years ago

Interesting, so in docker you're able to get 60 frames per second? That's definitely really strange.

robinovitch61 commented 2 years ago

Dang, I was hoping that was the case, but I just got this on my other machine too:

❯ vhs < demo.tape
Output .gif demo.gif
Set FontSize 32
Set Width 1200
Set Height 600
Type echo 'Welcome to VHS!'
Sleep 500ms
Enter 1
Sleep 5s
Creating GIF...
Time: 0h:00m:16s

[demo GIF]

So it might be a Mac thing?

maaslalani commented 2 years ago

This is super strange, I will definitely look into this. I don't know if it's a Mac thing; I have a Mac and things work correctly for me.

I'll try and see if there's anything funky going on. It's really strange that it's not a memory/cpu issue. But that also means it could be solvable!

maaslalani commented 2 years ago

Really appreciate all the info you've given, it's super helpful! I'll try and see if something is going wrong in VHS.

robinovitch61 commented 2 years ago

Ok thanks! Some extra info: I installed vhs via brew on the Intel Mac and via go on the M2 Mac. On both Macs I installed the ffmpeg and ttyd dependencies via brew, as per the README. I have Brave Browser set as my system default on both (not sure it matters at all).

maaslalani commented 2 years ago

This might be a super long shot, but what happens if you set Chrome as your default? VHS uses a Chromium browser, and since Brave is Chromium-based it might be using that. I have no idea though; this is a long shot.

robinovitch61 commented 2 years ago

No luck - installed latest chrome, set as default, restarted computer, same output from tape

maaslalani commented 2 years ago

> No luck - installed latest chrome, set as default, restarted computer, same output from tape

Gotcha, it was a long shot. Really appreciate you trying it out and ruling that possibility out!

muesli commented 2 years ago

Here's another long shot: are your devices battery powered when running vhs? In other words: might another energy profile limit rendering performance of the remote controlled browser instance?

ysmood commented 2 years ago

I don't know if #110 can solve it or not.

robinovitch61 commented 2 years ago

> Here's another long shot: are your devices battery powered when running vhs? In other words: might another energy profile limit rendering performance of the remote controlled browser instance?

They weren't plugged in, but unfortunately same result plugged in!

> I don't know if https://github.com/charmbracelet/vhs/pull/110 can solve it or not.

Thanks for the help @ysmood ! I built off https://github.com/charmbracelet/vhs/commit/e3691162b0b968726cacb6762c1f3c96ef4bf185 and tried various framerates, but unfortunately there are still differing playback speeds.

From the outputs below, speed peaks around 50fps, the default setting. If I've correctly interpreted your comment here, the bug is that VHS can't capture frames quickly enough, ends up with too few total frames, and then renders the GIF at the assumed framerate, producing a sped-up GIF. If so, isn't it strange that playback slows down again above 50fps?

Also all the output gifs are about the same size (30-32kB), which might be expected, but is also interesting.

All of them were generated like this (70fps shown as an example):

Output examples/70.gif

Require echo

Set FontSize 32
Set Width 1200
Set Height 600
Set Framerate 70

Type "echo 'Welcome to VHS!' 70fps"
Sleep 500ms
Enter

Sleep 5s

jghauser commented 2 years ago

I have the same problem on Arch Linux (with sway). Also on a relatively fast system (Intel i7-1065G7).

robinovitch61 commented 1 year ago

Revisiting this - after upgrading to macOS Ventura, things look good!

# Where should we write the GIF?
Output demo.gif

# Set up a 1200x600 terminal with 46px font.
Set FontSize 46
Set Width 1200
Set Height 600

# Type a command in the terminal.
Type "echo 'Welcome to VHS!'"

# Pause for dramatic effect...
Sleep 500ms

# Run the command by pressing enter.
Enter

# Admire the output for a bit.
Sleep 5s

Gives: [demo GIF]

Same versions as before

❯ ffmpeg -version
ffmpeg version 5.1.2 Copyright (c) 2000-2022 the FFmpeg developers
built with Apple clang version 14.0.0 (clang-1400.0.29.102)
configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/5.1.2 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-neon
libavutil      57. 28.100 / 57. 28.100
libavcodec     59. 37.100 / 59. 37.100
libavformat    59. 27.100 / 59. 27.100
libavdevice    59.  7.100 / 59.  7.100
libavfilter     8. 44.100 /  8. 44.100
libswscale      6.  7.100 /  6.  7.100
libswresample   4.  7.100 /  4.  7.100
libpostproc    56.  6.100 / 56.  6.100

❯ ttyd -version
ttyd version 1.7.2-e8728bb

❯ vhs -v
vhs version 0.1.1

robinovitch61 commented 1 year ago

Ah, nevermind, when I increase the width and height of the terminal, it gets much worse. Even changing the framerate doesn't help then.

Jonathan-Zollinger commented 1 year ago

I'm seeing this same thing in my environment, which, like robinovitch61 said, seems to be exacerbated by the output size.

This is set to a 14s sleep, but you can see Maven reports less than 8 seconds of compile time. [demo GIF]

demo.tape

```sh
Output demo.gif
Set Theme "Catppuccin Mocha"
Set FontSize 32
Set Width 2400
Set Height 900
Set Framerate 24
Type "mvn clean install"
Sleep 250ms
Enter
Sleep 14s
```

I'm running on a Fedora 37 VM with kitty, though I'm ssh'ing into the terminal from a Windows box, if that matters. My ttyd install was done through brew.

Software	Version
VM Environment
OS	`Fedora 37`
ttyd (installed via brew)	`1.7.3`
ffmpeg (installed via dnf)	`libavutil 57.28.100`, `libavcodec 59.37.100`, `libavformat 59.27.100`, `libavdevice 59.7.100`, `libavfilter 8.44.100`, `libswscale 6.7.100`, `libswresample 4.7.100`, `libpostproc 56.6.100`
vhs (installed via yum repo)	`version v0.1.0` (d6bba9f)

ffmpeg configuration


ffmpeg version 5.1.3 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12 (GCC)
configuration: --prefix=/usr --bindir=/usr/bin --datadir=/usr/share/ffmpeg --docdir=/usr/share/doc/ffmpeg --incdir=/usr/include/ffmpeg --libdir=/usr/lib64 --mandir=/usr/share/man --arch=x86_64 --optflags='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' --extra-ldflags='-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes' --disable-htmlpages --enable-pic --disable-stripping --enable-shared --disable-static --enable-gpl --enable-version3 --enable-libsmbclient --disable-openssl --enable-bzlib --enable-frei0r --enable-chromaprint --enable-gcrypt --enable-gnutls --enable-ladspa --enable-lcms2 --enable-libshaderc --enable-vulkan --disable-cuda-sdk --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libdc1394 --enable-libdrm --enable-libfdk-aac --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libiec61883 --enable-libilbc --enable-libjack --enable-libjxl --enable-libmodplug --enable-libmp3lame --enable-libmysofa --enable-libopenh264-dlopen --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libplacebo --enable-libpulse --enable-librabbitmq --enable-librav1e --enable-librist --enable-librsvg --enable-librubberband --enable-libsnappy --enable-libsvtav1 --enable-libsoxr --enable-libspeex --enable-libssh --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvmaf --enable-libvorbis 
--enable-libv4l2 --enable-libvpx --enable-libwebp --enable-libxml2 --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lto --enable-libmfx --enable-lv2 --enable-vaapi --enable-vdpau --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libvo-amrwbenc --enable-libxvid --enable-openal --enable-opencl --enable-opengl --enable-pthreads --enable-vapoursynth --enable-muxers --enable-demuxers --enable-hwaccels --disable-encoders --disable-decoders --disable-decoder='h264,hevc,vc1' --enable-encoder=',a64multi,a64multi5,aac,libfdk_aac,ac3,adpcm_adx,adpcm_argo,adpcm_g722,adpcm_g726,adpcm_g726le,adpcm_ima_alp,adpcm_ima_amv,adpcm_ima_apm,adpcm_ima_qt,adpcm_ima_ssi,adpcm_ima_wav,adpcm_ima_ws,adpcm_ms,adpcm_swf,adpcm_yamaha,alac,alias_pix,amv,apng,ass,asv1,asv2,ayuv,bitpacked,bmp,cinepak,cljr,dca,dfpwm,dnxhd,dpx,dvbsub,dvdsub,dvvideo,exr,ffv1,ffvhuff,flac,flashsv,flashsv2,flv,g723_1,gif,h261,h263,h263_v4l2m2m,h263p,h264_amf,h264_nvenc,h264_qsv,h264_v4l2m2m,h264_vaapi,hap,hevc_amf,hevc_nvenc,hevc_qsv,hevc_v4l2m2m,hevc_vaapi,huffyuv,ilbc,jpegls,jpeg2000,libaom,libaom_av1,libcodec2,libgsm,libgsm_ms,libilbc,libjxl,libmp3lame,libopencore_amrnb,libopenh264,libopenjpeg,libopus,librav1e,libschroedinger,libspeex,libsvtav1,libtheora,libtwolame,libvo_amrwbenc,libvorbis,libvpx_vp8,libvpx_vp9,libwebp,libwebp_anim,libxvid,mjpeg,mjpeg_qsv,mjpeg_vaapi,mlp,mp2,mp2fixed,mpeg1video,mpeg2video,mpeg2_qsv,mpeg2_vaapi,mpeg4,mpeg4_v4l2m2m,msmpeg4v2,msmpeg4v3,msvideo1,nellymoser,opus,pam,pbm,pcm_alaw,pcm_f32be,pcm_f32le,pcm_f64be,pcm_f64le,pcm_mulaw,pcm_s16be,pcm_s16be_planar,pcm_s16le,pcm_s16le_planar,pcm_s24be,pcm_s24le,pcm_s24le_planar,pcm_s32be,pcm_s32le,pcm_s32le_planar,pcm_s8,pcm_s8_planar,pcm_u16be,pcm_u16le,pcm_u24be,pcm_u24le,pcm_u32be,pcm_u32le,pcm_u8,pcx,pgm,pgmyuv,phm,png,ppm,qoi,qtrle,r10k,r210,ra_144,rawvideo,roq,roq_dpcm,rpza,rv10,rv20,s302m,sbc,sgi,smc,snow,sonic,sonic_ls,speedhq,srt,ssa,subrip,sunrast,svq1,targa,text,tiff,truehd,tta,ttml,utvideo,v210,v308,v408,v410,vc1_qsv,vc1_v4l2m2m,vc2,vorbis,vp8_qsv,vp8_v4l2m2m,vp8_vaapi,vp9_qsv,vp9_vaapi,wavpack,webvtt,wmav1,wmav2,wmv1,wmv2,wrapped_avframe,xbm,xface,xsub,xwd,y41p,yuv4,zlib,zmbv,' --enable-decoder=',aac,aasc,libfdk_aac,ac3,acelp_kelvin,adpcm_4xm,adpcm_adx,adpcm_afc,adpcm_agm,adpcm_aica,adpcm_argo,adpcm_ct,adpcm_dtk,adpcm_ea,adpcm_ea_maxis_xa,adpcm_ea_r1,adpcm_ea_r2,adpcm_ea_r3,adpcm_ea_xas,adpcm_g722,adpcm_g726,adpcm_g726le,adpcm_ima_acorn,adpcm_ima_alp,adpcm_ima_amv,adpcm_ima_apc,adpcm_ima_apm,adpcm_ima_cunning,adpcm_ima_dat4,adpcm_ima_dk3,adpcm_ima_dk4,adpcm_ima_ea_eacs,adpcm_ima_ea_sead,adpcm_ima_iss,adpcm_ima_moflex,adpcm_ima_mtf,adpcm_ima_oki,adpcm_ima_qt,adpcm_ima_qt_at,adpcm_ima_rad,adpcm_ima_smjpeg,adpcm_ima_ssi,adpcm_ima_wav,adpcm_ima_ws,adpcm_ms,adpcm_mtaf,adpcm_psx,adpcm_sbpro_2,adpcm_sbpro_3,adpcm_sbpro_4,adpcm_swf,adpcm_thp,adpcm_thp_le,adpcm_vima,adpcm_xa,adpcm_yamaha,adpcm_zork,alac,alias_pix,amrnb,amrwb,amv,anm,ansi,ape,apng,arbc,argo,ass,asv1,asv2,atrac1,atrac3,atrac3al,atrac3p,atrac3pal,aura,aura2,av1,av1_qsv,ayuv,bethsoftvid,bfi,bink,binkaudio_dct,binkaudio_rdft,bintext,bitpacked,bmp,bmv_audio,bmv_video,brender_pix,c93,ccaption,cdgraphics,cdtoons,cdxl,cinepak,clearvideo,cljr,cook,cpia,cscd,cyuv,dca,dds,derf_dpcm,dfa,dfpwm,dirac,dnxhd,dolby_e,dpx,dsd_lsbf,dsd_msbf,dsicinaudio,dsicinvideo,dss_sp,dvaudio,dvbsub,dvdsub,dvvideo,dxa,dxtory,eacmv,eamad,eatgq,eatgv,eatqi,eightbps,eightsvx_exp,eightsvx_fib,escape124,escape130,evrc,exr,ffv1,ffvhuff,ffwavesynth,fits,flac,flashsv,flashsv2,flic,flv,fmvc,fourxm,g723_1,g729,gdv,gem,gif,gremlin_dpcm,gsm,gsm_ms,gsm_ms_at,h261,h263,h263_v4l2m2m,h263i,h263p,hap,hca,hcom,hnm4_video,hq_hqa,hqx,huffyuv,hymt,iac,idcin,idf,iff_ilbm,ilbc,imc,indeo2,indeo3,indeo4,indeo5,interplay_acm,interplay_dpcm,interplay_video,ipu,jacosub,jpeg2000,jpegls,jv,kgv1,kmvc,lagarith,libaom,libaom_av1,libcodec2,libdav1d,libgsm,libgsm_ms,libilbc,libjxl,libopencore_amrnb,libopencore_amrwb,libopenh264,libopenjpeg,libopus,librsvg,libschroedinger,libspeex,libvorbis,libvpx_vp8,libvpx_vp9,libzvbi_teletext,loco,lscr,m101,mace3,mace6,mdec,metasound,microdvd,mimic,mjpeg,mjpeg_qsv,mjpegb,mlp,mmvideo,motionpixels,mp1,mp1float,mp2,mp2float,mp3,mp3adu,mp3adufloat,mp3float,mp3on4,mp3on4float,mpc7,mpc8,mpeg1video,mpeg1_v4l2m2m,mpeg2video,mpeg2_qsv,mpeg2_v4l2m2m,mpeg4,mpeg4_v4l2m2m,mpegvideo,mpl2,msa1,mscc,msmpeg4v1,msmpeg4v2,msmpeg4v3,msnsiren,msp2,msrle,mss1,mss2,msvideo1,mszh,mts2,mv30,mvc1,mvc2,mvdv,mvha,mwsc,mxpeg,nellymoser,nuv,on2avc,opus,paf_audio,paf_video,pam,pbm,pcm_alaw,pcm_bluray,pcm_dvd,pcm_f16le,pcm_f24le,pcm_f32be,pcm_f32le,pcm_f64be,pcm_f64le,pcm_lxf,pcm_mulaw,pcm_s16be,pcm_s16be_planar,pcm_s16le,pcm_s16le_planar,pcm_s24be,pcm_s24daud,pcm_s24le,pcm_s24le_planar,pcm_s32be,pcm_s32le,pcm_s32le_planar,pcm_s64be,pcm_s64le,pcm_s8,pcm_s8_planar,pcm_sga,pcm_u16be,pcm_u16le,pcm_u24be,pcm_u24le,pcm_u32be,pcm_u32le,pcm_u8,pcm_vidc,pcx,pfm,pgm,pgmyuv,pgssub,pgx,phm,photocd,pictor,pjs,png,ppm,prosumer,psd,ptx,qcelp,qdm2,qdmc,qdraw,qoi,qpeg,qtrle,r10k,r210,ra_144,ra_288,rasc,rawvideo,realtext,rl2,roq,roq_dpcm,rpza,rscc,rv10,rv20,s302m,sami,sanm,sbc,screenpresso,sdx2_dpcm,sgi,sgirle,shorten,simbiosis_imx,sipr,siren,smackaud,smacker,smc,smvjpeg,snow,sol_dpcm,sonic,sp5x,speedhq,speex,srgc,srt,ssa,stl,subrip,subviewer,subviewer1,sunrast,svq1,svq3,tak,targa,targa_y216,tdsc,text,theora,thp,tiertexseqvideo,tiff,tmv,truehd,truemotion1,truemotion2,truemotion2rt,truespeech,tscc,tscc2,tta,twinvq,txd,ulti,utvideo,v210,v210x,v308,v408,v410,vb,vble,vcr1,vmdaudio,vmdvideo,vmnc,vorbis,vp3,vp4,vp5,vp6,vp6a,vp6f,vp7,vp8,vp8_qsv,vp8_v4l2m2m,vp9,vp9_qsv,vp9_v4l2m2m,vplayer,vqa,wavpack,wcmv,webp,webvtt,wmav1,wmav2,wmavoice,wmv1,wmv2,wnv1,wrapped_avframe,ws_snd1,xan_dpcm,xan_wc3,xan_wc4,xbin,xbm,xface,xl,xpm,xsub,xwd,y41p,ylc,yop,yuv4,zero12v,zerocodec,zlib,zmbv,'

mikelorant commented 8 months ago

After a weekend of working on this problem, I have a solution to this issue.

First, I need to explain a few things and why this has been such a difficult problem to diagnose.

Why

The default frame rate configured in VHS is 50. This is already a problematic value: to maintain this frame rate we need to capture a frame every 1s (1000ms) / 50 frames = 20ms. For each frame we fire off requests to go-rod asking the browser to send image data. Unfortunately we have to do this twice, once for the text and a second time for the cursor. Then we need to write these images. That is a total of 4 operations that need to complete in 20ms: a budget of 10ms to read and write the text, and 10ms to read and write the cursor.

It gets even worse: these requests are blocking calls. We can't send key press requests while we are waiting for the images; they are queued up.

So what happens if we can't do these 4 actions in 20ms? We take as long as needed and then immediately attempt to get the next set of frames. No downtime here! However, the next frame is now delayed. It wasn't triggered 20ms after the last one; it may have been triggered after 40ms instead. So in effect we are dropping frames: we collect fewer frames than we expect.

When it comes to assembling these frames into a video, we simply join them all together and say each frame is 20ms apart, or 50fps. But hold on... we dropped frames! If we were meant to collect 400 frames but only got 200, we have a problem. We send all of this to ffmpeg and tell it to assemble the frames at 50fps, but we only supply half the frames it needs. It assumes the video is shorter and therefore plays it back twice as fast.

Improvements

There are lots of little improvements we can do to improve this situation. This is what I've done and experimented with:

Lower default frame rate to 24. Half the frame rate = twice the budget to collect frames. Easy win and for capturing text it seems unnecessary to record faster.
Goroutines for the file writes. No negatives, and we save about 4ms for 2 files; any weird disk issues also don't impact the next capture event.
Combined capture of the text and cursor (to save an image get). This can be done using the go-rod page screenshot function. Sure, the cursor might not take that long, but every bit counts. So far it is working out well. There are some complexities here, so it's not a free win.
Detailed output when frames are delayed or when we have exceeded our save image budget. This was critical in identifying the problem. You can't fix what you can't measure.

Here is an output of the demo.tape with some metrics so you can clearly see the problem.

❯ go run . examples/demo.tape
File: examples/demo.tape
Host your GIF on vhs.charm.sh: vhs publish <file>.gif
Output .gif examples/demo.gif
Require echo
Set Shell bash
Set FontSize 32
Set Width 1200
Set Height 600
Set Framerate 100
Type echo 'Welcome to VHS!'
WARN: Exceed Budget, Duration:   38ms (Get text) [0]
WARN: Delayed Next, Delay:   28ms [0]
WARN: Exceed Budget, Duration:   20ms (Get text) [1]
WARN: Delayed Next, Delay:   10ms [1]
...
WARN: Exceed Budget, Duration:   10ms (Get text) [99]
WARN: Delayed Next, Delay:    0ms [99]
Sleep 500ms
Enter 1
WARN: Exceed Budget, Duration:   11ms (Get text) [157]
WARN: Delayed Next, Delay:    1ms [157]
Sleep 5s
WARN: Exceed Budget, Duration:   10ms (Get text) [599]
WARN: Delayed Next, Delay:    0ms [599]
WARN: Exceed Budget, Duration:   11ms (Get text) [605]
WARN: Delayed Next, Delay:    1ms [605]
WARN: Exceed Budget, Duration:   10ms (Get text) [608]
WARN: Delayed Next, Delay:    0ms [608]
Delayed Next Frames: 25, Total Frames: 638 Expected Frames: 683
Exceeded Budget Events: 25, Total Events: 2552, Expected Events: 2732

This is a recording at 100fps. What the exceeded budget tells me is that I was 11ms too slow to get the text image: I only have a budget of 10ms, but I took 21ms. This forced the next frame to be taken 11ms late. I track how often these budget overruns occur. They aren't the main concern, though; they only turn into a problem when they cause delayed frames.

I also know I have fewer frames than expected (likely around 50, but it doesn't add up exactly).

As you can see, having metrics helps considerably. Running this in a container shows how poorly Chrome performs when using software rendering instead of hardware acceleration (250ms to get an image).

Overall, these improvements helped but still did not address the root cause of the problem.

Solution

The proper solution to this issue is to stop assuming the frames are received at their expected interval and start tracking exactly when each frame is captured. ffmpeg's concat demuxer can join images together based on a text file that lists each image filename along with a timestamp offset. The offset means ffmpeg no longer needs to assume that the frames are equally spaced; it knows exactly when each frame should be displayed.

Here is a sample of what this looks like:

file 'frame-text-00000.png'
outpoint 00:00:00.019
file 'frame-text-00001.png'
outpoint 00:00:00.049
file 'frame-text-00002.png'
outpoint 00:00:00.047
file 'frame-text-00003.png'
outpoint 00:00:00.033
file 'frame-text-00004.png'
outpoint 00:00:00.032
file 'frame-text-00005.png'
outpoint 00:00:00.032

The outpoint defines how long after displaying an image to move on to the next one. These offsets grow larger as fewer frames are captured.

Be aware, we can't magically guarantee that every frame at the requested frame rate gets captured. That comes down to the performance of the hardware used for recording. However, we can tell the user that they are dropping frames. This knowledge will allow them to tune their tape settings to match their hardware capabilities. Assembling the frames we do have using the offsets will mask this problem and prevent the recording from being sped up. This gives us a consistent-speed video and allows us to solve issues like these.

@maaslalani It will take me a few days to begin putting up the pull requests that address the improvements I mentioned. I also still need to figure out how to unit test this.

robinovitch61 commented 8 months ago

@mikelorant thanks for this excellent analysis! Those improvements in performance and visibility sound great

mikelorant commented 8 months ago

Addressing a few different comments with this thread:

> One solution is to time how long the frame capture takes (on average) and then use that as the actual frame rate so that the GIF isn't sped up.

This would never work because not all parts of the recording capture at the same frame rate. Depending on the complexity of the image, the rendering time differs significantly. A mostly blank screen with a few lines of text encodes far quicker than a very busy screen filled with many colours and characters.

> Interesting, so in docker you're able to get 60 frames per second? That's definitely really strange.

This is because Chrome switches from a hardware renderer to a software renderer; we switch from GPU to CPU. In general, software rendering is terrible in containers and we should strongly discourage it. However, in some cases it can improve performance if the hardware GPU is busy or weak.

> then it's strange that it slows down again above 50fps

This one took me a while to really understand.

Grabbing images is done using the go-rod method CanvasToImage. This maps back to the browser function toDataURL. This function is problematic, and an improved function, toBlob, now exists because of some of its issues. The main issue that matters to us is that toDataURL is blocking: while we are requesting an image we can't do anything else, including sending key presses. If we have a few image requests queued up and we want to send a key press, it has to wait. This slows down our ability to send keys when we want them, causing the video to be "slow". It is very obvious when typing lots of text.

I've been experimenting with some ideas to solve this problem, but ttyd is making this difficult because while I can open multiple tabs in our headless Chrome, I can't open multiple pages that map back to the same ttyd session. Having multiple tabs to the same ttyd session would allow me to send key presses to one tab and record images from the other. I'm probably going to experiment with the idea of using a shared tmux session between multiple tabs and see if it works well.

> Ah, nevermind, when I increase the width and height of the terminal, it gets much worse. Even changing the framerate doesn't help then.

Making the terminal larger just increases the size of the image and the latency of image generation. Lowering the frame rate will help, but in most cases it needs to drop to single digits, which really doesn't look great. What matters most is finding out how long it takes to get and write a complex image from the terminal. That number of milliseconds determines what a reasonable frame rate can be.

mikelorant commented 8 months ago

@ysmood Having some challenges making some improvements especially around trying to reduce the cost of capturing the frames.

The current way of taking a screenshot, which has to be done for both the text and cursor is:

text, textErr := vhs.TextCanvas.CanvasToImage("image/png", quality)

As I understand it, this internally translates in rod to calling toDataURL, which sends back the PNG output.

Instead of capturing the two xterm.js canvas elements individually, it seems better to do the following:

req := proto.PageCaptureScreenshot{
  Format: proto.PageCaptureScreenshotFormatPng,
  OptimizeForSpeed: true,
}

text, textErr := vhs.Page.Screenshot(true, &req)

This would allow us to get both the text and cursor combined. However this one method takes 3x longer than doing a single CanvasToImage. This seems to translate to a message via CDP as Page.captureScreenshot. I am guessing this may not be hardware accelerated compared with asking Chrome directly for CanvasToImage.

I'll quickly mention, I know OptimizeForSpeed does nothing yet. Hopefully it will though 😄

Do you know why the performance is significantly less? Are there any other options to capturing the screen in one go?

Currently we define the two canvases as:

vhs.TextCanvas, _ = vhs.Page.Element("canvas.xterm-text-layer")
vhs.CursorCanvas, _ = vhs.Page.Element("canvas.xterm-cursor-layer")

If we could capture them both via the following, it would be a big win, but it still doesn't include the cursor 😢

vhs.Canvas, _ = vhs.Page.Element("canvas")

Are there points here that may be better discussed on the go-rod/rod issues instead? I am at a loss how to move forward.

maaslalani commented 6 months ago

@mikelorant streaming the canvases was actually a performance improvement, we originally performed screenshots and that was much slower:

https://github.com/charmbracelet/vhs/pull/21

maaslalani commented 6 months ago

For what it's worth, I have some ideas about how to get much better performance. It involves some of the work done for freeze.

Instead of using go-rod, we can simply capture the ANSI sequences as output and store them as (text) "frames", then afterwards convert each frame to an SVG, convert the SVGs into PNGs, and combine them into a GIF.

mikelorant commented 6 months ago

> Instead of using go-rod, we can simply capture ANSI sequences as output and store them as (text) "frames" and then after convert each of the frames to SVGs, then convert the SVGs into PNGs and combine them into a GIF.

Agreed, go-rod isn't suited for this at all.

I do have a working branch which can handle dropped frames and stitches things together properly. But the blocking issues with receiving frames and sending events is an architecture issue.

mikelorant commented 6 months ago

I do have a bit of capacity now to look into how you did this with freeze and see if we can incorporate the solution here. Are you willing to do some experiments as well?

maaslalani commented 6 months ago

Yeah I played around a bit with running VHS scripts using tmux and then using tmux capture-pane -pet to grab the ANSI and then using the code here:

https://github.com/charmbracelet/freeze/blob/main/ansi.go

to turn the captured pane into an SVG. Happy to experiment with alternative solutions. I think using a PTY (https://github.com/creack/pty/tree/master) rather than tmux is the way to go, for more programmatic control.
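A Go sketch of grabbing one timestamped text frame with the tmux command mentioned above (`-pet` is `-p -e -t`). The `textFrame` type and the session name are hypothetical, the sketch needs tmux and a running session to capture anything, and converting the ANSI to SVG (as freeze's ansi.go does) is not shown.

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// textFrame is one captured terminal state: the pane's text including ANSI
// escape sequences, stamped with when it was captured.
type textFrame struct {
	at   time.Duration
	ansi string
}

// capturePaneCmd builds `tmux capture-pane -p -e -t <target>`: -p prints the
// pane to stdout and -e keeps the escape sequences for colours and attributes.
func capturePaneCmd(target string) *exec.Cmd {
	return exec.Command("tmux", "capture-pane", "-p", "-e", "-t", target)
}

// captureText grabs the pane once and stamps it relative to start.
func captureText(target string, start time.Time) (textFrame, error) {
	out, err := capturePaneCmd(target).Output()
	if err != nil {
		return textFrame{}, err
	}
	return textFrame{at: time.Since(start), ansi: string(out)}, nil
}

func main() {
	// "vhs" is a hypothetical session name.
	f, err := captureText("vhs", time.Now())
	if err != nil {
		fmt.Println("capture failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of ANSI at %v\n", len(f.ansi), f.at)
}
```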

mikelorant commented 6 months ago

Using pty means we have no external application dependencies? That would be a really big win and remove something I felt negatively impacted VHS.

maaslalani commented 6 months ago

Yup, other than ffmpeg of course!

maaslalani commented 6 months ago

This one is cross platform (works on Windows) and written by our very own @aymanbagabas: https://github.com/aymanbagabas/go-pty

mikelorant commented 5 months ago

@maaslalani I think I have found what we need to make this work. We need two components:

Pseudo TTY to handle the execution of commands.
Virtual terminal to render the terminal state.

For the Pseudo TTY we have the two recommendations you mentioned earlier, this part isn't a problem.

The virtual terminal has been tricky to find because all the best implementations seem to be Rust-based. We effectively want a headless terminal.

Thankfully, I think I found something that fits our needs and is written in Go.

Midterm is a virtual terminal emulator. There is no GUI, but it has conveniences for rendering back to a terminal or to HTML.

Would be interested to hear whether you think this would be a viable solution. I plan to do some experiments to see if it can do the job. The author (@vito) is someone I rate very highly, as he was one of the main developers of Concourse CI and is now part of the Dagger team.

maaslalani commented 5 months ago

Hey @mikelorant, yes, I believe you are correct: we essentially need a headless terminal.

I would be happy with that solution, but I think there's a way to do it by rendering SVG (using freeze code).

You would execute the commands in a PTY / headless terminal and capture the ANSI on every frame (essentially screenshotting the terminal state). Once we have all the frames, we can render each one to an SVG and then combine those into a GIF. Does that align with your thinking? I don't mind if we do it with Midterm, so long as everything works correctly.

I do think your approach makes sense as well.

robinovitch61 commented 2 months ago

This is still making VHS more frustrating to use than it should be for me :/. I spend a decent amount of time adjusting the framerate, dimensions, and sleep times to get output that looks reasonable.

chasewilson commented 2 weeks ago

Would love to see a resolution here. Glad that there are workarounds. Still having the same issues on the latest OS versions and VHS.

vito commented 8 hours ago

Thankfully, I think I found something that fits our needs and is written in Go.

Midterm is a virtual terminal emulator. There is no GUI, but it has conveniences for rendering back to a terminal or to HTML.

Would be interested if you think this would be a viable solution. I plan to do some experiments to see if this can do the job. The author (@vito) is someone I rate very highly as he was one of the main developers for Concourse CI and is now part of the Dagger team.

Thanks for the kind words 🥲

For what it's worth, I'd be happy to help if using Midterm seems like a viable path to a fix. This issue affects me too, and Midterm is pretty fun to work on, so if y'all run into issues I'll try my best to address them.

charmbracelet / vhs

Example outputs at double speed #88
