Open robinovitch61 opened 2 years ago
I can't reproduce this on my machine, but I think it has something to do with the number of frames that the VHS instance is able to capture. Can you try setting the framerate to 24 (the default is 60)?
Set Framerate 24
To explain this bug further: VHS tries to capture a frame every 1/framerate seconds (by default, every 16.7 milliseconds). However, if capturing takes longer, say 30 milliseconds per frame, VHS won't have enough frames to render the GIF at 60 FPS, but it will still try to. It assumes we are using 60 FPS when we really only have roughly 33 FPS, which results in a sped-up GIF.
One solution is to time how long the frame capture takes (on average) and then use that as the actual frame rate so that the GIF isn't sped up.
That looks better!
# Where should we write the GIF?
Output demo.gif
# Set up a 1200x600 terminal with 46px font.
Set FontSize 46
Set Width 1200
Set Height 600
Set Framerate 24
# Type a command in the terminal.
Type "echo 'Welcome to VHS!'"
# Pause for dramatic effect...
Sleep 500ms
# Run the command by pressing enter.
Enter
# Admire the output for a bit.
Sleep 5s
Results in
Any ideas why it might be slower to capture a frame? Here are some system/environment stats
❯ ffmpeg -version
ffmpeg version 5.1.2 Copyright (c) 2000-2022 the FFmpeg developers
built with Apple clang version 14.0.0 (clang-1400.0.29.102)
configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/5.1.2 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-neon
libavutil 57. 28.100 / 57. 28.100
libavcodec 59. 37.100 / 59. 37.100
libavformat 59. 27.100 / 59. 27.100
libavdevice 59. 7.100 / 59. 7.100
libavfilter 8. 44.100 / 8. 44.100
libswscale 6. 7.100 / 6. 7.100
libswresample 4. 7.100 / 4. 7.100
libpostproc 56. 6.100 / 56. 6.100
❯ ttyd --version
ttyd version 1.7.2-e8728bb
❯ vhs -v
vhs version v0.1.1
One solution is to time how long the frame capture takes (on average) and then use that as the actual frame rate so that the GIF isn't sped up.
Would I do this by counting the number of frames and dividing by the gif length or something?
I have no idea why frames would be capturing slowly on that machine, it seems to be very fast. Do you have a lot of applications open like chrome, slack, VS Code etc... while recording the GIF?
Would I do this by counting the number of frames and dividing by the gif length or something?
We have some logic to see how long a frame capture took so that we can sleep for the rest of the time:
https://github.com/charmbracelet/vhs/blob/1508f495c2016d7364244924fc4123b70c13f55b/vhs.go#L199
So we would probably measure each frame and average the number of frames we captured each second.
I have no idea why frames would be capturing slowly on that machine, it seems to be very fast. Do you have a lot of applications open like chrome, slack, VS Code etc... while recording the GIF?
I quit everything so activity monitor looked like this
While `vhs < demo.tape` was running, it got up to this
But still doesn't look overloaded! Very strange.
FWIW the dockerized run works for me, so that can be my workaround for now :)
docker run --rm -v $PWD:/vhs ghcr.io/charmbracelet/vhs demo.tape
Interesting, so in docker you're able to get 60 frames per second? That's definitely really strange.
Dang I was hoping that was the case but just got this on my other machine too
❯ vhs < demo.tape
Output .gif demo.gif
Set FontSize 32
Set Width 1200
Set Height 600
Type echo 'Welcome to VHS!'
Sleep 500ms
Enter 1
Sleep 5s
Creating GIF...
Time: 0h:00m:16s
So might be a mac thing?
This is super strange, I will definitely look into this. I don't know if it's a Mac thing, I have a mac and things work correctly for me.
I'll try and see if there's anything funky going on. It's really strange that it's not a memory/cpu issue. But that also means it could be solvable!
Really appreciate all the info you've given, it's super helpful! I'll try and see if there's something is going wrong in VHS.
Ok thanks! One extra bit of info: I installed vhs via `brew` on the Intel Mac and via `go` on the M2 Mac. On both Macs I installed the `ffmpeg` and `ttyd` deps via `brew` as per the README. I have Brave Browser set as my system default on both (not sure it matters at all).
This might be a super long shot, but what happens if you set Chrome as your default? VHS uses a Chromium browser, and since Brave is Chromium-based it might be using that. I have no idea though, this is a long shot.
No luck - installed latest chrome, set as default, restarted computer, same output from tape
Gotcha, it was a long shot. Really appreciate you trying it out and ruling that possibility out!
Here's another long shot: are your devices battery powered when running vhs? In other words: might another energy profile limit rendering performance of the remote controlled browser instance?
I don't know if #110 can solve it or not.
Here's another long shot: are your devices battery powered when running vhs? In other words: might another energy profile limit rendering performance of the remote controlled browser instance?
They weren't plugged in, but unfortunately same result plugged in!
I don't know if https://github.com/charmbracelet/vhs/pull/110 can solve it or not.
Thanks for the help @ysmood ! I built off https://github.com/charmbracelet/vhs/commit/e3691162b0b968726cacb6762c1f3c96ef4bf185 and tried various framerates, but unfortunately there are still differing playback speeds.
From the outputs below, playback speed peaks around 50fps, the default setting. If the bug comes from not capturing frames quickly enough, ending up with too few total frames, and then rendering the GIF at the assumed framerate (resulting in a sped-up GIF, as I think I've correctly interpreted from your comment here), then it's strange that it slows down again above 50fps, right?
Also all the output gifs are about the same size (30-32kB), which might be expected, but is also interesting.
All of them were generated like this (70fps as example)
Output examples/70.gif
Require echo
Set FontSize 32
Set Width 1200
Set Height 600
Set Framerate 70
Type "echo 'Welcome to VHS!' 70fps" Sleep 500ms Enter
Sleep 5s
I have the same problem on Arch Linux (with sway). Also on a relatively fast system (Intel i7-1065G7).
Revisiting this - after upgrading to MacOS Ventura, things look good!
# Where should we write the GIF?
Output demo.gif
# Set up a 1200x600 terminal with 46px font.
Set FontSize 46
Set Width 1200
Set Height 600
# Type a command in the terminal.
Type "echo 'Welcome to VHS!'"
# Pause for dramatic effect...
Sleep 500ms
# Run the command by pressing enter.
Enter
# Admire the output for a bit.
Sleep 5s
Gives
Same versions as before
❯ ffmpeg -version
ffmpeg version 5.1.2 Copyright (c) 2000-2022 the FFmpeg developers
built with Apple clang version 14.0.0 (clang-1400.0.29.102)
configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/5.1.2 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-neon
libavutil 57. 28.100 / 57. 28.100
libavcodec 59. 37.100 / 59. 37.100
libavformat 59. 27.100 / 59. 27.100
libavdevice 59. 7.100 / 59. 7.100
libavfilter 8. 44.100 / 8. 44.100
libswscale 6. 7.100 / 6. 7.100
libswresample 4. 7.100 / 4. 7.100
libpostproc 56. 6.100 / 56. 6.100
❯ ttyd -version
ttyd version 1.7.2-e8728bb
❯ vhs -v
vhs version 0.1.1
Ah, nevermind, when I increase the width and height of the terminal, it gets much worse. Even changing the framerate doesn't help then.
I'm seeing the same thing in my environment, which, like robinovitch61 said, seems to be exacerbated by the output size.
This is set to a 14s sleep, but you can see that Maven reports less than 8 seconds for compile time.
I'm running on a Fedora 37 VM with kitty, though I'm SSH'ing into the terminal from a Windows box, if that matters. My ttyd install was done through brew.
VM Environment
Software | Version |
---|---|
OS | Fedora 37 |
ttyd (installed via brew) | 1.7.3 |
ffmpeg (installed via dnf) | libavutil 57. 28.100 / 57. 28.100; libavcodec 59. 37.100 / 59. 37.100; libavformat 59. 27.100 / 59. 27.100; libavdevice 59. 7.100 / 59. 7.100; libavfilter 8. 44.100 / 8. 44.100; libswscale 6. 7.100 / 6. 7.100; libswresample 4. 7.100 / 4. 7.100; libpostproc 56. 6.100 / 56. 6.100 |
vhs (installed via yum repo) | version v0.1.0 (d6bba9f) |
ffmpeg version 5.1.3 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12 (GCC)
configuration: --prefix=/usr --bindir=/usr/bin --datadir=/usr/share/ffmpeg --docdir=/usr/share/doc/ffmpeg --incdir=/usr/include/ffmpeg --libdir=/usr/lib64 --mandir=/usr/share/man --arch=x86_64 --optflags='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' --extra-ldflags='-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes' --disable-htmlpages --enable-pic --disable-stripping --enable-shared --disable-static --enable-gpl --enable-version3 --enable-libsmbclient --disable-openssl --enable-bzlib --enable-frei0r --enable-chromaprint --enable-gcrypt --enable-gnutls --enable-ladspa --enable-lcms2 --enable-libshaderc --enable-vulkan --disable-cuda-sdk --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libdc1394 --enable-libdrm --enable-libfdk-aac --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libiec61883 --enable-libilbc --enable-libjack --enable-libjxl --enable-libmodplug --enable-libmp3lame --enable-libmysofa --enable-libopenh264-dlopen --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libplacebo --enable-libpulse --enable-librabbitmq --enable-librav1e --enable-librist --enable-librsvg --enable-librubberband --enable-libsnappy --enable-libsvtav1 --enable-libsoxr --enable-libspeex --enable-libssh --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvmaf --enable-libvorbis 
--enable-libv4l2 --enable-libvpx --enable-libwebp --enable-libxml2 --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lto --enable-libmfx --enable-lv2 --enable-vaapi --enable-vdpau --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libvo-amrwbenc --enable-libxvid --enable-openal --enable-opencl --enable-opengl --enable-pthreads --enable-vapoursynth --enable-muxers --enable-demuxers --enable-hwaccels --disable-encoders --disable-decoders --disable-decoder='h264,hevc,vc1' --enable-encoder=',a64multi,a64multi5,aac,libfdk_aac,ac3,adpcm_adx,adpcm_argo,adpcm_g722,adpcm_g726,adpcm_g726le,adpcm_ima_alp,adpcm_ima_amv,adpcm_ima_apm,adpcm_ima_qt,adpcm_ima_ssi,adpcm_ima_wav,adpcm_ima_ws,adpcm_ms,adpcm_swf,adpcm_yamaha,alac,alias_pix,amv,apng,ass,asv1,asv2,ayuv,bitpacked,bmp,cinepak,cljr,dca,dfpwm,dnxhd,dpx,dvbsub,dvdsub,dvvideo,exr,ffv1,ffvhuff,flac,flashsv,flashsv2,flv,g723_1,gif,h261,h263,h263_v4l2m2m,h263p,h264_amf,h264_nvenc,h264_qsv,h264_v4l2m2m,h264_vaapi,hap,hevc_amf,hevc_nvenc,hevc_qsv,hevc_v4l2m2m,hevc_vaapi,huffyuv,ilbc,jpegls,jpeg2000,libaom,libaom_av1,libcodec2,libgsm,libgsm_ms,libilbc,libjxl,libmp3lame,libopencore_amrnb,libopenh264,libopenjpeg,libopus,librav1e,libschroedinger,libspeex,libsvtav1,libtheora,libtwolame,libvo_amrwbenc,libvorbis,libvpx_vp8,libvpx_vp9,libwebp,libwebp_anim,libxvid,mjpeg,mjpeg_qsv,mjpeg_vaapi,mlp,mp2,mp2fixed,mpeg1video,mpeg2video,mpeg2_qsv,mpeg2_vaapi,mpeg4,mpeg4_v4l2m2m,msmpeg4v2,msmpeg4v3,msvideo1,nellymoser,opus,pam,pbm,pcm_alaw,pcm_f32be,pcm_f32le,pcm_f64be,pcm_f64le,pcm_mulaw,pcm_s16be,pcm_s16be_planar,pcm_s16le,pcm_s16le_planar,pcm_s24be,pcm_s24le,pcm_s24le_planar,pcm_s32be,pcm_s32le,pcm_s32le_planar,pcm_s8,pcm_s8_planar,pcm_u16be,pcm_u16le,pcm_u24be,pcm_u24le,pcm_u32be,pcm_u32le,pcm_u8,pcx,pgm,pgmyuv,phm,png,ppm,qoi,qtrle,r10k,r210,ra_144,rawvideo,roq,roq_dpcm,rpza,rv10,rv20,s302m,sbc,sgi,smc,snow,sonic,sonic_ls,speedhq,srt,ssa,subrip,sunrast,svq1,targa,text,tiff,truehd,tta,ttml,utvideo,v210,v308,v408,v410,vc1_qs
v,vc1_v4l2m2m,vc2,vorbis,vp8_qsv,vp8_v4l2m2m,vp8_vaapi,vp9_qsv,vp9_vaapi,wavpack,webvtt,wmav1,wmav2,wmv1,wmv2,wrapped_avframe,xbm,xface,xsub,xwd,y41p,yuv4,zlib,zmbv,' --enable-decoder=',aac,aasc,libfdk_aac,ac3,acelp_kelvin,adpcm_4xm,adpcm_adx,adpcm_afc,adpcm_agm,adpcm_aica,adpcm_argo,adpcm_ct,adpcm_dtk,adpcm_ea,adpcm_ea_maxis_xa,adpcm_ea_r1,adpcm_ea_r2,adpcm_ea_r3,adpcm_ea_xas,adpcm_g722,adpcm_g726,adpcm_g726le,adpcm_ima_acorn,adpcm_ima_alp,adpcm_ima_amv,adpcm_ima_apc,adpcm_ima_apm,adpcm_ima_cunning,adpcm_ima_dat4,adpcm_ima_dk3,adpcm_ima_dk4,adpcm_ima_ea_eacs,adpcm_ima_ea_sead,adpcm_ima_iss,adpcm_ima_moflex,adpcm_ima_mtf,adpcm_ima_oki,adpcm_ima_qt,adpcm_ima_qt_at,adpcm_ima_rad,adpcm_ima_smjpeg,adpcm_ima_ssi,adpcm_ima_wav,adpcm_ima_ws,adpcm_ms,adpcm_mtaf,adpcm_psx,adpcm_sbpro_2,adpcm_sbpro_3,adpcm_sbpro_4,adpcm_swf,adpcm_thp,adpcm_thp_le,adpcm_vima,adpcm_xa,adpcm_yamaha,adpcm_zork,alac,alias_pix,amrnb,amrwb,amv,anm,ansi,ape,apng,arbc,argo,ass,asv1,asv2,atrac1,atrac3,atrac3al,atrac3p,atrac3pal,aura,aura2,av1,av1_qsv,ayuv,bethsoftvid,bfi,bink,binkaudio_dct,binkaudio_rdft,bintext,bitpacked,bmp,bmv_audio,bmv_video,brender_pix,c93,ccaption,cdgraphics,cdtoons,cdxl,cinepak,clearvideo,cljr,cook,cpia,cscd,cyuv,dca,dds,derf_dpcm,dfa,dfpwm,dirac,dnxhd,dolby_e,dpx,dsd_lsbf,dsd_msbf,dsicinaudio,dsicinvideo,dss_sp,dvaudio,dvbsub,dvdsub,dvvideo,dxa,dxtory,eacmv,eamad,eatgq,eatgv,eatqi,eightbps,eightsvx_exp,eightsvx_fib,escape124,escape130,evrc,exr,ffv1,ffvhuff,ffwavesynth,fits,flac,flashsv,flashsv2,flic,flv,fmvc,fourxm,g723_1,g729,gdv,gem,gif,gremlin_dpcm,gsm,gsm_ms,gsm_ms_at,h261,h263,h263_v4l2m2m,h263i,h263p,hap,hca,hcom,hnm4_video,hq_hqa,hqx,huffyuv,hymt,iac,idcin,idf,iff_ilbm,ilbc,imc,indeo2,indeo3,indeo4,indeo5,interplay_acm,interplay_dpcm,interplay_video,ipu,jacosub,jpeg2000,jpegls,jv,kgv1,kmvc,lagarith,libaom,libaom_av1,libcodec2,libdav1d,libgsm,libgsm_ms,libilbc,libjxl,libopencore_amrnb,libopencore_amrwb,libopenh264,libopenjpeg,libopus,librsvg,libschroedinger,libspeex,libvo
rbis,libvpx_vp8,libvpx_vp9,libzvbi_teletext,loco,lscr,m101,mace3,mace6,mdec,metasound,microdvd,mimic,mjpeg,mjpeg_qsv,mjpegb,mlp,mmvideo,motionpixels,mp1,mp1float,mp2,mp2float,mp3,mp3adu,mp3adufloat,mp3float,mp3on4,mp3on4float,mpc7,mpc8,mpeg1video,mpeg1_v4l2m2m,mpeg2video,mpeg2_qsv,mpeg2_v4l2m2m,mpeg4,mpeg4_v4l2m2m,mpegvideo,mpl2,msa1,mscc,msmpeg4v1,msmpeg4v2,msmpeg4v3,msnsiren,msp2,msrle,mss1,mss2,msvideo1,mszh,mts2,mv30,mvc1,mvc2,mvdv,mvha,mwsc,mxpeg,nellymoser,nuv,on2avc,opus,paf_audio,paf_video,pam,pbm,pcm_alaw,pcm_bluray,pcm_dvd,pcm_f16le,pcm_f24le,pcm_f32be,pcm_f32le,pcm_f64be,pcm_f64le,pcm_lxf,pcm_mulaw,pcm_s16be,pcm_s16be_planar,pcm_s16le,pcm_s16le_planar,pcm_s24be,pcm_s24daud,pcm_s24le,pcm_s24le_planar,pcm_s32be,pcm_s32le,pcm_s32le_planar,pcm_s64be,pcm_s64le,pcm_s8,pcm_s8_planar,pcm_sga,pcm_u16be,pcm_u16le,pcm_u24be,pcm_u24le,pcm_u32be,pcm_u32le,pcm_u8,pcm_vidc,pcx,pfm,pgm,pgmyuv,pgssub,pgx,phm,photocd,pictor,pjs,png,ppm,prosumer,psd,ptx,qcelp,qdm2,qdmc,qdraw,qoi,qpeg,qtrle,r10k,r210,ra_144,ra_288,rasc,rawvideo,realtext,rl2,roq,roq_dpcm,rpza,rscc,rv10,rv20,s302m,sami,sanm,sbc,screenpresso,sdx2_dpcm,sgi,sgirle,shorten,simbiosis_imx,sipr,siren,smackaud,smacker,smc,smvjpeg,snow,sol_dpcm,sonic,sp5x,speedhq,speex,srgc,srt,ssa,stl,subrip,subviewer,subviewer1,sunrast,svq1,svq3,tak,targa,targa_y216,tdsc,text,theora,thp,tiertexseqvideo,tiff,tmv,truehd,truemotion1,truemotion2,truemotion2rt,truespeech,tscc,tscc2,tta,twinvq,txd,ulti,utvideo,v210,v210x,v308,v408,v410,vb,vble,vcr1,vmdaudio,vmdvideo,vmnc,vorbis,vp3,vp4,vp5,vp6,vp6a,vp6f,vp7,vp8,vp8_qsv,vp8_v4l2m2m,vp9,vp9_qsv,vp9_v4l2m2m,vplayer,vqa,wavpack,wcmv,webp,webvtt,wmav1,wmav2,wmavoice,wmv1,wmv2,wnv1,wrapped_avframe,ws_snd1,xan_dpcm,xan_wc3,xan_wc4,xbin,xbm,xface,xl,xpm,xsub,xwd,y41p,ylc,yop,yuv4,zero12v,zerocodec,zlib,zmbv,'
After a weekend of working on this problem, I have a solution to this issue.
First I need to explain a few things, including why this has been such a difficult problem to diagnose.
The default frame rate configured in VHS is `50`. This is already a problematic value because to maintain this frame rate we need to capture each frame within `1s` (`1000ms`) / `50` frames = `20ms`. So we fire off requests to `go-rod` to ask the browser to send image data. Unfortunately we have to do this twice: once for the text and a second time for the cursor. Then we need to write these images. That is a total of 4 operations that need to be completed in `20ms`. We have a budget of `10ms` to read and write the text, and `10ms` to read and write the cursor.
It gets even worse: these requests are blocking calls. We can't send key press requests while we are waiting for the images. They are queued up.
So what happens if we can't do these 4 actions in `20ms`? We take as long as needed and then immediately attempt to get the next set of frames. No downtime here! However, we now have a delayed frame. We didn't trigger `20ms` after the last one; we may have triggered `40ms` after it instead. So we are in some ways dropping frames: we collect fewer frames than we expect.
When it comes to assembling these frames into a video, we simply join them all together and say each frame is `20ms` apart, or `50fps`. But hold on... we dropped frames? If we were meant to collect `400` frames but only got `200` frames, we have a problem. We send all this to `ffmpeg` and tell it to assemble the frames at `50fps`, but we only supply half the frames it needs. It assumes the video is shorter, and therefore it ends up playing back twice as fast because it was set to `50fps`.
There are lots of little improvements we can make to this situation. This is what I've done and experimented with:
[…] `24`. Half the frame rate = twice the budget to collect frames. Easy win, and for capturing text it seems unnecessary to record faster.
[…] `4ms` for 2 files and any weird disk issues don't impact the next capture event.
[…] `go-rod` page screenshot function. Sure, the cursor might not take that long, but every bit counts. So far working out well. Some complexities here, so not a free win.
Here is an output of the `demo.tape` with some metrics so you can clearly see the problem.
❯ go run . examples/demo.tape
File: examples/demo.tape
Host your GIF on vhs.charm.sh: vhs publish <file>.gif
Output .gif examples/demo.gif
Require echo
Set Shell bash
Set FontSize 32
Set Width 1200
Set Height 600
Set Framerate 100
Type echo 'Welcome to VHS!'
WARN: Exceed Budget, Duration: 38ms (Get text) [0]
WARN: Delayed Next, Delay: 28ms [0]
WARN: Exceed Budget, Duration: 20ms (Get text) [1]
WARN: Delayed Next, Delay: 10ms [1]
...
WARN: Exceed Budget, Duration: 10ms (Get text) [99]
WARN: Delayed Next, Delay: 0ms [99]
Sleep 500ms
Enter 1
WARN: Exceed Budget, Duration: 11ms (Get text) [157]
WARN: Delayed Next, Delay: 1ms [157]
Sleep 5s
WARN: Exceed Budget, Duration: 10ms (Get text) [599]
WARN: Delayed Next, Delay: 0ms [599]
WARN: Exceed Budget, Duration: 11ms (Get text) [605]
WARN: Delayed Next, Delay: 1ms [605]
WARN: Exceed Budget, Duration: 10ms (Get text) [608]
WARN: Delayed Next, Delay: 0ms [608]
Delayed Next Frames: 25, Total Frames: 638 Expected Frames: 683
Exceeded Budget Events: 25, Total Events: 2552, Expected Events: 2732
This is a recording at `100fps`, and what the exceeded budget tells me is that I was `11ms` too slow to get the text image. I only have a budget of `10ms` but I took `21ms`. Now I have forced the next frame to be taken `11ms` late. I track how often these budget overruns occur. They aren't the main concern though; they only turn into a problem when we have delayed frames.
I also know I have fewer frames than expected (likely around 50, but it doesn't add up exactly).
As you can see, having metrics helps considerably. Running this in a container shows how poorly Chrome performs when using software rendering instead of hardware acceleration (`250ms` to get an image).
Overall, these improvements helped but still did not address the root cause of the problem.
The proper solution to this issue is to stop assuming the frames are received at their expected interval and start tracking exactly when a frame is captured. `ffmpeg` has a feature, the `concat` demuxer, that joins images together based on a text file containing an image filename and a timestamp offset for each frame. The offset means we no longer need `ffmpeg` to assume that the frames are equally spaced; it knows exactly when each frame should be displayed.
Here is a sample of what this looks like:
file 'frame-text-00000.png'
outpoint 00:00:00.019
file 'frame-text-00001.png'
outpoint 00:00:00.049
file 'frame-text-00002.png'
outpoint 00:00:00.047
file 'frame-text-00003.png'
outpoint 00:00:00.033
file 'frame-text-00004.png'
outpoint 00:00:00.032
file 'frame-text-00005.png'
outpoint 00:00:00.032
The `outpoint` defines how long after displaying the previous image to move on to the next image. These offsets will become further apart as there are fewer frames.
Be aware, we can't magically guarantee that every frame is captured at the frame rate you request. That comes down to the performance of the hardware used for recording. However, we can tell the user that they are dropping frames. This knowledge will allow them to tune their tape settings to match their hardware capabilities. Assembling the frames we do have using the offset will mask this problem and prevent the recording from being sped up. This gives us a consistent-speed video and allows us to solve issues like these.
@maaslalani Will take me a few days to begin to put up the pull requests that address the improvements I mentioned. Still also need to figure out how to unit test this as well.
@mikelorant thanks for this excellent analysis! Those improvements in performance and visibility sound great
Addressing a few different comments with this thread:
One solution is to time how long the frame capture takes (on average) and then use that as the actual frame rate so that the GIF isn't sped up.
This would never work because not all parts of the recording capture at the same frame rate. Depending on complexity of the image, the rendering time is significantly different. A mostly blank screen with a few lines of text encodes far quicker than a very busy screen filled with many colours and characters.
Interesting, so in docker you're able to get 60 frames per second? That's definitely really strange.
This is because Chrome switches from a hardware renderer to a software renderer. We are switching from GPU to CPU. In general software rendering is terrible in containers and we should highly discourage it. However, in some cases it can improve performance if the hardware GPU is busy or has weak performance.
then it's strange that it slows down again above 50fps
This one took me a while to really understand.
Grabbing images is done using the go-rod method `CanvasToImage`. This maps back to a browser function, `toDataURL`. This function is problematic, and an improved function, `toBlob`, now exists because of some of the issues. The main issue that matters to us is that this is a blocking function. While we are requesting an image we can't do anything else, including sending key presses. If we have a few image requests queued up and we want to send a key press, it will have to wait. Now we have slowed down our ability to send keys when we want them, causing the video to be "slow". This is very obvious when typing lots of text.
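A toy Go model of that queueing effect, assuming a single connection that handles requests strictly in order. The `request` type and the costs are made up for illustration and do not measure go-rod or Chrome:

```go
package main

import "fmt"

// request is one unit of work sent to the browser; cost is in milliseconds.
type request struct {
	name string
	cost int
}

// completionTimes processes requests strictly in order, as a blocking
// connection would, and returns when each one finishes (ms from start).
func completionTimes(queue []request) []int {
	done := make([]int, len(queue))
	clock := 0
	for i, r := range queue {
		clock += r.cost
		done[i] = clock
	}
	return done
}

func main() {
	queue := []request{
		{"text image", 20},
		{"cursor image", 20},
		{"text image", 20},
		{"key press", 1}, // wanted right away, but stuck behind the images
	}
	for i, t := range completionTimes(queue) {
		fmt.Printf("%-12s done at %dms\n", queue[i].name, t)
	}
}
```

In this model the key press, which costs 1ms on its own, only completes after every image request queued ahead of it.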
I've been experimenting with some ideas to solve this problem, but ttyd is making this difficult because while I can open multiple tabs in our headless Chrome, I can't open multiple pages that map back to the same ttyd session. Having multiple tabs to the same ttyd session would allow me to send key presses to one tab and record images from the other. I'm probably going to experiment with the idea of using a shared tmux session between multiple tabs and see if it works well.
Ah, nevermind, when I increase the width and height of the terminal, it gets much worse. Even changing the framerate doesn't help then.
Making the terminal larger just increases the size of the image and increases the latency of image generation. Lowering the frame rate will help, but in most cases you need to drop it to single digits, which really doesn't look great. What matters most is finding out how long it takes to get and write a complex image from the terminal. How many milliseconds that takes is what determines what a reasonable frame rate is.
@ysmood Having some challenges making some improvements especially around trying to reduce the cost of capturing the frames.
The current way of taking a screenshot, which has to be done for both the text and cursor is:
text, textErr := vhs.TextCanvas.CanvasToImage("image/png", quality)
As I understand it, this internally translates in `rod` to calling `toDataURL`, which sends back the PNG output.
Instead of capturing individually the two canvas elements of xterm.js, it seems better to do the following:
req := proto.PageCaptureScreenshot{
Format: proto.PageCaptureScreenshotFormatPng,
OptimizeForSpeed: true,
}
text, textErr := vhs.Page.Screenshot(true, &req)
This would allow us to get both the text and cursor combined. However, this one method takes 3x longer than doing a single `CanvasToImage`. This seems to translate to a CDP message, `Page.captureScreenshot`. I am guessing this may not be hardware accelerated, compared with asking Chrome directly for `CanvasToImage`.
I'll quickly mention, I know `OptimizeForSpeed` does nothing yet. Hopefully it will though 😄
Do you know why the performance is significantly worse? Are there any other options for capturing the screen in one go?
Currently we define the two canvases as:
vhs.TextCanvas, _ = vhs.Page.Element("canvas.xterm-text-layer")
vhs.CursorCanvas, _ = vhs.Page.Element("canvas.xterm-cursor-layer")
If we could capture them both via the following this would be a big win but it still doesn't include the cursor 😢
vhs.Canvas, _ = vhs.Page.Element("canvas")
Are there points here that may be better discussed on the `go-rod/rod` issues instead? I am at a loss how to move forward.
@mikelorant streaming the canvases was actually a performance improvement, we originally performed screenshots and that was much slower:
For what it's worth, I have some ideas about how to get much better performance. It involves some of the work done for `freeze`.
Instead of using `go-rod`, we can simply capture ANSI sequences as output and store them as (text) "frames", then convert each of the frames to an SVG, convert the SVGs into PNGs, and combine them into a GIF.
Agreed, `go-rod` isn't suited for this at all.
I do have a working branch which can handle dropped frames and stitches things together properly. But the blocking issues with receiving frames and sending events is an architecture issue.
I do have a bit of capacity now to look into how you did this with `freeze` and see if we can incorporate the solution here. Are you willing to do some experiments as well?
Yeah, I played around a bit with running VHS scripts using `tmux` and then using `tmux capture-pane -pet` to grab the ANSI, and then using the code here:
to turn the captured pane into an SVG. Happy to experiment with alternative solutions. I think using a PTY (https://github.com/creack/pty/tree/master) is the way to go rather than tmux for more programmatic control.
Using `pty` means we have no external application dependencies? That would be a really big win and remove something I felt negatively impacted VHS.
Yup, other than `ffmpeg` of course!
This one is cross platform (works on Windows) and written by our very own @aymanbagabas: https://github.com/aymanbagabas/go-pty
@maaslalani I think I have found what we need to make this work. We need two components:
Pseudo TTY to handle the execution of commands.
Virtual terminal to render the terminal state.
For the Pseudo TTY we have the two recommendations you mentioned earlier, this part isn't a problem.
The virtual terminal has been tricky to find because all the best implementations seem to be Rust based. We effectively want a headless terminal.
Thankfully, I think I found something that fits our needs and is written in Go.
Midterm is a virtual terminal emulator. There is no GUI, but it has conveniences for rendering back to a terminal or to HTML.
Would be interested if you think this would be a viable solution. I plan to do some experiments to see if this can do the job. The author (@vito) is someone I rate very highly as he was one of the main developers for Concourse CI and is now part of the Dagger team.
@maaslalani I think I have found what we need to make this work. We need two components:
Pseudo TTY to handle the execution of commands.
Virtual terminal to render the terminal state.
For the Pseudo TTY we have the two recommendations you mentioned earlier, this part isn't a problem.
The virtual terminal has been tricky to find because all the best implementations seem to be Rust based. We effectively want a headless terminal.
Thankfully, I think I found something that fits our needs and is written in Go.
Midterm is a virtual terminal emulator. There is no GUI, but it has conveniences for rendering back to a terminal or to HTML.
Would be interested if you think this would be a viable solution. I plan to do some experiments to see if this can do the job. The author (@vito) is someone I rate very highly as he was one of the main developers for Concourse CI and is now part of the Dagger team.
Hey @mikelorant, yes I believe you are correct, we essentially need a headless terminal.
I would be happy with that solution but I think there's a way to do it by rendering SVG (using freeze code).
You would execute the commands in a PTY / headless terminal and then, every frame, capture the ANSI (essentially screenshot the terminal state). Once we have all the frames, we can render each to an SVG and then combine those into a GIF. Does that align with your thinking? I don't mind if we do it with midterm, so long as everything works correctly.
I do think your approach makes sense as well.
This is still making `vhs` more frustrating to use than it should be for me :/. I spend a decent amount of time adjusting the frame rate, dimensions, and sleep times to get an output that looks reasonable.
Would love to see a resolution here. Glad that there are workarounds. Still having the same issues on the latest OS versions and VHS.
Thankfully, I think I found something that fits our needs and is written in Go.
Midterm is a virtual terminal emulator. There is no GUI, but it has conveniences for rendering back to a terminal or to HTML.
Would be interested if you think this would be a viable solution. I plan to do some experiments to see if this can do the job. The author (@vito) is someone I rate very highly as he was one of the main developers for Concourse CI and is now part of the Dagger team.
Thanks for the kind words 🥲
For what it's worth I'd be happy to help if using Midterm seems like a viable path to a fix. This issue affects me too, and Midterm is pretty fun to work on, so if y'all run into issues I'll try my best to address them.
When I run the demo tape:
I get the following, which seems to be at 2x speed compared to the example in the README: