Open ClashLuke opened 2 years ago
I think this would be solved if we used ttyrec instead of videos.
How would you apply ttyrec on a regular desktop? Can it handle video games such as Overwatch?
I didn't understand the question about a "regular desktop" (is it about apps that are not terminal-based?). For most apps, it should be possible to scale down the frames, convert them to terminal-level graphics, and extend ttyrecorder to save them (of course, with some loss of quality).
Also, at higher frame rates it would be really hard to collect and align the actions, so it would be more important to think of a way to handle them on the model side (e.g., think of how CTC loss aligns characters at each timestep).
No, scaling down the frames is not possible. If you want to try it out, take a screenshot of this page and downscale it by a factor of 2.
@ClashLuke I'd recommend reading 'Grandmaster level in StarCraft II using multi-agent reinforcement learning'. Here's the link
It has everything you'd need: real-time inference with visual input, using an architecture consisting of transformers, etc.
Unfortunately, we can't take one screenshot for every action, as screenshots take 100ms or more. However, recording an entire screen at 60 FPS (the maximum framerate most modern monitors support) is possible. If we later align those frames with the actions taken during a post-processing step, we arrive at roughly the same output without the massive latency overhead. This way, we can retain the capability of allowing the model to "see" what's on the screen.
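To make the post-processing step concrete, here is a rough sketch of how the alignment could work, assuming frames are captured at a fixed rate and every action is logged with a timestamp from the same clock. The names `frame_timestamps` and `align_actions_to_frames` are hypothetical and only meant as an illustration, not as the project's actual pipeline.

```python
from bisect import bisect_right


def frame_timestamps(start, fps, count):
    """Timestamps (in seconds) of `count` frames recorded at a fixed `fps`."""
    return [start + i / fps for i in range(count)]


def align_actions_to_frames(frame_times, actions):
    """Map each (timestamp, action) pair to the latest frame captured at or
    before that timestamp, i.e. the frame the user could have seen.

    `frame_times` must be sorted in ascending order. Actions logged before
    the first frame are dropped because there is nothing to pair them with.
    """
    aligned = []
    for timestamp, action in actions:
        # bisect_right returns the number of frames with time <= timestamp.
        index = bisect_right(frame_times, timestamp) - 1
        if index >= 0:
            aligned.append((index, action))
    return aligned


# Example: four frames at 60 FPS and a few logged actions (assumed data).
frames = frame_timestamps(start=0.0, fps=60, count=4)
actions = [(-1.0, "too_early"), (0.0, "key_a"), (0.02, "click"), (0.04, "key_b")]
pairs = align_actions_to_frames(frames, actions)
```

With this approach the recorder never has to wait for a screenshot; it only needs timestamps that are comparable between the video and the action log.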