HomebrewNLP / AGI-at-Home

AGI@Home
https://discord.gg/WJHq9cM2tS

Align video recordings with actions #1

Open ClashLuke opened 2 years ago

ClashLuke commented 2 years ago

Unfortunately, we can't take one screenshot for every action, as each screenshot takes 100 ms or more. However, recording the entire screen at 60 FPS (the maximum framerate most modern monitors support) is possible. If we then align those frames with the recorded actions in a post-processing step, we arrive at roughly the same output without the massive latency overhead. This way, we retain the ability to let the model "see" what's on the screen.
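
As a rough illustration of the post-processing step, here is a minimal sketch that maps each action to the latest frame captured at or before it. It assumes frames and actions are timestamped with the same clock and that frame timestamps are sorted; the function name `align_actions_to_frames` and the sample data are hypothetical, not part of this repository.

```python
from bisect import bisect_right

def align_actions_to_frames(frame_times, action_times):
    """For each action timestamp, return the index of the latest frame
    captured at or before it, or None if the action precedes all frames.

    Assumes frame_times is sorted ascending and both lists share one clock.
    """
    aligned = []
    for t in action_times:
        # bisect_right counts the frames with timestamp <= t
        i = bisect_right(frame_times, t) - 1
        aligned.append(i if i >= 0 else None)
    return aligned

# Hypothetical data: six frames at 60 FPS and three logged actions (seconds)
frame_times = [k / 60 for k in range(6)]
action_times = [0.005, 0.02, 0.09]
print(align_actions_to_frames(frame_times, action_times))
```

Each lookup is a binary search, so the cost per action is logarithmic in the number of recorded frames. Whether the frame before or after an action is the better match depends on capture latency, which this sketch does not model.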

rokosbasilisk commented 2 years ago

I think this would be solved if we used ttyrec instead of videos.

ClashLuke commented 2 years ago

How would you apply ttyrec on a regular desktop? Can it handle video games such as Overwatch?

rokosbasilisk commented 2 years ago

I did not get the question about "regular desktop" (is it about apps that are not terminal-based?). For most apps, it should be possible to scale down the frames, convert them to terminal-level graphics, and extend ttyrecorder to save them (of course, with some loss of quality).

Also, at higher frame rates it would be really hard to collect and align the actions. It would be more important to think of a way to handle the alignment on the model side (e.g., think of how CTC loss aligns characters at each timestep).
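
To make the CTC analogy concrete, here is a minimal sketch of the collapse rule CTC uses to map per-timestep labels to an output sequence: merge consecutive repeats, then drop blanks. This is only the mapping that defines which frame-level labelings count as valid alignments, not the loss itself, which sums over all of them. The action names and the `"_"` blank symbol are hypothetical.

```python
def ctc_collapse(frame_labels, blank="_"):
    """Collapse per-frame labels into a sequence the way CTC does:
    merge consecutive repeats, then remove blank labels."""
    out = []
    prev = None
    for label in frame_labels:
        # Emit a label only when it starts a new run and is not the blank
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# Hypothetical per-frame predictions: one label per recorded frame
frames = ["_", "click", "click", "_", "click", "key_a", "key_a", "_"]
print(ctc_collapse(frames))
```

Under this rule the model never needs an exact frame index for each action; any per-frame labeling that collapses to the recorded action sequence is treated as a valid alignment.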

ClashLuke commented 2 years ago

No, scaling down the frames is not possible. If you want to try it out, take a screenshot of this page and downscale it by a factor of 2.

Vbansal21 commented 2 years ago

@ClashLuke I'd recommend reading "Grandmaster level in StarCraft II using multi-agent reinforcement learning". Here's the link

It has everything you'd need: real-time inference with visual input, using an architecture that includes transformers, etc.