### Tasks
- [ ] Take a screenshot of a video, given a YouTube URL and a timestamp (number of seconds from the beginning of the video).
- [ ] Extract a chunk of audio around the given timestamp (from 5 seconds before to 5 seconds after) into a separate MP3 file.
- [ ] Find a way to get the text transcript of the video, preferably with time codes.
- [ ] Incorporate all of the above into a new method in the GAIA environment, called by a new WatchVideo action whose parameters are the URL and an optional timestamp.
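
To make the last two items concrete, here is a minimal sketch of what the action and the audio window arithmetic could look like. Everything in it is an assumption for discussion: the plain dataclass stands in for whatever base class actions use in the GAIA environment, and `clip_window` and `half_width` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class WatchVideo:
    """Hypothetical action shape: a URL plus an optional timestamp in seconds."""
    url: str
    timestamp: Optional[float] = None


def clip_window(timestamp: float, half_width: float = 5.0) -> Tuple[float, float]:
    """Return (start, duration) in seconds for the audio chunk around `timestamp`.

    The start is clamped at 0 so a timestamp near the beginning of the
    video yields a shorter clip instead of a negative offset.
    """
    start = max(0.0, timestamp - half_width)
    end = timestamp + half_width
    return start, end - start


action = WatchVideo(url="https://www.youtube.com/watch?v=VIDEO_ID", timestamp=3)
print(clip_window(action.timestamp))  # (0.0, 8.0)
```

The clip is not clamped at the end of the video here, since the video length is not known until the file or page has been loaded.
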

I see two possible ways of doing it (there may be more, please propose!):
- Download the YouTube video to a local file, then extract screenshots and audio from it using the ffmpeg CLI.
- Use a Playwright browser (or the BrowserGym browser itself) to open the YouTube page and manipulate the video controls to reach the desired timestamp, then take a screenshot.
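
A rough sketch of the pieces each option would need, as a starting point rather than a finished implementation. Assumptions: the video has already been downloaded to a local file (for example with a tool such as yt-dlp), `ffmpeg` is on the PATH and its build includes an MP3 encoder, and the helper names are hypothetical. For the browser option, the sketch only builds a watch URL with YouTube's `t` query parameter so the page opens near the desired moment; playback state, consent dialogs, and ads would still have to be handled before taking the screenshot.

```python
from typing import List
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def screenshot_cmd(video_path: str, timestamp: float, out_path: str) -> List[str]:
    """ffmpeg arguments that seek to `timestamp` and write a single frame."""
    return [
        "ffmpeg", "-y",                # overwrite the output if it exists
        "-ss", f"{timestamp:.3f}",     # seek before decoding the input
        "-i", video_path,
        "-frames:v", "1",              # stop after one video frame
        out_path,
    ]


def audio_clip_cmd(video_path: str, start: float, duration: float, out_path: str) -> List[str]:
    """ffmpeg arguments that cut `duration` seconds of audio starting at `start`."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",
        "-t", f"{duration:.3f}",       # limit how much of the input is read
        "-i", video_path,
        "-vn",                         # drop the video stream
        out_path,                      # an .mp3 extension selects the MP3 format
    ]


def url_at(url: str, seconds: float) -> str:
    """Return `url` with its `t` query parameter set to `seconds` (whole seconds)."""
    parts = urlparse(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "t"]
    query.append(("t", f"{int(seconds)}s"))
    return urlunparse(parts._replace(query=urlencode(query)))


print(" ".join(screenshot_cmd("video.mp4", 90, "frame.png")))
print(" ".join(audio_clip_cmd("video.mp4", 85, 10, "clip.mp3")))
print(url_at("https://www.youtube.com/watch?v=VIDEO_ID", 90))
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. Keeping the command construction separate from execution makes it possible to unit-test the arguments without ffmpeg installed.
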

There are 14 video-related tasks in the validation set, if you want to take a closer look at them: https://huggingface.co/datasets/gaia-benchmark/GAIA/viewer/2023_all/validation?q=video