ServiceNow / TapeAgents

TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle
https://www.servicenow.com/research/TapeAgentsFramework.pdf
Apache License 2.0

explore options to support video (mostly youtube) for gaia #100

Open ollmer opened 2 days ago

ollmer commented 2 days ago
### Tasks
- [ ] ability to take a screenshot of a video, given a YouTube URL and a timestamp (number of seconds from the beginning of the video)
- [ ] ability to extract a chunk of audio around the given timestamp (from 5 seconds before to 5 seconds after) into a separate mp3 file
- [ ] find a way to get the textual transcript of the video, preferably with time codes
- [ ] incorporate all of that into a new method in the GAIA environment, which will be called by a new action `WatchVideo` whose params are the URL and an optional timestamp
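To make the first two tasks and the action shape concrete, here is a rough sketch, not a proposed implementation. It assumes the video has already been downloaded to a local file by some other tool and that `ffmpeg` is on the PATH. The `WatchVideo` dataclass and all helper names below are hypothetical; the real action would need to follow whatever base class the GAIA environment uses.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import parse_qs, urlparse


@dataclass
class WatchVideo:
    """Hypothetical action payload: video URL plus optional timestamp in seconds."""
    url: str
    timestamp: float | None = None


def youtube_video_id(url: str) -> str | None:
    """Extract the video id from common YouTube URL forms (hypothetical helper)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def clip_window(timestamp: float, margin: float = 5.0) -> tuple[float, float]:
    """Return (start, duration) for the audio chunk, clamped so start is never negative."""
    start = max(0.0, timestamp - margin)
    return start, (timestamp + margin) - start


def ffmpeg_frame_cmd(video_path: str, timestamp: float, out_png: str) -> list[str]:
    """Build an ffmpeg command that seeks to the timestamp and writes a single frame."""
    return ["ffmpeg", "-y", "-ss", f"{timestamp:.3f}", "-i", video_path,
            "-frames:v", "1", out_png]


def ffmpeg_audio_cmd(video_path: str, timestamp: float, out_mp3: str,
                     margin: float = 5.0) -> list[str]:
    """Build an ffmpeg command that cuts the audio around the timestamp, dropping video."""
    start, duration = clip_window(timestamp, margin)
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-t", f"{duration:.3f}",
            "-i", video_path, "-vn", out_mp3]
```

The commands are returned as argument lists so they can be passed to `subprocess.run(cmd, check=True)` without shell quoting. Transcript retrieval is left out here since it depends on which approach is chosen.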

I see 2 possible ways of doing it (but maybe there are more, please propose!):

There are 14 video-related tasks in the validation set, if you want to take a closer look at them: https://huggingface.co/datasets/gaia-benchmark/GAIA/viewer/2023_all/validation?q=video