OpenFish is an open-source system written in Go for classifying marine species. Tasks involve importing video or image data, classifying and annotating data (both manually and automatically), searching, and more. It is expected that OpenFish will utilize computer vision and machine learning techniques.
Problem

To do any sort of computer vision classification we will need training data consisting of our annotations and the corresponding video. Video streams currently do not let us download video; they only provide a URL to the YouTube video.
Solution
I propose a new API that retrieves a fragment of video from a given videostream.
GET http://openfish.appspot.com/api/v1/videostreams/1234/media?time=03:11:36-03:11:40
When preparing training data, you could then fetch the video fragments and annotations in N+1 HTTP requests (N = number of training data pairs). First, fetch the annotations with whatever filter criteria you want; then, for each annotation, fetch the media using its start and end times and videostream ID.
GET http://openfish.appspot.com/api/v1/annotations?observation[common_name]=Giant Cuttlefish
GET http://openfish.appspot.com/api/v1/videostreams/<id>/media?time=<start>-<end>
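The N+1 workflow above can be sketched as follows. The `Annotation` fields here are illustrative stand-ins for the real API response, and the annotation fetch itself is elided:

```go
package main

import "fmt"

// Annotation holds just the fields this workflow needs; the real
// annotation response has more. Field names here are illustrative.
type Annotation struct {
	VideoStreamID int
	Start, End    string // "HH:MM:SS" offsets into the stream
}

const base = "http://openfish.appspot.com/api/v1"

// mediaURL builds the per-annotation media request from the proposal.
func mediaURL(a Annotation) string {
	return fmt.Sprintf("%s/videostreams/%d/media?time=%s-%s",
		base, a.VideoStreamID, a.Start, a.End)
}

func main() {
	// 1 request: fetch annotations matching a filter (not shown here),
	// then N requests: one media fetch per annotation.
	anns := []Annotation{{VideoStreamID: 1234, Start: "03:11:36", End: "03:11:40"}}
	for _, a := range anns {
		fmt.Println(mediaURL(a)) // the URL would be passed to http.Get
	}
}
```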
Implementation
One challenge is that our videos live on YouTube and are not easily downloadable. There is a Go library for downloading videos: https://github.com/kkdai/youtube; however, it downloads the whole video, not a fragment. Given we are dealing with many-hour-long streams, it is not efficient to download an entire video each time the API receives a request and then discard the majority of it.
One solution would be to cache the video: if we had a series of requests for the same videostream but different times, we would only need to download the video once.
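A sketch of that caching idea, assuming an injected downloader (which could wrap github.com/kkdai/youtube); a real cache would also need to bound disk usage and evict old entries:

```go
package main

import (
	"fmt"
	"sync"
)

// videoCache memoises whole-video downloads so a series of requests for
// the same videostream at different times triggers only one download.
type videoCache struct {
	mu       sync.Mutex
	paths    map[string]string // videostream ID -> local file path
	download func(id string) (string, error)
}

func (c *videoCache) get(id string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if p, ok := c.paths[id]; ok {
		return p, nil // cache hit: no second download
	}
	p, err := c.download(id)
	if err != nil {
		return "", err
	}
	c.paths[id] = p
	return p, nil
}

func main() {
	downloads := 0
	c := &videoCache{
		paths: map[string]string{},
		download: func(id string) (string, error) { // stand-in downloader
			downloads++
			return "/tmp/" + id + ".mp4", nil
		},
	}
	for i := 0; i < 3; i++ { // three requests for the same stream...
		c.get("1234")
	}
	fmt.Println("downloads:", downloads) // ...but only one download
}
```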
An alternative, and maybe better, solution would be to use YT-DLP: https://github.com/yt-dlp/yt-dlp. YT-DLP is capable of downloading only a small segment of a video via its --download-sections option; for example, it can fetch just a 4-second clip.
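A sketch of such an invocation (--download-sections and --force-keyframes-at-cuts are real yt-dlp options; the URL and output name are illustrative):

```shell
# Download only 03:11:36-03:11:40 (4 s) of the stream, re-encoding so the
# clip starts exactly on a keyframe.
yt-dlp --download-sections "*03:11:36-03:11:40" \
       --force-keyframes-at-cuts \
       -o "clip.%(ext)s" \
       "https://www.youtube.com/watch?v=VIDEO_ID"
```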
YT-DLP downloads a slightly longer segment than requested, as it needs to start at an I-frame/keyframe. The --force-keyframes-at-cuts option re-encodes the video using ffmpeg so that it begins with a keyframe, chopping the clip back down to the requested length.
YT-DLP is written in Python and depends on binaries such as ffmpeg. This means App Engine is not viable, as the service would no longer be pure Go; however, Cloud Run with a Docker image may be an alternative.
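A minimal sketch of such an image (the module layout, binary name, and base images are assumptions, not the project's actual build):

```dockerfile
# Build the Go service.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /openfish ./cmd/openfish

# Runtime image bundling yt-dlp (Python) and ffmpeg alongside the binary.
FROM python:3.12-slim
RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg \
 && rm -rf /var/lib/apt/lists/* \
 && pip install --no-cache-dir yt-dlp
COPY --from=build /openfish /openfish
# Cloud Run supplies the listen port via $PORT.
CMD ["/openfish"]
```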
A proof-of-concept branch has been created: media-api. The API is noticeably slow, because it downloads the whole video fragment, re-encodes it, then sends it to the client.