I can input video in the Hugging Face demo, but I can't find any video processing in the code. Are you only sampling 4 frames from the video in the front end and feeding them into the model as images? This is very important to me, please let me know, thanks!
Yes, currently we treat a video as a sequence of consecutive pictures and use them as input. We will continue training on video-related datasets in the future. Stay tuned.
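For anyone who wants to reproduce this locally, here is a minimal sketch of the "video as consecutive pictures" idea. It assumes uniform sampling of a fixed number of frames (4, as mentioned in the question). The thread does not confirm how the demo actually picks frames, and `sample_frame_indices` is a hypothetical helper, not part of the repo. The selected frames would then be passed to the model as ordinary images.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_samples: int = 4) -> List[int]:
    """Return `num_samples` evenly spaced frame indices for a clip of `total_frames` frames.

    Hypothetical helper: uniform sampling is an assumption, not the confirmed
    behaviour of the Hugging Face demo.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    # Split the clip into num_samples equal segments and take the middle frame of each.
    segment = total_frames / num_samples
    return [min(total_frames - 1, int(segment * i + segment / 2)) for i in range(num_samples)]


if __name__ == "__main__":
    # A 100-frame clip sampled down to 4 frames.
    print(sample_frame_indices(100))  # → [12, 37, 62, 87]
```

Taking the middle frame of each segment avoids always grabbing the very first and last frames, which are often transitions. For clips shorter than `num_samples`, some indices repeat.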