google-research / scenic

Scenic: A Jax Library for Computer Vision Research and Beyond

Request for access to Vid2Seq inference code for educational video captioning #857

Closed ChukwumaChukwuma closed 1 year ago

ChukwumaChukwuma commented 1 year ago

Hi @xingyizhou, @a-nagrani, and @antoyang,

I'm writing because I'm interested in using the Vid2Seq model for dense captioning and video captioning on a few educational videos in MP4 format. I saw you mention in issue #817 that the inference code for the Vid2Seq model is still being prepared, but I'm wondering if there's any way to get access to it sooner. I'm happy to help with testing or debugging if needed.

Here are some details about the videos I'd like to caption:

- They are all short educational videos (less than 10 minutes each)
- They are in MP4 format
- They contain a variety of objects and scenes

I'm hoping to use the Vid2Seq model to generate captions that describe the main objects and events in each video. I think this would be a valuable resource for students and learners of all ages.

I understand that you're busy, but I would really appreciate it if you could take a look at my request.

Thank you for your time and consideration.

Best, Chukwuma

antoyang commented 1 year ago

Hi, you may have a look at the PyTorch Vid2Seq implementation (with a few differences explained in the README) included here: https://github.com/antoyang/VidChapters. It also includes an example inference script.
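
In case it helps while the Scenic inference code is being prepared, below is a minimal, generic sketch of the frame-sampling step that video-captioning pipelines typically run before feature extraction. This is not code from Scenic or VidChapters; the function name, the `lecture.mp4` path, and the 1 FPS rate are illustrative assumptions. It only shows one way to turn an MP4 into an array of RGB frames with OpenCV.

```python
# Generic sketch: sample frames from an MP4 at roughly 1 frame per second,
# the kind of preprocessing a Vid2Seq-style pipeline does before extracting
# visual features. Not taken from the Scenic or VidChapters repositories.
import cv2
import numpy as np

def sample_frames(video_path: str, target_fps: float = 1.0) -> np.ndarray:
    """Return an array of shape (T, H, W, 3) with ~target_fps frames/sec."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"Could not open {video_path}")
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS unknown
    step = max(1, round(native_fps / target_fps))   # keep every `step`-th frame

    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            # OpenCV decodes to BGR; most vision models expect RGB.
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        idx += 1
    cap.release()
    return np.stack(frames) if frames else np.empty((0, 0, 0, 3), dtype=np.uint8)

frames = sample_frames("lecture.mp4")  # e.g. a short educational video
print(frames.shape)
```

From there, the sampled frames (or features extracted from them) would be fed to whichever Vid2Seq checkpoint and decoding script you end up using; see the demo script in the VidChapters repo for the actual entry point.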

evantkchong commented 1 year ago

Thanks @antoyang for your PyTorch implementation! I am not as familiar with JAX, so this is immensely helpful!

ChukwumaChukwuma commented 1 year ago

Thank you very much. I'll check it out right away.