Ravi-Teja-konda / Surveillance_Video_Summarizer

VLM-driven tool that processes surveillance videos, extracts frames, and generates insightful annotations using a fine-tuned Florence-2 Vision-Language Model. Includes a Gradio-based interface for querying and analyzing video footage.
93 stars, 9 forks

Can you please provide the fine-tuning reference code? #1

Closed: bhargavvmukti closed this issue 2 months ago

bhargavvmukti commented 2 months ago

Hi @Ravi-Teja-konda, this is awesome work! I am currently interested in fine-tuning Florence-2 for my use case, which aligns exactly with the work you have done. Can you please add the training and validation code to the repository?

Ravi-Teja-konda commented 2 months ago

Hello @bhargavvmukti,

Thank you for your interest!

I fine-tuned Florence-2 around the time of its release. However, due to other commitments, I won’t be able to release the training code at the moment.

In the meantime, feel free to check out this open-source Colab for reference:

How to Fine-Tune Florence-2 on a Detection Dataset

bhargavvmukti commented 2 months ago

Thanks for the input, @Ravi-Teja-konda.

If you can't share the code, I would highly appreciate it if you could share the path forward.

The Colab is for object detection, whereas my use case is video activity recognition (walking, running, sleeping, and so on) on surveillance data.

Ravi-Teja-konda commented 2 months ago

Sure, we can discuss if needed.