Ravi-Teja-konda / Surveillance_Video_Summarizer

VLM driven tool that processes surveillance videos, extracts frames, and generates insightful annotations using a fine-tuned Florence-2 Vision-Language Model. Includes a Gradio-based interface for querying and analyzing video footage.

🎥 Surveillance Video Summarizer: AI-Powered Video Analysis and Summarization

Checked on 13.09.2024 ✅ (This project was developed and tested on the Lightning AI platform, running on an L40 GPU)

Surveillance Video Summarizer is an AI-driven system that processes surveillance videos, extracts key frames, and generates detailed annotations. Powered by a Florence-2 Vision-Language Model (VLM) fine-tuned on the SPHAR dataset, it highlights notable events, actions, and objects within video footage and logs them for easy review and further analysis.

The fine-tuned model can be found at: kndrvitja/florence-SPHAR-finetune-2.

See the tool in action below!

🎥 Demo Video


Features


📣 How it Works

  1. Frame Extraction:
    Frames are extracted at regular intervals from surveillance video files using OpenCV.

  2. AI-Powered Annotation:
    Each frame is analyzed by the fine-tuned Florence-2 Vision-Language Model, generating insightful annotations about the scene.

  3. Data Storage:
    Annotations and their associated frame data are stored in a SQLite database, ready for future analysis.

  4. Gradio Interface:
    Users query the surveillance logs by providing a time range and a tailored prompt. The system retrieves, summarizes, and analyzes the relevant video footage and returns concise insights.
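
Step 1 above (extracting frames at regular intervals with OpenCV) could look roughly like the sketch below. It is an illustration only, not the project's actual code: the function names `sample_frame_indices` and `extract_frames` and the one-second default interval are assumptions made for this example.

```python
from typing import Any, Iterator, List, Tuple


def sample_frame_indices(total_frames: int, fps: float, interval_seconds: float) -> List[int]:
    """Return indices of frames spaced roughly `interval_seconds` apart."""
    if total_frames <= 0 or fps <= 0 or interval_seconds <= 0:
        return []
    # Number of frames to skip between samples; never less than one frame.
    step = max(1, round(fps * interval_seconds))
    return list(range(0, total_frames, step))


def extract_frames(video_path: str, interval_seconds: float = 1.0) -> Iterator[Tuple[float, Any]]:
    """Yield (timestamp_in_seconds, frame) pairs from a video file.

    Hypothetical helper: the interval and the return shape are assumptions.
    """
    import cv2  # imported lazily so sample_frame_indices works without OpenCV

    cap = cv2.VideoCapture(video_path)
    try:
        fps = cap.get(cv2.CAP_PROP_FPS)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        for index in sample_frame_indices(total, fps, interval_seconds):
            # Seek to the sampled frame, then decode it.
            cap.set(cv2.CAP_PROP_POS_FRAMES, index)
            ok, frame = cap.read()
            if not ok:
                break
            yield index / fps, frame
    finally:
        cap.release()
```

Each yielded frame would then be passed to the Florence-2 model for annotation (step 2) before the result is written to the database (step 3).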


Installation

  1. Clone the repository:
    git clone https://github.com/Ravi-Teja-konda/Surveillance_Video_Summarizer.git
  2. Navigate to the project directory:
    cd Surveillance_Video_Summarizer
  3. Install the required Python libraries:
    pip install -r requirements.txt

Configuration

Model and Processor

Database Path

Usage

First, run the frame extraction:

python surveillance_video_summarizer.py

Next, interact with the Gradio interface for log analysis:

python surveillance_log_analyzer_with_gradio.py

From here, you can use the Gradio interface to query specific periods of video footage and retrieve annotated summaries. You can ask about specific actions, notable events, or general activity. Provide the time range and your query prompt, and the system will return the relevant logs.
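
To make the time-range lookup concrete, here is a minimal sketch of how annotations might be stored in SQLite and fetched for a given period. The table name `frame_logs`, its columns, and the helper functions are hypothetical and chosen for this example; the project's real schema may differ. Timestamps are assumed to be ISO 8601 strings, which sort correctly as plain text.

```python
import sqlite3
from typing import List, Tuple


def create_log_table(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) table that holds one row per annotated frame."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS frame_logs ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "captured_at TEXT NOT NULL, "  # ISO 8601 timestamp, e.g. 2024-09-13T10:00:00
        "annotation TEXT NOT NULL)"
    )


def add_annotation(conn: sqlite3.Connection, captured_at: str, annotation: str) -> None:
    """Store the annotation generated for a single frame."""
    conn.execute(
        "INSERT INTO frame_logs (captured_at, annotation) VALUES (?, ?)",
        (captured_at, annotation),
    )


def annotations_between(conn: sqlite3.Connection, start: str, end: str) -> List[Tuple[str, str]]:
    """Return (captured_at, annotation) rows in the inclusive range, oldest first."""
    cursor = conn.execute(
        "SELECT captured_at, annotation FROM frame_logs "
        "WHERE captured_at BETWEEN ? AND ? ORDER BY captured_at",
        (start, end),
    )
    return cursor.fetchall()
```

The rows returned by `annotations_between` could then be joined into a single text block and combined with the user's prompt to produce the summary.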

🚀 Future Enhancements

Advanced Event Detection

We plan to enhance the model’s ability to detect more complex events, such as traffic violations, suspicious behavior, and other nuanced surveillance scenarios, by training Florence-2 on more data.

Real-Time Streaming

In the future, we plan to support real-time video streams for immediate frame extraction and analysis as the video is being captured.


Contributing

Contributions are welcome! Feel free to submit a pull request.


❤️ Support the Project

If you find this project useful, consider starring it on GitHub to help others discover it!


📚 References

Inspired by advances in Vision-Language models like Florence-2.

License

This project is licensed under the Apache License 2.0.