# LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos
Abstract: We introduce LifelongMemory, a new framework for accessing long-form egocentric videographic memory through natural language question answering and retrieval. LifelongMemory generates concise descriptions of the camera wearer's activities and leverages the reasoning and contextual understanding capabilities of pretrained large language models to produce precise answers. A confidence and refinement module further improves the quality of these answers. Our approach achieves state-of-the-art performance on the EgoSchema benchmark for question answering and is highly competitive on the natural language query (NLQ) challenge of Ego4D.
![LifelongMemory pipeline](https://github.com/Agentic-Learning-AI-Lab/lifelong-memory/blob/main/pipeline.png)
### Quick start
Captions (LaViLa captions for every 2-second video clip, plus the caption digest): [Google Drive link](https://drive.google.com/file/d/1uNIcw0r3UnPoHQ4fJEqRHB2gUhQT4HWj/view?usp=sharing)
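If you prefer the command line, the same file can be fetched with [gdown](https://github.com/wkentaro/gdown) (assuming you have it installed; the file ID is taken from the link above):
```
pip install gdown
# Download the caption digest by its Google Drive file ID
gdown "https://drive.google.com/uc?id=1uNIcw0r3UnPoHQ4fJEqRHB2gUhQT4HWj"
```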
#### Ego4D NLQ
```
python scripts/llm_reason.py \
--task NLQ \
--annotation_path <path/to/annotations> \
--caption_path <path/to/captions> \
--output_path <path/to/output> \
--openai_model <model_name> \
--openai_key <your_api_key>
```
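For example, a filled-in call might look like the following (the paths are hypothetical placeholders and `gpt-4` is just one possible model; substitute your own files and API key):
```
# Hypothetical paths and key; substitute your own.
python scripts/llm_reason.py \
--task NLQ \
--annotation_path data/nlq_val.json \
--caption_path data/captions.json \
--output_path outputs/nlq_predictions.json \
--openai_model gpt-4 \
--openai_key $OPENAI_API_KEY
```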
If you are using Azure OpenAI:
```
python scripts/llm_reason.py \
--task NLQ \
--annotation_path <path/to/annotations> \
--caption_path <path/to/captions> \
--output_path <path/to/output> \
--azure \
--openai_endpoint <your_azure_endpoint> \
--openai_model <model_name> \
--openai_key <your_azure_api_key>
```
If you are using Vicuna, first serve it behind an OpenAI-compatible endpoint; see the FastChat documentation on [OpenAI-Compatible RESTful APIs](https://github.com/lm-sys/FastChat?tab=readme-ov-file#openai-compatible-restful-apis--sdk).
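As a rough sketch (adapted from the FastChat README; the model path and port are assumptions, so defer to the linked docs), the server stack runs as three separate processes:
```
# FastChat controller
python3 -m fastchat.serve.controller
# Model worker hosting Vicuna (example model path)
python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5
# OpenAI-compatible REST API server on port 8000
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```
Then point `llm_reason.py` at the local endpoint: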
```
python scripts/llm_reason.py \
--task NLQ \
--annotation_path <path/to/annotations> \
--caption_path <path/to/captions> \
--output_path <path/to/output> \
--openai_endpoint http://localhost:8000/v1 \
--openai_model vicuna-7b-v1.5
```
#### EgoSchema (Video QA)
```
python scripts/llm_reason.py \
--task QA \
--annotation_path <path/to/annotations> \
--caption_path <path/to/captions> \
--output_path <path/to/output> \
--openai_model <model_name> \
--openai_key <your_api_key>
```
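As with NLQ, a concrete run only swaps the task flag and annotation file (hypothetical paths again; the Azure and Vicuna endpoint options shown above work here as well):
```
# Hypothetical paths; substitute your own.
python scripts/llm_reason.py \
--task QA \
--annotation_path data/egoschema_questions.json \
--caption_path data/captions.json \
--output_path outputs/qa_predictions.json \
--openai_model gpt-4 \
--openai_key $OPENAI_API_KEY
```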