hongluzhou closed this issue 2 months ago
Thanks for your attention!
P.S. The checkpoint in the repository is expected to slightly surpass the performance reported in the paper due to code refactoring (the AUC on LaSOT is around 58, compared to the reported 56.1).
Feel free to reach out to me if you encounter any problems reproducing the results.
Best regards, Han
Thank you so much for open sourcing this! It's a great effort and truly outstanding work!
I attempted to run the code and have a few questions:
Should "lmsys/vicuna-7b-v1.5" be used in the "tokenizer" fields of the configuration file? Specifically, I'm referring to line 43 and line 74 in the sample_config.yaml.
There seems to be a bug in the data loader code. Using the current implementation of "video_llm_data.py", a `ValueError` in line 238 will be raised. If I'm understanding correctly, this error occurs because the lengths of `data_dict['frames']` and `data_dict['box']` differ by the end of the `sample_frames(...)` function in line 351. To resolve this issue, I inserted the following lines after line 359:
This modification ensures that only the boxes corresponding to the sampled frames are retained. Does this bug fix appear correct to you?
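To make the idea concrete, here is a minimal, self-contained sketch of this kind of filtering. It is not the exact code from "video_llm_data.py": the helper name `keep_sampled_boxes` and the `sampled_indices` argument are hypothetical, and it assumes `data_dict['frames']` already holds only the sampled frames while `data_dict['box']` still holds one box per original frame.

```python
def keep_sampled_boxes(data_dict, sampled_indices):
    """Return a copy of data_dict whose 'box' list is aligned with the sampled frames.

    Hypothetical helper for illustration only. Assumes:
      - data_dict['frames'] already contains only the sampled frames
      - data_dict['box'] still contains one box per ORIGINAL frame
      - sampled_indices lists the original frame indices that were kept
    """
    boxes = data_dict["box"]
    aligned = dict(data_dict)  # shallow copy so the input is left untouched
    aligned["box"] = [boxes[i] for i in sampled_indices]

    # Fail early with a clear message if the two lists still disagree.
    if len(aligned["box"]) != len(aligned["frames"]):
        raise ValueError(
            f"frames/box length mismatch: "
            f"{len(aligned['frames'])} vs {len(aligned['box'])}"
        )
    return aligned


# Toy example: 6 original frames, 3 of which were sampled.
example = {
    "frames": ["f0", "f2", "f4"],
    "box": [[i, i, i + 1, i + 1] for i in range(6)],
}
fixed = keep_sampled_boxes(example, sampled_indices=[0, 2, 4])
print(len(fixed["frames"]), len(fixed["box"]))
```

The explicit length check is only there to surface a mismatch close to where it originates, rather than later in the loader.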
I've reformatted LaSOT's test set annotations (the version "Sequences of Testing set only" from http://vision.cs.stonybrook.edu/~lasot/download.html) into JSON format following the instructions in the Elysium GitHub readme. A sample entry in the JSON file looks like this:
However, I've noticed that the inference speed is extremely slow—seemingly taking days to complete on a single H100 GPU with the default configurations. Is this inference time expected for LaSOT?
To provide more context, the reformatted JSON file contains 280 single-object trajectories, with some trajectories spanning thousands of frames. The following figure presents a histogram of trajectory lengths in LaSOT, measured by the number of frames per trajectory. According to the evaluation code, the total number of evaluation samples is 98,036. Does anything about this setup seem unusual or concerning to you?
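For a rough sense of scale, the back-of-the-envelope estimate below converts the 98,036 evaluation samples into wall-clock time. The per-sample latencies are placeholder assumptions, not measurements from my run, and the helper `estimated_days` is purely illustrative (it also assumes perfect scaling across GPUs).

```python
SECONDS_PER_DAY = 86_400


def estimated_days(num_samples, seconds_per_sample, num_gpus=1):
    """Estimate total inference time in days, assuming perfect scaling across GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    total_seconds = num_samples * seconds_per_sample / num_gpus
    return total_seconds / SECONDS_PER_DAY


NUM_SAMPLES = 98_036  # total evaluation samples reported by the evaluation code

# Hypothetical per-sample latencies in seconds; replace with a measured value.
for latency in (0.5, 1.0, 2.0, 5.0):
    days = estimated_days(NUM_SAMPLES, latency)
    print(f"{latency:>4.1f} s/sample -> {days:.2f} days on one GPU")
```

Even at a couple of seconds per sample, a single-GPU run of this size would take more than a day, so the sample count alone may account for much of the slowdown.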
Looking forward to your response!