Hi @benemana. You first need to use the index_video function of the similarity model in order to estimate the similarity between the two videos correctly.
Modifying your code, this would be as follows:
import torch
from utils import load_video
import evaluation as eval # This is your evaluation.py module
# Load the two videos from the video files
query_video = torch.from_numpy(load_video('./data/examples/video1/'))
target_video = torch.from_numpy(load_video('./data/examples/video2/'))
# Initialize pretrained ViSiL model
feat_extractor = torch.hub.load('gkordo/s2vs:main', 'resnet50_LiMAC').to('cuda')
model = torch.hub.load('gkordo/s2vs:main', 's2vs_dns').to('cuda')
model.eval()
# Extract features of the two videos
query_features = eval.extract_features(feat_extractor, query_video.to('cuda'))
target_features = eval.extract_features(feat_extractor, target_video.to('cuda'))
# Index the two videos with model
query_indexed_features = model.index_video(query_features)
target_indexed_features = model.index_video(target_features)
# Calculate similarity between the two videos
similarity = model.calculate_video_similarity(query_indexed_features, target_indexed_features)
print(similarity)
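One practical note: since indexing is a separate step, you can index each video once and reuse the indexed features for many comparisons. A minimal sketch of this, where database_videos is a hypothetical dict mapping names to loaded video tensors (the loop and variable names are just illustrative):
# Index every database video once, then compare the query against all of them
database_index = {
    name: model.index_video(eval.extract_features(feat_extractor, vid.to('cuda')))
    for name, vid in database_videos.items()
}
for name, db_index in database_index.items():
    sim = model.calculate_video_similarity(query_indexed_features, db_index)
    print(name, sim)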
Please let me know if that works.
EDIT: After further reading, I updated the code in the following way:
[...]
# Extract features of the two videos
query_features = eval.extract_features(feat_extractor.to('cuda'), query_video.to('cuda'))
target_features = eval.extract_features(feat_extractor.to('cuda'), target_video.to('cuda'))
# Index the two videos with the model
query_index = model.index_video(query_features)
target_index = model.index_video(target_features)
# Calculate similarity between the two videos
similarity = model.calculate_video_similarity(query_index, target_index)
Now the results appear to be much more accurate, and I noticed that completely different videos get negative similarity scores.
Thank you so much; this is pretty similar to the new version of the code I implemented yesterday evening, as reported in the EDIT message above.
Just one question: in your experience, which similarity threshold would be reasonable for the task of video copy detection?
I ran some experiments with a target video A and some query videos X, Y, and Z. From what I'm seeing, scores that are slightly negative could still be an indicator of a potential video copy.
Thank you again for your support.
Unfortunately, this is not an easy question to answer and needs more digging. Some of the usual factors to take into account for this decision are the queries you anticipate, the database videos you have, the underlying application, and the precision level at which you want the system to operate. In my experience, a value around 0 is a rather safe threshold, but the above factors can shift this value significantly.
The safest practice is to calibrate this threshold on a representative annotated dataset, so as to select the value that yields the precision at which you want your system to operate.
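For illustration, here is a minimal sketch of such a calibration using scikit-learn (not a dependency of this repo); the similarity scores and labels below are dummy placeholders for your own annotated pairs:
import numpy as np
from sklearn.metrics import precision_recall_curve

# Dummy values; replace with scores from model.calculate_video_similarity
# and binary annotations (1 = copy, 0 = not a copy) for your own pairs
similarities = np.array([0.85, 0.70, 0.60, 0.10, -0.05, -0.40])
labels = np.array([1, 1, 1, 1, 0, 0])

precision, recall, thresholds = precision_recall_curve(labels, similarities)
# Pick the smallest threshold whose precision reaches the target level
target_precision = 0.9
valid = precision[:-1] >= target_precision
threshold = thresholds[np.argmax(valid)] if valid.any() else thresholds[-1]
print(f'Operating threshold: {threshold:.3f}')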
I am closing this issue as it has been resolved. Feel free to reopen it if there is any unaddressed issue, or ask anything related to it.
Hi, I tested your pretrained model using the two videos inside data/examples.
Starting from the suggestions you provided, I wrote the following code:
The results I got are:
Since video1 and video2 are completely different, I would have expected a lower value for the similarity score. I'm mainly interested in the copy detection task, and I wonder whether 0.79 can actually be considered a "low value", such that I can argue that the two videos are not potential copies.
Maybe I'm missing something or my code is wrong.
Any help would be really appreciated.
Thank you again for this work.