cvg / LightGlue

LightGlue: Local Feature Matching at Light Speed (ICCV 2023)
Apache License 2.0

Bad matches running on GPU (related to non_blocking parameter) #99

Open swengeler opened 10 months ago

swengeler commented 10 months ago

Hi, first off thanks for your work and for releasing it in such a "nicely packaged" format!

While I managed to resolve my issue (described below), I figured it would be useful to document it in case others encounter it as well. In addition, any further insight into why this might be happening (perhaps it is specific to my machine configuration) would be appreciated.

Encountered issue

I was getting strange/incorrect outputs running LightGlue on GPU. Using the two images below and the match_pair function gives the following output:

[image: cup_bad_matches]

When running on CPU instead, I get the following output:

[image: cup_good_matches]

The code used for this minimal example is the following:

import matplotlib.pyplot as plt
import torch

from lightglue import LightGlue, SuperPoint, viz2d
from lightglue.utils import load_image, match_pair

torch.set_grad_enabled(False)

# load models
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # or just "cpu" for the second example
extractor = SuperPoint(max_num_keypoints=2048).eval().to(device)
matcher = LightGlue(features="superpoint").eval().to(device)

print(f"Using device: {device}")

# load images
image0 = load_image("cup_image_0.jpg")
image1 = load_image("cup_image_1.jpg")

# extract features + correspondences
feats0, feats1, matches01 = match_pair(
    extractor, matcher, image0.to(device), image1.to(device), non_blocking=True
)
kpts0, kpts1, matches = feats0["keypoints"], feats1["keypoints"], matches01["matches"]
m_kpts0, m_kpts1 = kpts0[matches[..., 0]], kpts1[matches[..., 1]]

# visualize results
viz2d.plot_images([image0, image1])
viz2d.plot_matches(m_kpts0, m_kpts1, color="lime", lw=0.2)
viz2d.add_text(0, f'Stop after {matches01["stop"]} layers')
plt.show()

Possible solutions

I eventually figured out that this was caused by the batch_to_device function called by match_pair, or more specifically the non_blocking=True parameter. The three solutions I found are:

  1. Not using match_pair (as is e.g. done in the demo notebook), and moving the outputs I wanted to use to CPU "manually"
  2. Setting non_blocking=False (the default)
  3. Adding the following two lines after the call to match_pair (in the minimal example code above):
    stream = torch.cuda.current_stream()
    stream.synchronize()
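A likely explanation, offered as an assumption rather than a confirmed diagnosis: with non_blocking=True, a GPU-to-CPU copy in PyTorch can return before the data has actually arrived in host memory, so reading the CPU tensors right away may pick up incomplete values. The sketch below shows the pattern behind solution 3 in isolation. The helper name batch_to_cpu_synced and the toy feats dict are hypothetical and not part of LightGlue; the code falls back to CPU tensors when no GPU is present.

```python
import torch


def batch_to_cpu_synced(batch, non_blocking=True):
    """Copy every tensor in a dict to the CPU, then wait for the copies to finish.

    Hypothetical helper for illustration only; it is not part of LightGlue.
    """
    out = {
        key: value.to("cpu", non_blocking=non_blocking)
        if isinstance(value, torch.Tensor)
        else value
        for key, value in batch.items()
    }
    # A non-blocking GPU-to-CPU copy may return before the data is in host
    # memory, so block until the current CUDA stream has finished its work.
    if torch.cuda.is_available():
        torch.cuda.current_stream().synchronize()
    return out


# Toy stand-in for the feature dicts returned by the models (assumed layout).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
feats = {
    "keypoints": torch.arange(8, dtype=torch.float32, device=device).reshape(4, 2),
    "stop": 3,
}
feats_cpu = batch_to_cpu_synced(feats)
print(feats_cpu["keypoints"].device, feats_cpu["stop"])
```

Synchronizing once after all copies have been queued keeps the transfers asynchronous with respect to each other while still guaranteeing that the CPU tensors are complete before they are indexed or plotted.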

Input data

[images: cup_image_0.jpg, cup_image_1.jpg]

Environment info

Phil26AT commented 10 months ago

Hi @swengeler, thank you for reporting your issue and the solution to it! I could not reproduce it on my end, but I will keep this issue open in case someone else faces this problem.