Open layel2 opened 10 months ago
Thanks for reporting the issue. We are trying to reproduce the problem on our end. Will get back to you shortly.
Hi @layel2 ,
Since there's no reproduction code, I tried modifying the LightGlue benchmark.py example as shown below, and I was able to compile with the release 2.16 compiler and PyTorch 2.1:
@@ -192,6 +193,11 @@ if __name__ == "__main__":
         extractor.conf.max_num_keypoints = num_kpts
         feats0 = extractor.extract(image0)
         feats1 = extractor.extract(image1)
+        import torch_neuronx
+        matcher.pruning_keypoint_thresholds['xla'] = -1
+        #new_matcher = torch.jit.trace(matcher, {"image0": feats0, "image1": feats1})
+        new_matcher = torch_neuronx.trace(matcher, {"image0": feats0, "image1": feats1}, compiler_workdir="./workdir")
         runtime = measure(
             matcher,
             {"image0": feats0, "image1": feats1},
However, after a successful compilation where I see "Compiler status PASS", I then see the error RuntimeError: Tracer cannot infer type of ... Dictionary inputs to traced functions must have consistent type. Found Tensor and int for the returned results. This is a limitation of TorchScript, which you can also see when you use torch.jit.trace instead of torch_neuronx.trace.
The torch_neuronx.trace API uses torch.jit.trace under the hood, so in order for the model to work with torch_neuronx.trace it must first be compatible with torch.jit.trace. The error indicates that the model creates an output dictionary with mixed value types (Tensors alongside plain Python ints), which torch.jit.trace does not support even when strict=False, so the trace fails. To get this model running, first see if you can get it working with torch.jit.trace. One thing you can try is to avoid mixed-type dictionary outputs by using the same data type across all the outputs, or to create a module wrapper that only returns tensors of one type.
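As an illustration of the wrapper suggestion, a thin module along these lines (the class name is hypothetical, and the key list is taken from LightGlue's output dict as it appears later in this thread, so adjust it as needed) returns a fixed tuple of tensors instead of a dict, which gives torch.jit.trace a single, consistent output type:

import torch
import torch.nn as nn

class TensorOnlyMatcher(nn.Module):
    """Hypothetical wrapper: converts LightGlue's dict output into a tuple of tensors."""

    def __init__(self, matcher: nn.Module):
        super().__init__()
        self.matcher = matcher

    def forward(self, data: dict):
        pred = self.matcher(data)
        # "stop" is a plain Python int in the original output; convert it so that
        # every returned value is a Tensor.
        return (
            pred["matches0"],
            pred["matches1"],
            pred["matching_scores0"],
            pred["matching_scores1"],
            torch.as_tensor(pred["stop"]),
        )

# wrapped = TensorOnlyMatcher(matcher)
# traced = torch_neuronx.trace(wrapped, {"image0": feats0, "image1": feats1})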
Hi @jeffhataws ,
Sorry, I forgot to provide reproduction code. I had actually already fixed the error you got: before compiling, I cast everything in the output into torch tensors.
Here's my repo, which contains the fixed version of lightglue.py and the compile code that I use: https://github.com/layel2/LightGlue-inf2 (compile code: https://github.com/layel2/LightGlue-inf2/blob/main/inf/lightglue.ipynb).
LightGlue/lightglue.py
     scores[:, -1, :-1] = F.logsigmoid(-z1.squeeze(-1))
     return scores
-
+from types import SimpleNamespace
 class MatchAssignment(nn.Module):
     def __init__(self, dim: int) -> None:
         super().__init__()
@@ -332,6 +332,7 @@ class LightGlue(nn.Module):
         "mps": -1,
         "cuda": 1024,
         "flash": 1536,
+        "xla": -1,
     }
     required_data_keys = ["image0", "image1"]
@@ -579,14 +580,14 @@ class LightGlue(nn.Module):
             "matches1": m1,
             "matching_scores0": mscores0,
             "matching_scores1": mscores1,
-            "stop": i + 1,
-            "matches": matches,
-            "scores": mscores,
+            "stop": torch.tensor(i + 1).to(device),
+            "matches": torch.stack(matches).to(device),
+            "scores": torch.stack(mscores).to(device),
             "prune0": prune0,
             "prune1": prune1,
         }
-        return pred
+        return list(pred.values())
     def confidence_threshold(self, layer_index: int) -> float:
         """scaled confidence threshold"""
Thanks
Hi, I tried to trace the LightGlue model on an inf2 instance, but it hit an error and crashed. Trace command:
model_neuron = torch_neuronx.trace(model, input_features, compiler_args=["--target","inf2"])
It then produced this output and crashed. Model analysis result: lightglue-neuronx-model-analyze.txt
neuronx-cc -V
PyTorch version
Thank you for helping