Very cool, thank you! We are very interested in a version that is easy to deploy and integrate in a C++-only environment but also fast (= with the adaptive mechanisms). To get there, I have a few questions:
- Do the ONNX models include the adaptive mechanisms (early stopping, point pruning), and were they part of the benchmarks?
- Have you tried benchmarking a jitted model without ONNX export?
- Could you run `torch.compile` on the model as an additional data point?

Thanks a lot for taking the time!
The adaptive mechanisms rely on dynamic control flow (`if`s, `break`s, etc.), but the benchmarks were all on non-adaptive versions (i.e., no early stopping or adaptive pruning). I would venture a guess that the ONNX versions are slower at higher keypoint numbers due to differing operator implementations.

Script export (the `--safe` flag option) was really an experiment to test whether ONNX script export works. Normally the default path is to let `torch.onnx.export` call `jit.trace` for ONNX trace export. Under script export, however, conditionals are exported to `If` operator nodes that lead to different subgraphs (creating a new subgraph for every possible branch), resulting in quite a messy ONNX graph, to say the least (a lot of emitted warnings during runtime) - see the toy example below. I've yet to test whether an early exit (`break`/`return`) is scriptable, though. When you say you would like to benchmark a jitted model without ONNX export, do you mean to benchmark TorchScript?
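To make the difference concrete, here's a toy module (not the actual LightGlue export code) run through both paths:

```python
import torch

class Toy(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Data-dependent control flow, standing in for the adaptive checks.
        if x.sum() > 0:
            return x * 2
        return x - 1

model = Toy().eval()
dummy = torch.randn(1, 4)

# Default trace export: jit.trace records only the branch taken for `dummy`
# (with a TracerWarning); the conditional disappears from the graph.
torch.onnx.export(model, (dummy,), "toy_trace.onnx")

# Script export: the conditional is compiled and survives as an ONNX `If`
# node with one subgraph per branch.
torch.onnx.export(torch.jit.script(model), (dummy,), "toy_script.onnx")
```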
I haven't tried `torch.compile` yet, but my understanding is that it's mostly relevant for speeding up PyTorch and shouldn't have any effect during export? TorchScript export comes with its own caveats (no `dict` inputs if I recall correctly, etc.). Jit script is also particularly nit-picky that type hints match the actual runtime types in every called function, since it inspects the source code directly.

Anyway, I'll give scripting a try to see if the adaptive mechanisms can be exported at all, but for the moment it looks like one must go through ONNX Runtime's `TensorrtExecutionProvider` - snippets for both points below. Thanks, and I hope you find these answers helpful!
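As a minimal example of that nit-pickiness (hypothetical functions, not LightGlue code):

```python
import torch

def scale(x: torch.Tensor, factor: int) -> torch.Tensor:
    return x * factor

class Wrapper(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fine in eager mode, but jit.script type-checks every called
        # function against its annotations: 2.0 is a float, not an int.
        return scale(x, 2.0)

torch.jit.script(Wrapper())  # raises: expected 'int', found 'float'
```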
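And for reference, a sketch of selecting the TensorRT backend through ONNX Runtime's Python API (the C++ API mirrors it; the model path is a placeholder):

```python
import onnxruntime as ort

session = ort.InferenceSession(
    "lightglue.onnx",  # placeholder model path
    providers=[
        "TensorrtExecutionProvider",  # tried first
        "CUDAExecutionProvider",      # fallback when TensorRT is unavailable
        "CPUExecutionProvider",
    ],
)
```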
Thank you very much for this very insightful reply!
Tracing should work fine as long as there is no dynamic `if` involved - only static conditions. This condition could be removed if it is detected as dynamic:

https://github.com/cvg/LightGlue/blob/fe7fb4fa0cffec65e33bf4c2f62a863d5b03433a/lightglue/lightglue.py#L406-L407
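As a toy illustration of the static case (hypothetical module, not the linked code): a plain Python attribute is resolved once at trace time, so no control flow ends up in the traced graph at all:

```python
import torch

class Block(torch.nn.Module):
    def __init__(self, prune: bool):
        super().__init__()
        self.prune = prune  # plain Python bool -> static condition

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Evaluated once while tracing; only the chosen branch is recorded.
        if self.prune:
            x = x[:, : x.shape[1] // 2]
        return x

traced = torch.jit.trace(Block(prune=True), torch.randn(1, 8))
print(traced.graph)  # straight-line graph, no prim::If
```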
My understanding is that ONNX Runtime can't fuse ops as well as `torch.compile` (which uses Triton for graph fusion), but it can still fuse simpler sub-graphs (elementwise ops). The recommended path going forward is `torch.export`, which exports a compiled graph to be executed in other environments; TorchScript is not maintained anymore and will be deprecated once `torch.export` becomes mature.

From official communications it's unclear whether torch.dynamo supports or will later support dynamic control flow. If not, we could instead export and compile a sub-graph for each layer and have the early-stopping logic in the parent scope. This adds a synchronization point between layers but would still benefit from optimizations within each layer - see the sketch below.
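A rough sketch of that layer-wise idea (the module and confidence computation are hypothetical stand-ins, not LightGlue's actual layers):

```python
import torch

class LayerBlock(torch.nn.Module):
    """Hypothetical stand-in for one attention + matching layer."""
    def forward(self, desc: torch.Tensor):
        desc = torch.tanh(desc)          # placeholder computation
        confidence = desc.abs().mean()   # placeholder stopping score
        return desc, confidence

# Compile each layer as its own sub-graph; torch.compile leaves the Python
# loop and branch below in eager mode.
layers = [torch.compile(LayerBlock()) for _ in range(9)]

def forward_with_early_stopping(desc: torch.Tensor, threshold: float = 0.95):
    for depth, layer in enumerate(layers):
        desc, confidence = layer(desc)
        # Early-stopping logic stays in the parent scope; .item() is the
        # per-layer synchronization point mentioned above.
        if confidence.item() > threshold:
            break
    return desc, depth
```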
Hi @Phil26AT @Skydes,
I've managed to make a working TensorRT-compatible version of LightGlue, and fortunately, OpenVINO support came out of the box!
This PR adds that info to the README in case anyone would like to deploy using the aforementioned formats :)