fabio-sim / LightGlue-ONNX

ONNX-compatible LightGlue: Local Feature Matching at Light Speed. Supports TensorRT, OpenVINO
Apache License 2.0

NOT_IMPLEMENTED : Non-zero status code returned while running MultiHeadAttention node #51

Closed dmoti closed 1 year ago

dmoti commented 1 year ago

I'm using the fused version of the models. It runs fine for several images, but then I get this error:

```
[ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Non-zero status code returned while running MultiHeadAttention node. Name:'MultiHeadAttention_1' Status Message: packed QKV format is not implemented for current GPU. Please disable it in fusion options.
  File "/data/eng/users/motid/work/findle-rel/findle-query/models/LightGlueONNX/onnx_runner/lightglue.py", line 53, in run
    matches0, mscores0 = self.lightglue.run(
  File "/data/eng/users/motid/work/findle-rel/findle-query/rerank/light_glue_reranker.py", line 125, in onnx_matcher_pair
    return self.runner.run(image0, image1, scales0, scales1)
  File "/data/eng/users/motid/work/findle-rel/findle-query/rerank/light_glue_reranker.py", line 153, in calculate_inliers_count
    kpts_query, kpts_pred, conf = self.onnx_matcher_pair(q_path, c_path)
  File "/data/eng/users/motid/work/findle-rel/findle-query/rerank/reranker.py", line 70, in rerank_by_matcher
    self.res_dict, time_mean = self.calculate_inliers_count(queries, database)
  File "/data/eng/users/motid/work/findle-rel/findle-query/algo_iface/FindleAlgo.py", line 31, in rerank
    return self.reranker.rerank_by_matcher([query], preds_array)
  File "/data/eng/users/motid/work/findle-rel/findle-query/findle_query/clients/vespa_client.py", line 113, in query
    res_dict, rerank_time_mean = findle_algo.rerank(image_filename, merged_paths)
  File "/data/eng/users/motid/work/findle-rel/findle-query/findle_query/services/findle_query_service.py", line 34, in query
    predictions = self.vespa_client.query(self.findle_algo, image_filename, self.__crop_query, dump_output_folder=self.__dump_output_folder)
  File "/data/eng/users/motid/work/findle-rel/findle-query/findle_query/api/routers/findle_query_router.py", line 24, in query
    result = findle_query_service.query(request.path)
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Non-zero status code returned while running MultiHeadAttention node. Name:'MultiHeadAttention_1' Status Message: packed QKV format is not implemented for current GPU. Please disable it in fusion options.
```

XL634663985 commented 10 months ago

Could you please tell me how to solve this?

fabio-sim commented 10 months ago

Hi @XL634663985, thank you for your interest in LightGlue-ONNX.

Regarding the packed QKV error for the MultiHeadAttention node, you can try the fused_cpu versions from https://github.com/fabio-sim/LightGlue-ONNX/releases/tag/v1.0.0. The only difference is that these use unpacked QKV, so despite the name they can also run on GPU.
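To illustrate, a minimal Python sketch of selecting the unpacked-QKV variant and loading it with ONNX Runtime. The model file names and the `pick_model` helper are hypothetical placeholders (use the actual file names from the v1.0.0 release); only `onnxruntime.InferenceSession` and the execution provider names are real API:

```python
# Hypothetical file names standing in for the v1.0.0 release assets.
FUSED_MODEL = "superpoint_lightglue_fused.onnx"          # packed QKV (GPU-dependent)
FUSED_CPU_MODEL = "superpoint_lightglue_fused_cpu.onnx"  # unpacked QKV

def pick_model(gpu_supports_packed_qkv: bool) -> str:
    """Return the model variant that is safe for the current hardware."""
    return FUSED_MODEL if gpu_supports_packed_qkv else FUSED_CPU_MODEL

def create_session(model_path: str):
    # The unpacked-QKV variant runs under both the CUDA and CPU
    # execution providers, so the same provider list works either way.
    import onnxruntime as ort
    return ort.InferenceSession(
        model_path,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
```

If your GPU raises the packed-QKV `NOT_IMPLEMENTED` error, call `create_session(pick_model(False))` to use the unpacked variant.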

biggiantpigeon commented 2 months ago

@fabio-sim I ran into the same error, but not all the time--only when the keypoint count is small (the largest count I've seen it fail with is 127). Is there a way to bypass this problem, such as setting a minimum required number of keypoints? But how would I know the exact threshold? I'm using C++ to run the model; is there an interface where I can set this? My model version is v1.0. Thanks a lot.
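The thread doesn't document a minimum keypoint count, but one possible workaround is to guard the matcher call and skip pairs with too few keypoints. A Python sketch (the `MIN_KEYPOINTS` threshold and `safe_match` helper are hypothetical, chosen from the 127-keypoint failure reported above, not values from the repo):

```python
# Hypothetical guard: skip ONNX matching when too few keypoints are
# detected, since the fused attention kernel was observed to fail on
# small inputs. MIN_KEYPOINTS is an empirical placeholder, not a
# documented limit of the model.
MIN_KEYPOINTS = 128

def safe_match(run_matcher, kpts0, kpts1):
    """Call run_matcher only when both images have enough keypoints."""
    if min(len(kpts0), len(kpts1)) < MIN_KEYPOINTS:
        return [], []  # treat the pair as having no matches
    return run_matcher(kpts0, kpts1)
```

The same guard translates directly to C++: check the keypoint tensor's first dimension before invoking `Ort::Session::Run`, and return an empty match set when it is below the threshold.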