Open whcjb opened 3 years ago
Can someone help?
Hi @whcjb,
Thanks for reaching out!
I would guess this is because it is comparing against a scalar. We may need to update the converter to handle this case.
I will try to look into this soon.
Best, John
Hey @jaybdub,
I stumbled across pretty much the same error as the OP, and I can verify that it comes from t.shape being a scalar. In my case, the shape was (32744).
I've tried to add a simple guard to the if statement here (torch2trt.py#L174) with the condition not hasattr(t, '__len__'), however I cannot get
shape = tuple([1] * diff + list(t.shape))
to work; I get the same error as the OP.
How should I go about it? I can make a PR after I get it to work.
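For reference, here is a plain-Python sketch of the padding that line performs, with the kind of guard I had in mind (this is an illustration, not the actual torch2trt code):

```python
def pad_shape(shape, broadcast_ndim):
    # Mimics the broadcast padding at torch2trt.py#L174: prepend 1s so the
    # shape has broadcast_ndim dimensions. The hasattr guard is the proposed
    # workaround for scalar-like shapes that don't support len().
    if not hasattr(shape, '__len__'):
        shape = (int(shape),)
    diff = broadcast_ndim - len(shape)
    return tuple([1] * diff + list(shape))

print(pad_shape((3, 3), 4))  # (1, 1, 3, 3)
print(pad_shape(32744, 4))   # (1, 1, 1, 32744)
```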
Hi @pepinu,
Hmm, do you mind sharing the error you see at shape = tuple([1] * diff + list(t.shape))
?
Also, thanks for your interest in addressing this!
It's difficult to tell exactly where the change should be applied without reproducing myself, but one other area of interest for this issue may be here
This is where constant tensors are added to the TensorRT network for primitive types. Let me know if you discover anything here, or if there's anything you'd like me to investigate.
As general contributing guidance: before integrating any solution, we'll have to see if there are adverse side effects that might affect other models. One way to do this is to add module test cases that address this failure, and to ensure that the existing test cases still run.
Many of the converter files have examples of module test cases.
The test cases may be run by calling
python3 -m torch2trt.test --name=converters --tolerance=1e-2
This test script was created for torch2trt and cross-validates the outputs against PyTorch. It simply highlights high errors in yellow but does not hard-fail, and it might not cover all use cases. If the change requires a special type of test, let me know.
Please let me know if this helps / you have any questions or if there is any way I can help.
Best, John
Hey @jaybdub,
Thanks for the pointers, I'll take a look at this over the weekend.
Here is the earlier-mentioned error in more depth. I broke the line
shape = tuple([1] * diff + list(t.shape))
into 3 lines, as seen below. The error is the same as the OP's: it happens when t.shape is put into the list, but the exception is thrown a few lines later.
I suspect t.shape would just have to be unpacked within the shape reported in the error? Hope this clarifies things a little bit.
@jaybdub
Alright, so I did some testing. I think I've identified what the problem might be, but I'm not sure how to proceed.
Basically, the problem in my case is not that t is a scalar; it's that t.shape is a scalar. I've edited the last image in my earlier post because I had the wrong condition (if not hasattr(t, '__len__') will not catch this).
The Problem
https://github.com/NVIDIA-AI-IOT/torch2trt/blob/44977a94cb087fe521421802e9df12a5ac3ceb3f/torch2trt/torch2trt.py#L174 In the issue scenario, t.shape is of type trt.Dims: it has dimension 1 and looks like (32548). It has the __len__ method, but invoking it throws the error. I tried to write a workaround with a lambda, but __len__ is read-only in this case, so no luck there.
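To illustrate why the lambda couldn't have worked even if __len__ weren't read-only: Python looks up special methods on the type, not the instance, so assigning a lambda to the instance attribute is ignored. A minimal sketch with a made-up stand-in class:

```python
class FakeDims:
    """Made-up stand-in for a trt.Dims whose __len__ misbehaves."""
    def __len__(self):
        return -1

d = FakeDims()
d.__len__ = lambda: 1  # ignored: special methods are looked up on the type
try:
    len(d)
except ValueError as err:
    print(err)  # __len__() should return >= 0
```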
However, even if all len() calls are rewritten and the length is set arbitrarily to 1, the problem still persists here:
https://github.com/NVIDIA-AI-IOT/torch2trt/blob/44977a94cb087fe521421802e9df12a5ac3ceb3f/torch2trt/torch2trt.py#L177
list() calls len() internally, which crashes the conversion.
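A quick plain-Python sketch of why that happens: list() asks the source for its length as a size hint, which invokes __len__, so a __len__ that returns a negative value aborts before any element is copied (BadDims below is a made-up stand-in, not the real trt.Dims):

```python
class BadDims:
    """Made-up stand-in (not trt.Dims): iterable, but __len__ returns -1."""
    def __init__(self, values):
        self.values = values
    def __getitem__(self, i):
        return self.values[i]
    def __len__(self):
        return -1

try:
    list(BadDims([32548]))  # list() consults __len__ as a size hint first
except ValueError as err:
    print(err)  # __len__() should return >= 0
```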
I've tried to use brackets to put the t.shape object into a list, but the results are not the same: I couldn't find a way to reproduce the same representation of the trt.Dims as in the traceback. list() makes it [32548], while tuple() makes it (32548,). I will look into finding a way to extract the t.shape value as it appears when printed; maybe then I can somehow convert it inside.
I wonder if you have any pointers on where I could look; maybe 1-dim tensor conversion is buggy?
Also, I'll try to put together minimal code to reproduce this.
Best regards
@jaybdub @whcjb
I found out the problem is that TRT is not able to process Python's slice() operator in the same fashion torch does.
The network I was trying to port crashed on a torch.add() operation between two tensors, while converting a minimal torch.add op worked like a charm.
My model was cutting spatial dimensions using Python slicing, instead of torch.narrow, which is recommended for tensors.
To check this is the culprit I wrote and tested 2 versions of a network that narrows dims and adds them together:
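Roughly, the two versions look like this (shapes and class names here are just for illustration; the real test code is in the gist):

```python
import torch

class SliceAdd(torch.nn.Module):
    # Cuts spatial dims with Python slicing, the pattern that trips torch2trt
    def forward(self, a, b):
        return a[:, :, :16, :16] + b[:, :, :16, :16]

class NarrowAdd(torch.nn.Module):
    # Equivalent using torch.narrow(dim, start, length)
    def forward(self, a, b):
        return (a.narrow(2, 0, 16).narrow(3, 0, 16)
                + b.narrow(2, 0, 16).narrow(3, 0, 16))

a = torch.randn(1, 3, 32, 32)
b = torch.randn(1, 3, 32, 32)
# In plain PyTorch the two are identical; they only diverge after conversion
print(torch.equal(SliceAdd()(a, b), NarrowAdd()(a, b)))  # True
```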
I think the screen is self-explanatory, here's a gist to reproduce this.
I'm not sure where to go from here; there should probably be some type check for slice within the lib. Hope it helps.
Best Regards
EDIT:
I looked at the last screen and see that the tensors do not match between the TRT and the normal model, which is weird? I was sure that they did while writing this...
For my case, in https://github.com/NVIDIA-AI-IOT/torch2trt/blob/44977a94cb087fe521421802e9df12a5ac3ceb3f/torch2trt/torch2trt.py#L157,
shape=(576, 960) and weight.shape=(1, 1, 576, 960).
After running this line, I printed t._trt and got:
[TensorRT] ERROR: [SHUFFLE #2] torch.Tensor.view(tensor(shape=[576], dtype=torch.float32), -1, 1): volume mismatch. Input dimensions [576] have volume 576 and output dimensions [1] have volume 1.
ValueError: __len__() should return >= 0
Guys, do you have a final solution regarding that issue?
+1. I am seeing a case where something (perhaps a scalar) has a len of -1 according to TensorRT.
I also seem to run into similar errors if a tensor (or an argument to forward) is None (this should probably just be pruned from the TRT conversion?).
Same problem
Hey @jaybdub, can you give some input on how long it will take before this is fixed?
I'm hitting the same problem too, hoping for a solution @jaybdub 0.0
Any ETA on this problem? Without a fix, torch2trt won't work for many models I tried: the Hugging Face Vision Transformer, Swin Transformer, ViViT, etc.
@kct22aws +1 on that question
+1 on that question
2024 and no fix? Anyone get any traction on this?
When using torch2trt to convert torch.eq, an error occurs.
mm = torch.eq(mm, 0.)
mm is a tensor and mm.shape = [3136, 1, 3, 3]
File "/media/cfs/torch2trt-master/examples/inpainting/model.py", line 329, in forward
    mm = torch.eq(mm, nn)
File "./torch2trt/torch2trt.py", line 285, in wrapper
    converter['converter']
File "./torch2trt/converters/compare.py", line 26, in convert_gt
    return convert_elementwise(ctx, trt.ElementWiseOperation.EQUAL)
File "./torch2trt/converters/compare.py", line 9, in convert_elementwise
    input_a_trt, input_b_trt = broadcast_trt_tensors(ctx.network, [input_a_trt, input_b_trt], len(output.shape) - 1)
File "./torch2trt/torch2trt.py", line 170, in broadcast_trt_tensors
    if len(t.shape) < broadcast_ndim:
ValueError: len() should return >= 0
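For reference, the failing pattern on the PyTorch side is just an elementwise comparison against a Python scalar (shape taken from the report above); per the earlier analysis in this thread, it is presumably the scalar that ends up as a length-less TRT constant inside broadcast_trt_tensors:

```python
import torch

# Works fine in plain PyTorch; the failure only appears during torch2trt
# conversion, when the scalar 0. is wrapped as a TRT constant tensor.
mm = torch.randn(3136, 1, 3, 3)
out = torch.eq(mm, 0.)   # compare tensor against a Python scalar
print(out.shape)         # torch.Size([3136, 1, 3, 3])
print(out.dtype)         # torch.bool
```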