Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of YOLO-NAS.
https://www.supergradients.com
Apache License 2.0

YOLO-NAS - TorchScript output tensor format #1954

Closed · kostastsing closed 5 months ago

kostastsing commented 6 months ago

💡 Your Question

I have exported a YOLO-NAS model to TorchScript format following the guidelines in https://github.com/Deci-AI/super-gradients/issues/994. I use the TorchScript model in C++ code, and the size of the output tensor (the result of the forward function) is [1, 1155, 4]. What might these dimensions mean?
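For context, here is a minimal sketch of the kind of TorchScript export described in that issue (not my exact script; the model variant, input size, and pretrained weights below are placeholders):

```python
# Sketch of a TorchScript export via tracing. The model name, input size,
# and pretrained_weights value are illustrative placeholders.
import torch
from super_gradients.training import models

model = models.get("yolo_nas_s", pretrained_weights="coco")
model.eval()

# Swap training-only blocks for export-friendly ones before tracing
# (if available in the installed super-gradients version).
model.prep_model_for_conversion(input_size=[1, 3, 640, 640])

dummy_input = torch.randn(1, 3, 640, 640)
with torch.no_grad():
    traced = torch.jit.trace(model, dummy_input)
traced.save("yolo_nas_s.torchscript.pt")
```

The exact structure of the traced model's output (single tensor vs. tuple of tensors) can vary with the export settings and library version.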

An example of the contents of the output tensor is provided below.

   0.0059    0.3320   17.3125    5.4492
   0.9219    0.5996   54.2500    4.7148
   1.8438    0.4766   69.6875    4.6016
  -8.0938    0.4238   80.5625    4.4414
  -4.8750    0.4863   89.7500    4.3906
 -10.7188    0.4551   97.0000    4.4844
  -7.8125    0.4727  106.0000    4.5742
  -1.1250    0.5352  113.8125    4.6172
   6.1875    0.6270  121.3750    4.6641
  13.5625    0.5742  129.1250    4.6641
  21.6562    0.3770  141.0000    4.7305
  29.7812   -0.0117  146.8750    4.8281
  37.7812   -0.3047  155.3750    5.0664
  45.5625    0.1738  163.3750    5.3203
  52.3750   -0.5273  172.5000    5.6445
  60.6875   -0.9102  192.1250    5.3672
  68.1250   -2.8711  187.2500    5.2695
   .
   .
   .
  14.6719  112.7500   37.4375  150.5000
  46.0312  111.3750   59.6250  151.8750
  65.5000  110.3750   87.7500  160.0000
  96.1875  108.0000  114.8750  158.1250
 143.0000  108.6250  163.5000  147.8750
 154.6250  111.5000  178.5000  147.2500
 206.1250  109.3750  228.0000  146.7500
 222.5000  110.7500  243.7500  148.2500
 246.8750  109.5000  273.5000  148.8750
 301.7500  111.5000  316.0000  147.7500
 335.5000  118.4375  356.5000  147.7500

Versions

No response

BloodAxe commented 6 months ago

It is unclear what is happening in your example. I'd love to help you understand what those output tensor dimensions might signify. Could you provide a bit more context or some code snippets? It would be helpful to see how you exported the model, how you perform inference in your C++ code, and, if possible, a sample of the input image you're using for inference. That way we can dive deeper into the specifics and provide more tailored assistance. Do you experience the same behavior when using Python only?
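For instance, a quick Python-side sanity check could look something like the following (the file name and input size are placeholders), so the shapes can be compared against what your C++ code receives:

```python
# Load the exported TorchScript module in Python and print the output shapes.
import torch

module = torch.jit.load("yolo_nas_s.torchscript.pt")
module.eval()

dummy_input = torch.randn(1, 3, 640, 640)  # assumed input size
with torch.no_grad():
    outputs = module(dummy_input)

# The traced model may return a single tensor or a tuple/list of tensors,
# depending on how it was exported; print whatever comes back.
if isinstance(outputs, (tuple, list)):
    for i, out in enumerate(outputs):
        print(i, tuple(out.shape))
else:
    print(tuple(outputs.shape))
```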