Can anyone please explain how the output tensors are formed? I am currently working on detecting multiple objects in an image (I am using a YOLOv5 model). I believe the input tensor has the same shape as the image plus a batch dimension (for example, [1, 416, 416, 3]), but I don't understand what the output tensors should look like. I would expect them to contain the coordinates of each detected bounding box.
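For reference, here is a rough sketch of how the output shape is commonly derived for a stock YOLOv5 export. It assumes three detection heads with strides 8, 16 and 32, 3 anchors per grid cell, 80 classes, and a row layout of [cx, cy, w, h, objectness, class scores...]. None of these values are confirmed for the model in question, and the function names are made up for illustration. Whether the box values are in pixels or normalized to 0..1 depends on how the model was exported, so that should be checked on the actual model.

```python
# Sketch only: strides, anchors per cell, class count and row layout are
# assumptions about a stock YOLOv5 export, not facts about a specific model.

def yolov5_output_shape(img_size=416, num_classes=80,
                        strides=(8, 16, 32), anchors_per_cell=3):
    """Return the expected (batch, rows, values_per_row) shape."""
    # Each head predicts anchors_per_cell boxes for every cell of its grid.
    rows = sum(anchors_per_cell * (img_size // s) ** 2 for s in strides)
    # Each row: 4 box values + 1 objectness score + one score per class.
    return (1, rows, 5 + num_classes)


def xywh_to_xyxy(cx, cy, w, h):
    """Convert a centre/size box to corner coordinates (x1, y1, x2, y2)."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def decode_row(row):
    """Split one output row into (box_xyxy, class_id, score)."""
    cx, cy, w, h, objectness = row[:5]
    class_scores = row[5:]
    class_id = max(range(len(class_scores)), key=lambda i: class_scores[i])
    # Final confidence is objectness multiplied by the best class score.
    score = objectness * class_scores[class_id]
    return xywh_to_xyxy(cx, cy, w, h), class_id, score


if __name__ == "__main__":
    print(yolov5_output_shape())
    print(decode_row([100, 100, 40, 20, 0.9, 0.1, 0.8]))
```

Under these assumptions a 416x416 input yields candidate boxes from 52x52, 26x26 and 13x13 grids, so the output holds one row per candidate box instead of one row per detected object. The rows still need a confidence threshold and non-maximum suppression before they become the final bounding boxes.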