What does load_image do internally, and how can I apply the same operations to video frames?

While testing, I noticed that inference returns very different results for the same image depending on how the image is loaded:

Method 1: the official load_image function from the library (it reads the image from the path passed as an argument).
Method 2: using cv2 to read the image, converting it to a tensor, and then swapping axes so that the channel (depth) axis comes first.
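For reference, Method 2 amounts to roughly the following. This is only a sketch: a synthetic array stands in for the result of cv2.imread so that it runs without an image file, and NumPy is used for the axis swap instead of a tensor library.

```python
import numpy as np

# Stand-in for cv2.imread(path): a uint8 array of shape (H, W, 3)
# with channels in BGR order and values in 0..255.
frame = np.zeros((480, 640, 3), dtype=np.uint8)

# Move the channel axis to the front: (H, W, 3) -> (3, H, W).
chw = np.transpose(frame, (2, 0, 1))

print(chw.shape, chw.dtype)
```

Note that after these steps the data is still in BGR order, still uint8 in the 0..255 range, and still at the original resolution; nothing has been resized or normalized.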

Both methods produce a tensor that can be passed to the model, but the results are very different (Method 2 usually gives poor detections). I inspected the tensors returned in both cases and their shapes differ, so load_image is definitely applying some transformations. My question is: what exactly happens inside load_image, so that I can replicate it in other scripts?

My end goal is to run the model on video, that is, on the individual frames of a video. I cannot use load_image for this because the frames are not image files on disk; they are decoded from the video. So I need to understand what happens inside load_image in order to apply the same preprocessing to each frame.
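To make the question concrete, here is the kind of per-frame function I am trying to write. It is a sketch built on unconfirmed guesses about load_image: conversion to RGB, a resize that brings the shorter side to 800 while capping the longer side at 1333, scaling to 0..1, and normalization with the ImageNet mean and standard deviation. The names target_size, resize_nearest and preprocess_frame are my own, and the nearest-neighbour resize is a NumPy-only stand-in for a proper interpolating resize such as cv2.resize.

```python
import numpy as np

# Assumed normalization constants (ImageNet statistics); not confirmed
# to be what load_image uses.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def target_size(h, w, short=800, max_long=1333):
    """Scale so the shorter side is `short`, capping the longer side."""
    scale = short / min(h, w)
    if max(h, w) * scale > max_long:
        scale = max_long / max(h, w)
    return int(round(h * scale)), int(round(w * scale))


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize; a simple stand-in for cv2.resize."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows][:, cols]


def preprocess_frame(frame_bgr):
    """Turn a cv2-style (H, W, 3) BGR uint8 frame into a (3, H', W') float32 array."""
    rgb = frame_bgr[:, :, ::-1]                # BGR -> RGB
    new_h, new_w = target_size(*rgb.shape[:2])
    rgb = resize_nearest(rgb, new_h, new_w)
    x = rgb.astype(np.float32) / 255.0         # 0..255 -> 0..1
    x = (x - MEAN) / STD                       # per-channel normalization
    return np.ascontiguousarray(x.transpose(2, 0, 1))  # HWC -> CHW


frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a decoded video frame
print(preprocess_frame(frame).shape)
```

The resulting array could then be wrapped with torch.from_numpy before being passed to the model. What I need to know is whether these steps and constants match what load_image actually does.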

Thanks

IDEA-Research / GroundingDINO, issue #370