Different raw video values across operating systems

Running load_video_frames on different operating systems yields slightly different pixel values.

For example, the same frames loaded on Linux and on macOS contain different values, even with the same ffmpeg version (4.4.3) and the same Python environment.

Values differ by at most 2 (per-element differences are 0, 1, or 2). The loaded frames look the same to the naked eye, but the differences are enough to produce slightly different model predictions.
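A minimal sketch of how the per-element differences could be tallied, assuming the two frame arrays are uint8 and have the same shape. `summarize_frame_diff` is a hypothetical helper written for illustration, not part of zamba.

```python
import numpy as np


def summarize_frame_diff(a, b):
    """Count how often each absolute per-element difference occurs.

    Hypothetical helper (not a zamba function). Assumes `a` and `b` are
    integer frame arrays of identical shape, e.g. uint8 video frames.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    # Cast to a signed type first: subtracting uint8 arrays wraps around.
    diff = np.abs(a.astype(np.int16) - b.astype(np.int16))
    values, counts = np.unique(diff, return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}
```

Calling `summarize_frame_diff(mac_arr, linux_arr)` on the arrays from each system would return a dictionary mapping each difference value to the number of elements with that difference.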

For example, in the detections below the bounding boxes are almost identical, but the confidences fall on either side of the threshold (0.25) that we use to select frames for distance estimation.

In [47]: mdlite.detect_image(mac_arr[1])
Out[47]:
(array([[0.1824464 , 0.43740773, 0.26173705, 0.68636054]], dtype=float32),
 array([0.22770423], dtype=float32))

In [48]: mdlite.detect_image(linux_arr[1])
Out[48]:
(array([[0.18251045, 0.43775246, 0.26022527, 0.69025564]], dtype=float32),
 array([0.25554472], dtype=float32))
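To make the threshold effect concrete, here is a small sketch using the two confidences shown above. The selection rule (keep a frame if any detection has confidence greater than or equal to the threshold) is an assumption for illustration; the actual comparison used in zamba may differ, and `frame_selected` is a hypothetical name.

```python
import numpy as np

THRESHOLD = 0.25  # frame-selection threshold quoted in this issue

# Confidences copied from the two detect_image outputs above.
mac_conf = np.array([0.22770423], dtype=np.float32)
linux_conf = np.array([0.25554472], dtype=np.float32)


def frame_selected(confidences, threshold=THRESHOLD):
    """Assumed rule: keep the frame if any detection meets the threshold."""
    return bool((np.asarray(confidences) >= threshold).any())


print("mac selected:  ", frame_selected(mac_conf))
print("linux selected:", frame_selected(linux_conf))
```

Under that assumed rule, the same frame is dropped on one system and kept on the other, even though the confidences differ by less than 0.03.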

These differences in the frame selection model will have downstream impacts on depth and species predictions.

For example, the species probabilities for the same video:

filepath,aardvark,antelope_duiker,badger,bat,bird,blank,cattle,cheetah,chimpanzee_bonobo,civet_genet,elephant,equid,forest_buffalo,fox,giraffe,gorilla,hare_rabbit,hippopotamus,hog,human,hyena,large_flightless_bird,leopard,lion,mongoose,monkey_prosimian,pangolin,porcupine,reptile,rodent,small_cat,wild_dog_jackal
09190048_Hyena.AVI,0.00342,0.06417,0.0389,0.01094,0.02267,0.65171,0.00328,1e-05,0.0119,0.03768,0.02369,0.00327,0.00811,0.00024,0.00031,0.00367,0.00438,0.011,0.02101,0.01816,0.00607,2e-05,0.01347,0.0002,0.05422,0.03666,0.00384,0.00511,0.00873,0.03173,0.02131,0.00943

vs.

09190048_Hyena.AVI,0.01119,0.14719,0.03155,0.01509,0.03291,0.5653,0.00524,1e-05,0.03119,0.05381,0.04377,0.01406,0.01621,0.00029,0.00065,0.00784,0.00946,0.02151,0.05271,0.02293,0.01533,3e-05,0.02246,0.00022,0.07089,0.0557,0.00662,0.01067,0.01784,0.05964,0.04838,0.00982

The label with the max probability is the same, but the exact values differ.
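A rough sketch of that comparison, using three of the columns from the two rows above (antelope_duiker, blank, mongoose). `compare_predictions` is a hypothetical helper, and the names `row_a` and `row_b` are placeholders because the issue does not say which row came from which operating system.

```python
import numpy as np

# Three columns taken from the two prediction rows above.
labels = ["antelope_duiker", "blank", "mongoose"]
row_a = np.array([0.06417, 0.65171, 0.05422])
row_b = np.array([0.14719, 0.5653, 0.07089])


def compare_predictions(labels, p, q):
    """Return the top label of each row and the largest absolute gap.

    Hypothetical helper for illustration only.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    top_p = labels[int(np.argmax(p))]
    top_q = labels[int(np.argmax(q))]
    max_abs_diff = float(np.max(np.abs(p - q)))
    return top_p, top_q, max_abs_diff


top_a, top_b, max_diff = compare_predictions(labels, row_a, row_b)
print(f"top labels: {top_a} / {top_b}, max abs difference: {max_diff:.5f}")
```

On these columns the top label is "blank" in both rows, while its probability shifts by about 0.086.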

It's unclear exactly what's causing this difference and how to resolve it. For now, it's worth knowing that we don't have exact replicability across operating systems.

drivendataorg / zamba

Different raw video values across operating systems #252