drivendataorg / zamba

A Python package for identifying 42 kinds of animals, training custom models, and estimating distance from camera trap videos
https://zamba.drivendata.org/docs/stable/
MIT License

Different raw video values across operating systems #252

Open ejm714 opened 1 year ago

ejm714 commented 1 year ago

Running load_video_frames on different operating systems leads to slightly different values.

For example,

[image]

vs.

[image]

on Linux vs. macOS with the same ffmpeg version (4.4.3) and the same Python environment.

Pixel values differ by at most 2 (each difference is 0, 1, or 2). The loaded frames look the same to the naked eye, but the differences are enough to produce slight differences in model predictions.
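A minimal sketch of how such per-pixel differences could be quantified with NumPy. It assumes the frames loaded on each machine are uint8 arrays of the same shape that have been brought onto one machine for comparison. The helper name summarize_frame_diff is hypothetical, and the tiny arrays are made-up stand-ins, not data from this issue.

```python
import numpy as np


def summarize_frame_diff(a, b):
    """Return the max absolute difference and a histogram of difference sizes."""
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    # Cast to a signed type first: subtracting uint8 arrays directly wraps around.
    diff = np.abs(a.astype(np.int16) - b.astype(np.int16))
    values, counts = np.unique(diff, return_counts=True)
    return int(diff.max()), {int(v): int(c) for v, c in zip(values, counts)}


# Made-up stand-in frames (not taken from the videos discussed here).
frame_a = np.array([[10, 200], [0, 255]], dtype=np.uint8)
frame_b = np.array([[11, 198], [0, 255]], dtype=np.uint8)

max_diff, histogram = summarize_frame_diff(frame_a, frame_b)
print(max_diff, histogram)
```

The cast to int16 matters: without it, a difference of -1 between two uint8 values would show up as 255.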

For example, the bounding boxes are almost identical but the confidences are on either side of the threshold we use for selecting frames for distance estimation (0.25).

In [47]: mdlite.detect_image(mac_arr[1])
Out[47]:
(array([[0.1824464 , 0.43740773, 0.26173705, 0.68636054]], dtype=float32),
 array([0.22770423], dtype=float32))

In [48]: mdlite.detect_image(linux_arr[1])
Out[48]:
(array([[0.18251045, 0.43775246, 0.26022527, 0.69025564]], dtype=float32),
 array([0.25554472], dtype=float32))
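To make the threshold effect concrete, here is a small sketch using the two confidences above. The function passes_threshold is hypothetical, and whether the actual frame selection code compares with >= or > is an assumption; only the 0.25 threshold and the two confidence values come from this issue.

```python
DETECTION_THRESHOLD = 0.25  # threshold quoted in this issue


def passes_threshold(confidences, threshold=DETECTION_THRESHOLD):
    """Return True if any detection confidence reaches the threshold.

    Assumption: an inclusive (>=) comparison; the real selection logic may differ.
    """
    return any(conf >= threshold for conf in confidences)


mac_conf = [0.22770423]    # confidence from mac_arr[1] above
linux_conf = [0.25554472]  # confidence from linux_arr[1] above

mac_selected = passes_threshold(mac_conf)
linux_selected = passes_threshold(linux_conf)
print(mac_selected, linux_selected)
```

Under these assumptions, the same frame is kept on one operating system and dropped on the other, even though the bounding boxes nearly coincide.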

These differences in the frame selection model will have downstream impacts on depth and species predictions.

For example:

filepath,aardvark,antelope_duiker,badger,bat,bird,blank,cattle,cheetah,chimpanzee_bonobo,civet_genet,elephant,equid,forest_buffalo,fox,giraffe,gorilla,hare_rabbit,hippopotamus,hog,human,hyena,large_flightless_bird,leopard,lion,mongoose,monkey_prosimian,pangolin,porcupine,reptile,rodent,small_cat,wild_dog_jackal
09190048_Hyena.AVI,0.00342,0.06417,0.0389,0.01094,0.02267,0.65171,0.00328,1e-05,0.0119,0.03768,0.02369,0.00327,0.00811,0.00024,0.00031,0.00367,0.00438,0.011,0.02101,0.01816,0.00607,2e-05,0.01347,0.0002,0.05422,0.03666,0.00384,0.00511,0.00873,0.03173,0.02131,0.00943

vs.

09190048_Hyena.AVI,0.01119,0.14719,0.03155,0.01509,0.03291,0.5653,0.00524,1e-05,0.03119,0.05381,0.04377,0.01406,0.01621,0.00029,0.00065,0.00784,0.00946,0.02151,0.05271,0.02293,0.01533,3e-05,0.02246,0.00022,0.07089,0.0557,0.00662,0.01067,0.01784,0.05964,0.04838,0.00982

The label with the max probability (blank) is the same in both rows, but the exact values differ (0.65171 vs. 0.5653).
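A rough sketch of how two prediction rows could be compared: check that the top label agrees and report the largest probability gap. The helper compare_predictions is hypothetical, and the dictionaries hold only a four-column subset of the rows above (values copied from them) to keep the example short.

```python
def compare_predictions(row_a, row_b):
    """Return the top label of each row and the largest absolute probability gap."""
    top_a = max(row_a, key=row_a.get)
    top_b = max(row_b, key=row_b.get)
    max_abs_diff = max(abs(row_a[label] - row_b[label]) for label in row_a)
    return top_a, top_b, max_abs_diff


# Subset of columns from the two rows above; the issue does not say which
# operating system produced which row.
row_first = {"antelope_duiker": 0.06417, "blank": 0.65171, "hyena": 0.00607, "mongoose": 0.05422}
row_second = {"antelope_duiker": 0.14719, "blank": 0.5653, "hyena": 0.01533, "mongoose": 0.07089}

top_a, top_b, max_abs_diff = compare_predictions(row_first, row_second)
print(top_a, top_b, round(max_abs_diff, 5))
```

A check like this could serve as a tolerance-based comparison in cross-platform tests, in place of exact equality.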

It's unclear exactly what's causing this difference and how to resolve it. For now, it's worth knowing that we don't have exact replicability across operating systems.