Reproducing the speed and storage results

Dear author and anyone else who is interested,

I have recently become interested in ML on compressed video. However, after trying some of the code in this repo, I have the following concerns.

Why are the motion vectors stored as int64 by default? In my experiments, the stored motion vectors can easily be larger than the original video.
Even if I change the motion vector dtype to uint8, the result is still 4x larger than the original video. I guess H.264 applies further compression on top of the motion vectors (I am new to video codecs). Can anyone confirm this?
Reading the pre-saved .npy motion vectors seems to have only a very limited advantage over reading the RGB frames directly (7 s for motion vectors vs. 8 s for reading RGB with OpenCV, on a 30-minute video). I understand that loading a .npy file is not the same as reading the bytes from the video in C++ (although np.load is itself backed by native code), but since the decoder is not that heavy, I still feel that reading only the motion vectors gives limited benefit.
Besides, can I use mv-extractor to get the motion vectors directly?
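
To make the storage question concrete, here is a minimal sketch of the kind of size comparison I mean. It uses synthetic data, not this repo's real output: the array shape `(frames, height, width, 2)`, the value range, and the helper `saved_size` are all assumptions made for illustration. It also uses a signed int8 rather than uint8, on the assumption that motion vector components can be negative.

```python
import io
import numpy as np

# Hypothetical motion-vector array: (frames, height, width, 2), small signed values.
# Shape and value range are assumptions, not the repo's actual format.
rng = np.random.default_rng(0)
mv_int64 = rng.integers(-16, 17, size=(30, 64, 64, 2), dtype=np.int64)

def saved_size(arr, compressed=False):
    """Return how many bytes np.save / np.savez_compressed would write."""
    buf = io.BytesIO()
    if compressed:
        np.savez_compressed(buf, mv=arr)
    else:
        np.save(buf, arr)
    return buf.getbuffer().nbytes

# Signed 8-bit keeps negative components intact for this value range.
mv_int8 = mv_int64.astype(np.int8)

sizes = {
    "int64 .npy": saved_size(mv_int64),
    "int8 .npy": saved_size(mv_int8),
    "int8 .npz (deflate)": saved_size(mv_int8, compressed=True),
}
for name, n in sizes.items():
    print(f"{name}: {n} bytes")
```

A plain .npy file stores every element uncompressed, so the size scales directly with the dtype width. Real motion vectors are usually much more regular than this random data, so general-purpose compression would probably help more there, but I have not measured that on actual videos.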

I would appreciate any comments or help. Thanks in advance!

LukasBommes / mv-extractor