facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.
Apache License 2.0

Inference in half precision #380

Open treasan opened 6 months ago

treasan commented 6 months ago

Hey,

just wanted to ask whether it is safe to run DINOv2 in half precision for inference. Is there any degradation in the quality of the features?

Thanks!

heyoeyo commented 6 months ago

Whether or not float16 leads to degradation probably depends on the use case. For example, if you pass the features into a model that's very heavily fit to the output of the float32 version, it may break on float16. This seems to (very rarely) happen on some depth estimation models I've tried; using bfloat16 usually fixes the problem.

Following from issue #373, here's a comparison of the token norms at each block of ViT-L running on the same input image, with float32 on the left and float16 on the right. Qualitatively at least, they're identical. Even the 'high norm' tokens are the same, which suggests that the float16 conversion doesn't lead to unstable results.

[Image: vitl_f32_vs_f16, per-block token norms with float32 (left) vs float16 (right)]
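
If it helps, here's a minimal sketch of how to load the hub model and sanity-check half-precision features against float32 (the variant name, input size, and the comparison itself are just placeholders, not the exact setup behind the image above):

```python
import torch

# Load DINOv2 ViT-L/14 from torch hub (any of the hub variants works the same way)
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14").cuda().eval()

# Dummy input: 518x518 is a multiple of the 14px patch size
x = torch.randn(1, 3, 518, 518, device="cuda")

with torch.no_grad():
    feats_f32 = model(x)  # float32 reference

    # Option A: autocast (weights stay float32, matmuls run in float16)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        feats_amp = model(x)

    # Option B: cast the whole model and input to float16
    feats_f16 = model.half()(x.half())

# Rough check that the half-precision features haven't drifted or blown up
for name, f in [("autocast", feats_amp), ("half", feats_f16)]:
    d = (feats_f32 - f.float()).abs()
    print(f"{name}: max|diff|={d.max().item():.3f} mean|diff|={d.mean().item():.4f} nan={torch.isnan(f).any().item()}")
```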

treasan commented 6 months ago

Thank you very much!

heyoeyo commented 6 months ago

Just as a follow up, to not give the impression that bfloat16 is always better to use, here's a small zoomed/cropped section of a depth estimate of some rocks on the ground. There's a plane-of-best-fit removal and a contrast boost, so it's a bit of an extreme example, just to show the differences. Float32 is on the left, then float16, then bfloat16:

[Image: f32_f16_bf16, cropped depth estimate with float32 / float16 / bfloat16]

Float32 and float16 look very similar (there are slight differences when flipping between the images, though). On the other hand, bfloat16 has visible grid-like artifacts.

From what I've seen, float16 is usually fine but will, on rare occasions, give random inf or NaN results, in which case bfloat16 tends to give more reasonable results (though it otherwise always has these small artifacts).
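
One simple way to guard against that failure mode is a fallback along these lines (a hypothetical helper, not part of dinov2): run in float16 and only retry in bfloat16 when the output contains inf/NaN, since bfloat16 shares float32's exponent range and won't overflow on large activations.

```python
import torch

def forward_with_fallback(model, x):
    """Hypothetical helper: float16 inference, retrying in bfloat16
    only if the float16 output contains inf/NaN."""
    with torch.no_grad():
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            out = model(x)
        if not torch.isfinite(out).all():
            # bfloat16 keeps float32's exponent range, so intermediate values
            # that overflow float16 stay finite here
            with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
                out = model(x)
    return out
```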

maulanaazhari commented 5 days ago

hi @heyoeyo, I am curious about the inference speed-up from using float16 or bfloat16. Would you care to share your experience with it?

heyoeyo commented 4 days ago

hi @maulanaazhari

Switching from float32 to float16 or bfloat16 consistently gives a 2x speed up on almost any model I've tried (the only exception being very small models). If you're also using xFormers with dinov2, that seems to give an additional speed up when using float16.
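
If you want to measure it on your own hardware and model size, here's a rough benchmark sketch (batch size, image size, and iteration counts are arbitrary):

```python
import time
from contextlib import nullcontext

import torch

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14").cuda().eval()
x = torch.randn(8, 3, 518, 518, device="cuda")  # batch / image size are arbitrary

def bench(dtype, iters=20):
    # Only wrap in autocast for the half-precision dtypes
    amp = torch.autocast(device_type="cuda", dtype=dtype) if dtype != torch.float32 else nullcontext()
    with torch.no_grad(), amp:
        for _ in range(3):  # warmup
            model(x)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()
        elapsed = time.perf_counter() - t0
    return elapsed / iters

for dtype in (torch.float32, torch.float16, torch.bfloat16):
    print(dtype, f"{bench(dtype) * 1000:.1f} ms per batch")
```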