facebookresearch / hiera

Hiera: A fast, powerful, and simple hierarchical vision transformer.

Weird inference time #1

Closed alexcbb closed 1 year ago

alexcbb commented 1 year ago

Hello, thank you for your very nice work!

I wanted to test the models' inference on single images on my laptop. I tried loading different sizes of the model to check the improvement in inference time, but a weird issue happened: the smaller the model, the longer the inference becomes! (724ms for the tiny model vs. 480ms for the huge one)

I verified that the loaded models were the correct ones by printing out the architectures, and nothing seemed wrong.

I tested both loading from Torch Hub and installing the package; both gave me the same results.

So I don't know what is happening. Have you ever encountered this issue?

Here is the relevant information about the laptop: Ubuntu 22, GPU: Nvidia RTX A2000 (8 GB), torch 2.0.0+cu117 (maybe the issue comes from there?)

Thank you in advance!

(EDIT): Weirder still, I tried downgrading Torch to version 1.10 and the inference time went down to 480ms for the tiny model and 300ms for the huge one, but the issue is still not solved.

(DOUBLE-EDIT): Maybe I was too quick on GitHub; the trained checkpoints of those models are not yet available, so maybe that explains my problem...

dbolya commented 1 year ago

Hi Alex,

I am unable to reproduce this. Even on a CPU, I get 32ms for the tiny model and 356ms for the huge model for single-image inference. I'm also using torch 2.0.0, and the following testing script:

import torch, timeit
x = torch.zeros(1, 3, 224, 224) # Single image

# Test tiny model
model = torch.hub.load("facebookresearch/hiera", model="hiera_tiny_224")
model(x)  # Warm up model
print(timeit.timeit(stmt="model(x)", number=100, globals={"model":model, "x":x}) / 100)  # 0.032561959549784664s

# Test huge model
model = torch.hub.load("facebookresearch/hiera", model="hiera_huge_224")
model(x)  # Warm up model
print(timeit.timeit(stmt="model(x)", number=100, globals={"model":model, "x":x}) / 100)  # 0.3557194647192955s

Alternatively, you can use the provided benchmarking script (see examples), but I think that assumes a GPU. Also, the benchmark should be (largely) unaffected by checkpoint availability.
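
If you time on the GPU yourself, remember to call torch.cuda.synchronize() before reading the clock, since CUDA kernels launch asynchronously. Here is a minimal sketch (not the provided benchmarking script; it assumes a CUDA device is available):

import torch, time

device = "cuda"
model = torch.hub.load("facebookresearch/hiera", model="hiera_tiny_224").to(device).eval()
x = torch.zeros(1, 3, 224, 224, device=device)  # Single image

with torch.no_grad():
    model(x)  # Warm up model
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()  # Wait for queued kernels before stopping the timer
    print(f"{(time.time() - start) / 100 * 1000:.2f} ms per image")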

alexcbb commented 1 year ago

Thank you for your answer; with your code it seems to work well too.

I think the problem maybe came from the way I was pre-processing my images, because I changed it and the problem does not occur anymore. Here is the code I use now:

import time

import torch
from PIL import Image
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"  # Assuming a CUDA GPU when available

images_path = [path_img1, path_img2, path_img3, ...]  # Paths to the test images

model = torch.hub.load("facebookresearch/hiera", model="hiera_tiny_224")  # TODO: change the model
model.to(device)

input_size = 224

transform = transforms.Compose([
    transforms.Resize((input_size, input_size)),
    transforms.ToTensor(),
    transforms.Normalize(0.5, 0.5),
])

for image_path in images_path:
    img = Image.open(image_path)
    img_n = transform(img)
    img_n = img_n.to(device)
    img_n = img_n.unsqueeze(0)

    start = time.time()
    output = model(img_n)
    end = time.time()

    print(f"Inference time: {int((end - start) * 1000)}ms")

With the code above I obtained 10ms on Huge model after warmup, 8ms on base and 6ms on Tiny (using the GPU listed in my previous message). Do these seem like reasonable inference times to you?

dbolya commented 1 year ago

I obtained 10ms on Huge model after warmup, 8ms on base and 6ms on Tiny

For a single image this seems reasonable, because the tiny model is not able to make use of all the CUDA cores on the GPU.

If your task allows, I suggest using a higher batch size (but of course that's not always possible).
I also suggest using fp16 (if your GPU supports it), as Hiera benefits from fp16 a lot more than e.g., convnets.
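
For example, here is a rough sketch of batched fp16 inference with autocast (the batch size of 16 is only illustrative, and it assumes a CUDA GPU):

import torch

device = "cuda"
model = torch.hub.load("facebookresearch/hiera", model="hiera_tiny_224").to(device).eval()
batch = torch.zeros(16, 3, 224, 224, device=device)  # 16 images at once instead of 1

with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    out = model(batch)  # Most ops run in half precision
print(out.shape)  # One row of logits per image in the batch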

I'll close this issue for now. Feel free to reopen if you have any other questions.