coolzhao / Geo-SAM

A QGIS plugin tool using Segment Anything Model (SAM) to accelerate segmenting or delineating landforms in geospatial raster images.
MIT License

Batch size does not decrease encoding time. #20

Open 4del-Yousefi opened 9 months ago

4del-Yousefi commented 9 months ago

Hello,

I am using the plugin and the Python package to encode images, and I noticed that increasing the batch size does not speed up encoding. I have an RTX 4070 Ti and got the following results.

Batch size: Time in sec
1: 141.44510889053345
2: 279.85611724853516
3: 424.3098211288452
4: 425.72585558891296

Code I used to get the result:

times = {}
for i in range(1,5):
    start_time = time.time()
    img_encoder = ImageEncoder(ckpt, batch_size=i)
    img_encoder.encode_image(image_path,feature_dir)
    init_settings, encode_settings = geosam.split_settings(settings)
    times[i] = time.time() - start_time

Fanchengyan commented 9 months ago

I found a few issues in your code, and I'm not sure the batch size parameter was passed correctly.

init_settings, encode_settings = geosam.split_settings(settings)

This line doesn't seem to be used for encoding the image in your code. Also, if you are reading parameters from a file, you need to update the batch size in settings:

settings.update({"batch_size":i})
init_settings, encode_settings = geosam.split_settings(settings)

ImageEncoder only needs to be instantiated once. Instantiating it multiple times may lead to the checkpoint being loaded into memory multiple times, causing high memory usage.

So, I suggest changing it to:

times = {}
img_encoder = ImageEncoder(ckpt)
for i in range(1,5):
    start_time = time.time()
    img_encoder.batch_size=i
    img_encoder.encode_image(image_path, feature_dir)
    times[i] = time.time() - start_time
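
The loop above takes a single measurement per batch size, so any one-off start-up cost may land on whichever batch size runs first. A minimal sketch of a more robust harness follows, assuming that a warm-up call and a median over a few repeats are acceptable for your workload. `time_batch_sizes` and `encode_with` are hypothetical helper names for illustration and are not part of GeoSAM-Image-Encoder.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List


def time_batch_sizes(
    run: Callable[[int], None],
    batch_sizes: Iterable[int],
    repeats: int = 3,
    warmup: bool = True,
) -> Dict[int, float]:
    """Return the median wall-clock time in seconds for each batch size."""
    sizes = list(batch_sizes)
    if warmup and sizes:
        # One untimed call so one-off start-up costs do not skew the first result.
        run(sizes[0])
    results: Dict[int, float] = {}
    for size in sizes:
        samples: List[float] = []
        for _ in range(repeats):
            start = time.perf_counter()
            run(size)
            samples.append(time.perf_counter() - start)
        results[size] = statistics.median(samples)
    return results


# Hypothetical usage with the encoder from this thread (not executed here):
# def encode_with(batch_size):
#     img_encoder.batch_size = batch_size
#     img_encoder.encode_image(image_path, feature_dir)
# print(time_batch_sizes(encode_with, range(1, 5)))
```

Passing the work as a callable keeps the timing logic independent of the encoder, so the same helper can be tried with any function that accepts a batch size.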

As for the encoding time not decreasing when the batch size increases: there are many similar discussions online, and there seems to be no consensus.

https://discuss.pytorch.org/t/increasing-batch-size-didnt-help-reduce-training-time/100960

https://discuss.pytorch.org/t/execution-time-does-not-decrease-as-batch-size-increases-with-gpus/85750

As for SAM, I think this may be because SAM is a large model, and a batch size of 1 may already occupy all the CUDA cores of your GPU. In that case, a larger batch only reduces the number of I/O operations; it cannot reduce the time the model itself takes to run. In our early tests on an A100, increasing the batch size did reduce the running time. You can check whether a batch size of 1 already pushes your GPU utilization close to 100%. (Note: check GPU utilization, not memory usage.)
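
One way to tell compute utilization apart from memory usage is to poll `nvidia-smi` while encoding runs. The sketch below is only an illustration and assumes `nvidia-smi` is on your PATH and reports numeric values for every field. `parse_gpu_stats` and `read_gpu_stats` are hypothetical helper names.

```python
import subprocess
from typing import Dict, List

QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text: str) -> List[Dict[str, float]]:
    """Parse nvidia-smi CSV rows (noheader, nounits) into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem_used, mem_total = (float(field) for field in line.split(","))
        stats.append({
            # Share of time the GPU was busy running kernels.
            "compute_util_pct": util,
            # Share of GPU memory in use; a different quantity from the above.
            "memory_used_pct": 100.0 * mem_used / mem_total,
        })
    return stats


def read_gpu_stats() -> List[Dict[str, float]]:
    """Query nvidia-smi once; requires the NVIDIA driver tools on PATH."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_stats(out)
```

Calling `read_gpu_stats()` from a second terminal or thread during encoding shows whether `compute_util_pct` is already near 100 at batch size 1.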

In your case, increasing the batch size actually seems to increase the running time. My guess is that our code did not release GPU data after encoding an image. GPU memory then runs short, and data has to be swapped to and from the computer's main memory to avoid running out of memory, which slows down the subsequent runs. We have updated GeoSAM-Image-Encoder to clear memory after each image is encoded. We recommend reinstalling it with the following commands and testing again:

pip install GeoSAM-Image-Encoder --upgrade
# or
pip install GeoSAM-Image-Encoder==1.0.4

coolzhao commented 9 months ago

Thanks for your feedback. We will test this and try to find the problem.

Fanchengyan commented 9 months ago

Hi @4del-Yousefi ,

Based on testing on an A100, my previous guess appears to be correct. Even on the A100 (6912 CUDA cores), GPU utilization already reaches 99% at a batch size of 2. Increasing the batch size further only puts more data on the GPU; the CUDA cores are already fully utilized, so the overall running time cannot drop any further. A significant decrease in runtime from a larger batch size can therefore only be expected when the number of CUDA cores is extremely large.

Below are my test code and running times:

## Download example dataset and sam `vit_l` checkpoint
wget https://raw.githubusercontent.com/coolzhao/Geo-SAM/main/rasters/beiluhe_google_img_201211_clip.tif
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth
wget https://raw.githubusercontent.com/coolzhao/Geo-SAM/dev/GeoSAM-Image-Encoder/examples/data/setting.json

import time
import geosam
from geosam import ImageEncoder

checkpoint_path = './sam_vit_l_0b3195.pth'
image_path = './beiluhe_google_img_201211_clip.tif'
feature_dir = './'

## init ImageEncoder
img_encoder = ImageEncoder(checkpoint_path)

## test
times = {}
for i in range(1,5):
    start_time = time.time()
    img_encoder.batch_size=i
    img_encoder.encode_image(image_path, feature_dir)
    times[i] = time.time() - start_time

print(times)

{1: 5.970345973968506, 2: 3.5587611198425293, 3: 3.5929696559906006, 4: 3.5436811447143555}
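
To make the two sets of timings in this thread easier to compare, here is a small sketch that expresses each run as a speedup relative to batch size 1. The numbers are the reported times rounded to three decimals. The RTX 4070 Ti figures come from the original script, which re-created the encoder on every iteration and predates the memory-clearing update, so they are not a like-for-like comparison with the A100 run. `speedup_vs_batch_1` is a hypothetical helper name.

```python
# Timings in seconds as reported in this thread (rounded).
a100_times = {1: 5.970, 2: 3.559, 3: 3.593, 4: 3.544}
rtx4070ti_times = {1: 141.445, 2: 279.856, 3: 424.310, 4: 425.726}


def speedup_vs_batch_1(times):
    """Return baseline_time / time per batch size (values above 1 are faster)."""
    baseline = times[1]
    return {batch: round(baseline / t, 2) for batch, t in times.items()}


print(speedup_vs_batch_1(a100_times))
print(speedup_vs_batch_1(rtx4070ti_times))
```

On the A100 the speedup levels off at about 1.7x from batch size 2 onward, which fits the explanation that the GPU is already saturated at that point.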

4del-Yousefi commented 9 months ago

Hello,

Thanks for the update. I understand how it works now.

Also, just a small question: can Light HQ-SAM or any other SAM-based model increase the speed, or change how batch size affects the processing speed?

Thanks,