This is our official implementation of running mask prediction in the browser with the ONNX model, multithreading, and a precomputed image embedding. It should have ~50 ms latency: https://github.com/facebookresearch/segment-anything/tree/main/demo
Please see the README in the demo folder for more details.
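For anyone who wants to exercise the same split (precomputed embedding fed to the lightweight decoder) outside the browser, here is a minimal Python sketch using onnxruntime. It is not the demo code: the file names are placeholders, it assumes the decoder was exported with scripts/export_onnx_model.py, and the input names below are the ones used by that export.

```python
# Minimal sketch (not the demo code): run the exported SAM mask decoder with
# onnxruntime in Python, given a precomputed image embedding.
# "sam_onnx.onnx" and "image_embedding.npy" are placeholder file names; the
# embedding is the (1, 256, 64, 64) array from SamPredictor.get_image_embedding().
import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession("sam_onnx.onnx")
embedding = np.load("image_embedding.npy").astype(np.float32)

# One foreground point prompt; a padding point with label -1 is appended
# because no box prompt is provided.
feeds = {
    "image_embeddings": embedding,
    "point_coords": np.array([[[320.0, 240.0], [0.0, 0.0]]], dtype=np.float32),
    "point_labels": np.array([[1.0, -1.0]], dtype=np.float32),
    "mask_input": np.zeros((1, 1, 256, 256), dtype=np.float32),
    "has_mask_input": np.zeros(1, dtype=np.float32),
    "orig_im_size": np.array([480.0, 640.0], dtype=np.float32),
}
masks, iou_predictions, low_res_masks = session.run(None, feeds)
print(masks.shape, iou_predictions)
```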
Yes, the problem is the embedding. On my 3080 Ti, computing the embedding for a 640x480 image takes ~1.5 seconds. The FAQ on the Segment Anything site says the embedding should take 0.15 seconds on an A100 (suggesting an A100 runs the encoder roughly 10x faster than a 3080).
This isn't called out very clearly in the paper, but it does mention a few times that the image encoder is "heavyweight". It appears it could be replaced with a "cheaper" encoder that outputs a CxWxH embedding, but I assume you would have to retrain the model and the resulting quality might be much worse. Personally, I wish the embedding time were called out more explicitly in the paper.
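For anyone who wants to reproduce these numbers, here is a minimal timing sketch that isolates the image-encoder (embedding) step. It assumes a local ViT-H checkpoint and a CUDA device; the checkpoint and image file names are placeholders.

```python
# Minimal sketch: time only the heavyweight image-encoder (embedding) step.
# Assumes segment-anything is installed; file names are placeholders.
import time
import cv2
import torch
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("test_640x480.jpg"), cv2.COLOR_BGR2RGB)

torch.cuda.synchronize()
start = time.perf_counter()
predictor.set_image(image)          # runs the ViT image encoder
torch.cuda.synchronize()
print(f"embedding: {time.perf_counter() - start:.3f} s")

embedding = predictor.get_image_embedding()  # (1, 256, 64, 64) tensor
print(embedding.shape)
```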
I followed that web demo (https://github.com/facebookresearch/segment-anything/tree/main/demo) and still cannot get close to 50 ms. Running the ONNX model consistently takes between 90 and 100 ms. I timed it with the following to isolate just the model inference latency:
console.time("run model")
const results = await model.run(feeds);
console.timeEnd("run model")
I'm using the quantized model, and I can see that ONNX Runtime is loading ort-wasm-simd-threaded.wasm, which I believe confirms that SharedArrayBuffers (and therefore multithreading) are in use.
Is the demo missing something that would get it down to 50 ms?
Hi, we have proposed a method for rapid "segment anything", trained using just 2% of the SA-1B dataset. It achieves precision comparable to SAM in edge detection (AP, .794 vs .793) and proposal generation (mask AR@1000, 49.7 vs 51.8 for SAM-H E32), and our model is 50 times faster than SAM-H E32. The model is very simple, primarily adopting the YOLOv8-seg structure. We welcome everyone to try it out. GitHub: https://github.com/CASIA-IVA-Lab/FastSAM, arXiv: https://arxiv.org/pdf/2306.12156.pdf
The paper states: "The overall model design is largely motivated by efficiency. Given a precomputed image embedding, the prompt encoder and mask decoder run in a web browser, on CPU, in ∼50ms. This runtime performance enables seamless, real-time interactive prompting of our model."
But while testing the "automatic_mask_generator_example" script with the "sam_vit_b_01ec64.pth" checkpoint and GPU inference, it takes about 1.65 s on average. Would it be possible to make the inference pipeline take only 50 ms?
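Note that the ~50 ms figure in that quote covers only the prompt encoder and mask decoder after the image embedding has already been computed, whereas the automatic mask generator runs the full pipeline (embedding plus a dense grid of prompts). A minimal sketch of the interactive path, where only the per-prompt step is timed (checkpoint path, image path, and prompt point are placeholders):

```python
# Minimal sketch (not the repo's example script): separate the one-time
# embedding step from the per-prompt decoding that the ~50 ms claim refers to.
import time
import cv2
import numpy as np
import torch
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth").to("cuda")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # heavyweight image encoder, run once per image

torch.cuda.synchronize()
start = time.perf_counter()
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,
)  # lightweight prompt encoder + mask decoder, run once per prompt
torch.cuda.synchronize()
print(f"per-prompt decode: {(time.perf_counter() - start) * 1000:.1f} ms")
```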