AIDajiangtang / Segment-Anything-CPP

segment anything(SAM) for CPP Inference

Checkpoints? #1

Open twardoch opened 1 year ago

twardoch commented 1 year ago

Could you provide instructions on how to get the ONNX versions of the encoder & decoder?

AIDajiangtang commented 1 year ago

In the official repository, only the code to export the decoder to ONNX format is included, but in this branch: https://github.com/visheratin/segment-anything/, the encoder part has been added.

You can use the script command below to generate both the encoder and decoder ONNX models:

python scripts/export_onnx_model.py --checkpoint <path/to/checkpoint> --model-type <model type> --encoder-output <path/to/encoder output> --decoder-output <path/to/decoder output>

For the checkpoint parameter, pass the original PyTorch-format pretrained model:
default or vit_h: https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
vit_l: https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth
vit_b: https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth

For the model-type parameter, choose one of ['default', 'vit_h', 'vit_l', 'vit_b'] to select which type of SAM model to export.

Other parameters can be set according to your needs.
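Putting the flags above together, here is a minimal sketch of a wrapper that assembles the export invocation. The paths and file names are placeholders, not values from the thread; adjust them to your checkout and checkpoint.

```python
# Hypothetical wrapper around scripts/export_onnx_model.py from the
# visheratin branch. All paths below are placeholders.
import subprocess

def build_export_cmd(checkpoint, model_type, encoder_out, decoder_out):
    """Assemble the argv for the export script (flags as described above)."""
    return [
        "python", "scripts/export_onnx_model.py",
        "--checkpoint", checkpoint,
        "--model-type", model_type,   # one of: default, vit_h, vit_l, vit_b
        "--encoder-output", encoder_out,
        "--decoder-output", decoder_out,
    ]

cmd = build_export_cmd("sam_vit_b_01ec64.pth", "vit_b",
                       "sam_vit_b_encoder.onnx", "sam_vit_b_decoder.onnx")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually run the export
```

Running it inside the cloned branch (with the checkpoint downloaded) should produce the two ONNX files.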

You can also directly download the converted models:

wget https://huggingface.co/visheratin/segment-anything-vit-b/resolve/main/encoder-quant.onnx
wget https://huggingface.co/visheratin/segment-anything-vit-b/resolve/main/decoder-quant.onnx

twardoch commented 1 year ago

Thanks! I also know of https://github.com/vietanhdev/samexporter which looks nice (and it's packaged). I just don't know if it does the same thing (I know that certain things, like the image size, were hardcoded in one of the ONNX encoders).

AIDajiangtang commented 1 year ago

They both implement ONNX export by calling the interface below:

torch.onnx.export(model, args, f, export_params=True, verbose=False, training=False, input_names=None, output_names=None, dynamic_axes=None, opset_version=None, do_constant_folding=True, example_outputs=None, strip_doc_string=True, keep_initializers_as_inputs=None, propagate=None, use_external_data_format=None)

but they handle the parameters of the torch.onnx.export interface slightly differently. For example, for the dynamic_axes parameter, vietanhdev's code makes the image size of the encoder input dynamic:

dynamic_axes = {"input_image": {0: "image_height", 1: "image_width"}}

whereas visheratin's code fixes the input image size and makes the batch size dynamic instead. You can combine the two codebases as a reference according to your needs.

twardoch commented 1 year ago

Thanks for the explanation!