facebookresearch / segment-anything

The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0
47.9k stars 5.67k forks

Parameter documentation for huggingface config parameters #661

Closed by johanneszellinger 10 months ago

johanneszellinger commented 10 months ago

Hi, I am currently working on a UI wrapper for the SAM models on Hugging Face. The wrapper lets the user choose images and parameters, then dynamically generates the preprocessor_config.json and config.json files before running model inference.

This works fine with default parameters. However, I am wondering whether there is documentation for the parameters in the JSON files (short description, value limits, data type). This would be quite helpful!

heyoeyo commented 10 months ago

I don't know much about the huggingface side of things, so I'd take what I say with a grain of salt... However, it looks like the config files contain a mix of SAM model config parameters and then loads of junk that has nothing to do with the SAM model (but maybe needs to be there for compatibility with the rest of the library?).

For example, there are nonsense entries like the end-of-sentence token id, but these are mixed in with real SAM parameters like the layer indices for global attention. So making sense of the config files probably requires hunting around to find which entries are actually relevant.

The config.json file has 3 important sections: mask_decoder_config, prompt_encoder_config and vision_config. These seem to correspond to the model config for the classes MaskDecoder, PromptEncoder and ImageEncoderViT, respectively. The values in the config file seem to be for the 'huge' variant of the SAM model, which is set within the build_sam.py script. Similarly, the preprocessor_config.json seems to reference the preprocessing steps found in the SAM model class and parts of the predictor. So those are the places I would look to understand the values/ranges of the different settings.
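As a rough illustration of that split, here is a minimal sketch that separates the three SAM sections from everything else in a loaded config.json. The helper name `split_sam_config` and the sample input are hypothetical (the sample is a placeholder, not a real checkpoint config); only the three section names and the class names come from the description above.

```python
import json

# Section name in config.json -> class in the segment-anything repo,
# following the correspondence described above.
SAM_SECTIONS = {
    "mask_decoder_config": "MaskDecoder",
    "prompt_encoder_config": "PromptEncoder",
    "vision_config": "ImageEncoderViT",
}


def split_sam_config(config):
    """Separate the SAM sub-configs from the remaining top-level keys.

    Hypothetical helper for illustration; not part of any library.
    """
    sam_parts = {name: config[name] for name in SAM_SECTIONS if name in config}
    other = {key: value for key, value in config.items() if key not in SAM_SECTIONS}
    return sam_parts, other


# Placeholder input for illustration only, not a real checkpoint config.
example = json.loads("""
{
  "model_type": "sam",
  "mask_decoder_config": {"hidden_size": 256},
  "prompt_encoder_config": {"hidden_size": 256},
  "vision_config": {"hidden_size": 1280, "global_attn_indexes": [7, 15, 23, 31]}
}
""")

sam_parts, other = split_sam_config(example)
for name, values in sam_parts.items():
    print(f"{name} -> {SAM_SECTIONS[name]}: {sorted(values)}")
print("other top-level keys:", sorted(other))
```

In a UI wrapper, something like this could be used to show only the SAM-related sections and hide the remaining top-level keys.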

That being said, the values all seem to be related to the model structure. Changing these values makes sense if the goal is to load a different set of weights (like the 'base' or 'large' weights), but would otherwise break the model, since they aren't the kind of values that can be tuned for better performance or anything (at least, for a given set of weights).
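To make that concrete, here is a minimal sketch of how a wrapper might check that the structural values in vision_config match one of the released variants before running inference. The per-variant numbers are the image encoder settings as I read them in build_sam.py. The Hugging Face key names in `HF_KEYS` are an assumption on my part and should be checked against the library's documentation, and `match_variant` is a hypothetical helper.

```python
# Image encoder settings per variant, as read from build_sam.py.
SAM_VARIANTS = {
    "vit_b": {"embed_dim": 768, "depth": 12, "num_heads": 12,
              "global_attn_indexes": [2, 5, 8, 11]},
    "vit_l": {"embed_dim": 1024, "depth": 24, "num_heads": 16,
              "global_attn_indexes": [5, 11, 17, 23]},
    "vit_h": {"embed_dim": 1280, "depth": 32, "num_heads": 16,
              "global_attn_indexes": [7, 15, 23, 31]},
}

# ASSUMPTION: mapping from build_sam.py names to vision_config keys.
# Verify these key names against the Hugging Face docs before relying on them.
HF_KEYS = {
    "embed_dim": "hidden_size",
    "depth": "num_hidden_layers",
    "num_heads": "num_attention_heads",
    "global_attn_indexes": "global_attn_indexes",
}


def match_variant(vision_config):
    """Return the variant whose structure matches vision_config, else None."""
    for name, params in SAM_VARIANTS.items():
        if all(vision_config.get(HF_KEYS[key]) == value
               for key, value in params.items()):
            return name
    return None


user_config = {
    "hidden_size": 1024,
    "num_hidden_layers": 24,
    "num_attention_heads": 16,
    "global_attn_indexes": [5, 11, 17, 23],
}
print(match_variant(user_config))
```

If the check returns None, the values don't correspond to any released set of weights, so the wrapper could refuse to run rather than fail while loading the checkpoint.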

johanneszellinger commented 10 months ago

Thank you a lot for the detailed information - this is already quite helpful!

Yes, this is also my first time using Hugging Face. As I understand it, it basically abstracts models away so they can be used with a few lines of code, which is actually perfect for my use case (a general web demonstrator for different ML models). So I guess it makes sense that there is stuff in the configs for compatibility reasons.

NielsRogge commented 10 months ago

Hi @TheRealRolandDeschain, this is all documented here, for instance: https://huggingface.co/docs/transformers/model_doc/sam#transformers.SamMaskDecoderConfig.

johanneszellinger commented 10 months ago

@NielsRogge Thank you, I will have a look through the linked documentation!