SysCV / sam-hq

Segment Anything in High Quality [NeurIPS 2023]
https://arxiv.org/abs/2306.01567
Apache License 2.0

Alternative implementation in Refiners #127

Closed hugojarkoff closed 4 months ago

hugojarkoff commented 6 months ago

Hello everyone, and thank you for the fantastic work!

We are building Refiners, an open-source, PyTorch-based micro-framework for easily training and running adapters on top of foundation models. Just wanted to let you know that HQ-SAM is now natively supported on top of our SAM implementation!

An MWE (minimal working example) in Refiners (similar to demo_hqsam.py) would look like this:

- Download the SAM weights: `download_sam()`
- Download the HQ-SAM weights: `download_hq_sam()`
- Convert the SAM weights: `convert_sam()`
- Convert the HQ-SAM weights: `convert_hq_sam()`
- Finally, run the snippet below to do some inference using HQ-SAM:
```python
import torch
from PIL import Image

from refiners.fluxion.utils import load_from_safetensors, tensor_to_image
from refiners.foundationals.segment_anything import SegmentAnythingH
from refiners.foundationals.segment_anything.hq_sam import HQSAMAdapter

# Instantiate SAM model
sam_h = SegmentAnythingH(
    device=torch.device("cuda"),
    dtype=torch.float32,
    multimask_output=False,  # Multi-mask output is not supported by HQ-SAM
)
sam_h.load_from_safetensors("tests/weights/segment-anything-h.safetensors")

# Instantiate HQ-SAM adapter, with downloaded and converted weights
hq_sam_adapter = HQSAMAdapter(
    sam_h,
    hq_mask_only=True,
    weights=load_from_safetensors("tests/weights/refiners-sam-hq-vit-h.safetensors"),
)

# Patch SAM with HQ-SAM by “injecting” the adapter
hq_sam_adapter.inject()

# Define the image to segment and the prompt
tennis_image = Image.open("tests/foundationals/segment_anything/test_sam_ref/tennis.png")
box_points = [[(4, 13), (1007, 1023)]]

# Run inference
high_res_masks, *_ = sam_h.predict(input=tennis_image, box_points=box_points)

predicted_mask = tensor_to_image(high_res_masks)
predicted_mask.save("predicted_mask.png")
```

You should now have generated the following mask (note: the image has been downsized by 50% in postprocessing to fit on GitHub):

[image: predicted_mask]
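Not part of the snippet above, but as a quick visual sanity check one could overlay the saved mask on the input image with plain PIL (a minimal sketch reusing the file names from the code above, and assuming the saved mask is a grayscale/binary image):

```python
from PIL import Image

# Hypothetical visualization step (not part of the Refiners API): overlay the
# predicted mask on the input image to sanity-check the segmentation.
image = Image.open("tests/foundationals/segment_anything/test_sam_ref/tennis.png").convert("RGB")
mask = Image.open("predicted_mask.png").convert("L").resize(image.size)

# Highlight the masked region with a 50% red blend, keep the rest of the image as-is.
highlight = Image.blend(image, Image.new("RGB", image.size, (255, 0, 0)), alpha=0.5)
overlay = Image.composite(highlight, image, mask)
overlay.save("predicted_mask_overlay.png")
```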

A few more things:

Feedback welcome!