Algy / fast-slic

20x Real-time superpixel SLIC Implementation with CPU
MIT License

Fast Slic

Fast-slic is an implementation of a SLIC-variant algorithm that aims for very low runtime on the CPU. It runs 7-20 times faster than existing SLIC implementations and can process a 1280x720 image stream at 60fps.

It started as part of a hobby project of mine that demanded true "real time" capability in video stream processing. One of its pipelines was a postprocessing stage that smoothed the output image using SLIC superpixels and a CRF. Unfortunately, there was no satisfactory library for the real-time (>30fps) goal. gSLICr was the most promising candidate, but I couldn't use it because of limited hardware and the inflexible license of CUDA. Therefore, I made a blazingly fast variant of SLIC that uses only the CPU.

Paper preprint

Demo

demo_clownfish demo_tiger

Installation

pip install fast_slic

Basic Usage

import numpy as np

from fast_slic import Slic
from PIL import Image

with Image.open("fish.jpg") as f:
    image = np.array(f)
# Optionally convert the image to CIELAB space first:
# import cv2; image = cv2.cvtColor(image, cv2.COLOR_RGB2LAB)
slic = Slic(num_components=1600, compactness=10)
assignment = slic.iterate(image) # Cluster Map
print(assignment)
print(slic.slic_model.clusters) # The cluster information of superpixels.
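
As a follow-up to the usage above, the cluster map can be used to average colors inside each superpixel. The sketch below is not part of fast_slic: the helper names `mean_color_per_superpixel` and `paint_superpixels` are hypothetical, and it assumes the cluster map is a 2-D array of non-negative integer labels with the same height and width as the image.

```python
import numpy as np


def mean_color_per_superpixel(image, assignment):
    """Return an array of shape (num_labels, channels) with per-label mean colors.

    Assumes `assignment` holds non-negative integer labels, one per pixel.
    """
    labels = np.asarray(assignment).ravel()
    pixels = np.asarray(image, dtype=np.float64).reshape(labels.size, -1)
    num_labels = int(labels.max()) + 1
    # Number of pixels that fall into each label.
    counts = np.bincount(labels, minlength=num_labels)
    means = np.zeros((num_labels, pixels.shape[1]))
    for c in range(pixels.shape[1]):
        sums = np.bincount(labels, weights=pixels[:, c], minlength=num_labels)
        # Guard against division by zero for labels that own no pixels.
        means[:, c] = sums / np.maximum(counts, 1)
    return means


def paint_superpixels(image, assignment):
    """Replace every pixel with the mean color of its superpixel."""
    means = mean_color_per_superpixel(image, assignment)
    return means[np.asarray(assignment)].astype(np.asarray(image).dtype)
```

With the variables from the example above, `paint_superpixels(image, assignment)` would give a flattened, mosaic-like version of the input that makes the segment boundaries easy to inspect.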

If your machine supports the AVX2 instruction set, you can make it three times faster by using the fast_slic.avx2.SlicAvx2 class instead of fast_slic.Slic. Intel CPUs from Haswell onward, AMD Excavator, and Ryzen support AVX2.

import numpy as np

# Much faster than the standard class
from fast_slic.avx2 import SlicAvx2
from PIL import Image

with Image.open("fish.jpg") as f:
    image = np.array(f)
# Optionally convert the image to CIELAB space first:
# import cv2; image = cv2.cvtColor(image, cv2.COLOR_RGB2LAB)
slic = SlicAvx2(num_components=1600, compactness=10)
assignment = slic.iterate(image) # Cluster Map
print(assignment)
print(slic.slic_model.clusters) # The cluster information of superpixels.

If your machine is ARM-based with the NEON instruction set, which is commonly supported by recent mobile devices and even the Raspberry Pi, you can make it twice as fast by using the fast_slic.neon.SlicNeon class instead of the original one.
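
One way to choose among the three classes at run time is to try the architecture-specific module first and fall back to the plain class. This is only a sketch: the helper names are hypothetical, and checking the machine architecture is an assumption that stands in for real CPU feature detection (an older x86 CPU without AVX2 would still be offered SlicAvx2, so select fast_slic.Slic explicitly in that case).

```python
import importlib
import platform


def candidates_for_machine(machine):
    """Return (module, class) pairs to try, most specialized first."""
    machine = machine.lower()
    candidates = []
    if machine in ("x86_64", "amd64"):
        candidates.append(("fast_slic.avx2", "SlicAvx2"))
    elif machine.startswith(("arm", "aarch64")):
        candidates.append(("fast_slic.neon", "SlicNeon"))
    # The plain implementation is always the last resort.
    candidates.append(("fast_slic", "Slic"))
    return candidates


def load_first_available(candidates):
    """Import and return the first class whose module can be imported."""
    for module_name, class_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        cls = getattr(module, class_name, None)
        if cls is not None:
            return cls
    raise ImportError("none of the candidate classes could be imported")


def pick_slic_class():
    """Pick a class for the machine this process is running on."""
    return load_first_available(candidates_for_machine(platform.machine()))
```

The returned class can then be constructed the same way as in the examples above, for instance `pick_slic_class()(num_components=1600, compactness=10)`.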

Performance

With the maximum number of iterations set to 10, the run times of SLIC implementations for a 640x480 image are as follows:

| Implementation | Run time (ms) |
| --- | --- |
| skimage.segmentation.slic | 216 |
| cv2.ximgproc.createSuperpixelSLIC.iterate | 142 |
| fast_slic.Slic (single-core build) | 20 |
| fast_slic.avx2.SlicAvx2 (single-core build w/ AVX2 support) | 12 |
| fast_slic.Slic (w/ OpenMP support) | 8.8 |
| fast_slic.avx2.SlicAvx2 (w/ OpenMP and AVX2 support) | 5.6 |

(RGB-to-CIELAB conversion time is not included. Tested on a Ryzen 2600X, 6C12T, overclocked to 4.0GHz.)
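
To get comparable numbers on your own machine, a small timing harness along the following lines may help. It is a sketch rather than the script used for the figures above: the helper names are hypothetical, the input is random noise instead of a real photo, and the iteration count is left at the library default. For reference, a 60fps target leaves a budget of roughly 16.7 ms per frame.

```python
import time


def mean_runtime_ms(fn, repeats=20, warmup=3):
    """Call fn() repeatedly and return the mean wall-clock time per call in ms."""
    for _ in range(warmup):
        fn()  # Warm-up calls are not timed.
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / repeats


def fps_from_ms(ms_per_frame):
    """Convert a per-frame run time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


def benchmark_fast_slic(width=640, height=480):
    """Time fast_slic.Slic on a random RGB image (requires numpy and fast_slic)."""
    import numpy as np
    from fast_slic import Slic

    image = np.random.randint(0, 256, size=(height, width, 3), dtype=np.uint8)
    slic = Slic(num_components=1600, compactness=10)
    return mean_runtime_ms(lambda: slic.iterate(image))
```

Calling `fps_from_ms(benchmark_fast_slic(1280, 720))` would then give a rough frame rate for 720p input on the current machine.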

Known Issues

Tips

TODO