Faster image transforms

yuyu2172 commented 7 years ago

Currently, all images are assumed to be converted to numpy.ndarray right after being loaded from disk. However, this leads to unnecessary copies of image data. For example, when only a crop of an image is needed, it is not necessary to load the entire image into a numpy array.

If the image is not converted to a numpy array immediately after loading, this kind of optimization becomes possible. To check that it actually helps, I wrote a simple example. It feeds a dataset to a MultiprocessIterator and measures the throughput. The dataset crops each image to a fixed size. NoPILDataset uses the current method: it converts the whole image to an array and then slices it. PILDataset crops with PIL's crop method first and converts only the cropped region to an array.
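Before comparing throughput, it is worth confirming that the two crop strategies return the same pixels. The following is a minimal sketch, assuming Pillow and NumPy; the names crop_with_numpy, crop_with_pil and make_png are hypothetical, and a lossless in-memory PNG stands in for an image file on disk.

```python
import io

import numpy as np
from PIL import Image


def crop_with_numpy(data, size=224):
    # Current approach: decode, convert the whole image to a CHW array, then slice.
    img = Image.open(io.BytesIO(data)).convert('RGB')
    arr = np.asarray(img).transpose(2, 0, 1)
    return arr[:, :size, :size]


def crop_with_pil(data, size=224):
    # Proposed approach: decode, crop in PIL, then convert only the cropped region.
    img = Image.open(io.BytesIO(data)).convert('RGB')
    img = img.crop((0, 0, size, size))  # box is (left, upper, right, lower)
    return np.asarray(img).transpose(2, 0, 1)


def make_png(height=300, width=400, seed=0):
    # Hypothetical test helper: encode a random RGB image as lossless PNG bytes.
    rng = np.random.RandomState(seed)
    pixels = rng.randint(0, 256, size=(height, width, 3)).astype(np.uint8)
    buf = io.BytesIO()
    Image.fromarray(pixels).save(buf, format='PNG')
    return buf.getvalue()


if __name__ == '__main__':
    data = make_png()
    a = crop_with_numpy(data)
    b = crop_with_pil(data)
    print(a.shape, b.shape, np.array_equal(a, b))
```

Note that convert('RGB') already decodes the whole file in both versions, so the saving comes from converting only the 224x224 region to an array, not from skipping the decode itself.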

I tested the performance with the train split of ImageNet. The spec of my machine is as follows.

Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz (6 cores)
128GB
Ubuntu 14.04
Chainer 3.0.0a1

After 500 iterations, the results were as follows. The proposed change was 1.56 times faster.

# The current method
$ python benchmark.py 0
count=500  recent_speed=4.16689417015  overall_speed=3.71505159313

# The proposed method
$ python benchmark.py 1
count=500  recent_speed=7.18092074851  overall_speed=5.09444490369

Note that this improvement matters most when training with large batches on a large dataset such as ImageNet. I was having a performance issue when training a model on ImageNet.

# filename: benchmark.py
# Usage:
# python benchmark.py 0  # the current method
# python benchmark.py 1  # the proposed method
import numpy as np
import os
from PIL import Image
import time
import chainer

class NoPILDataset(chainer.dataset.DatasetMixin):

    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def get_example(self, i):
        path = self.paths[i]
        f = Image.open(path)
        img = f.convert('RGB')
        img = np.asarray(img).transpose(2, 0, 1)
        img = img[:, :224, :224]
        return img

class PILDataset(chainer.dataset.DatasetMixin):

    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def get_example(self, i):
        path = self.paths[i]
        f = Image.open(path)
        img = f.convert('RGB').crop((0, 0, 224, 224))
        img = np.asarray(img).transpose(2, 0, 1)
        return img

if __name__ == '__main__':
    import sys

    # Path to the training dataset of ImageNet.
    # (It can be any root directory of an image dataset.)
    dirname = '/data/imagenet/train'
    paths = []
    for cur_dir, _, names in os.walk(dirname):
        for name in names:
            paths.append(os.path.join(cur_dir, name))

    if int(sys.argv[1]) == 1:
        print('use PIL directly')
        dataset = PILDataset(paths)
    else:
        print('do not use PIL directly')
        dataset = NoPILDataset(paths)
    it = chainer.iterators.MultiprocessIterator(
        dataset, 192, shared_mem=3 * 224 * 224 * 4, n_processes=12,
        shuffle=False)

    start = time.time()
    count = 0
    while count < 500:
        recent_start = time.time()
        try:
            next(it)  # fetch one batch of 192 examples
        except StopIteration:
            break

        end = time.time()
        count += 1
        # recent_speed: batches/sec for this batch; overall_speed: mean batches/sec so far.
        print(
            'count={}  recent_speed={}  overall_speed={}'.format(
                count, 1. / (end - recent_start), count / (end - start)))
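As I understand chainer's MultiprocessIterator documentation, shared_mem is the shared-memory buffer size in bytes reserved per example. Both datasets return uint8 arrays, so a 3x224x224 crop needs 150528 bytes; the factor of 4 in the script reserves enough room for a float32 crop. A small sketch of that arithmetic, where shared_mem_bytes is a hypothetical helper:

```python
import numpy as np


def shared_mem_bytes(shape, dtype):
    # Hypothetical helper: bytes needed to hold one example of this shape and dtype.
    return int(np.prod(shape)) * np.dtype(dtype).itemsize


uint8_crop = shared_mem_bytes((3, 224, 224), np.uint8)      # 150528 bytes
float32_crop = shared_mem_bytes((3, 224, 224), np.float32)  # 602112 bytes
print(uint8_crop, float32_crop)
```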

yuyu2172 commented 7 years ago

I made a simpler benchmark script that measures the time to load an image and crop it.

import time

import cv2
import numpy as np
from PIL import Image

from chainercv.utils import write_image


def crop_pil(path):
    # Decode with PIL, crop in PIL, then convert only the 224x224 region to CHW.
    img = Image.open(path).convert('RGB')
    img = img.crop((0, 0, 224, 224))
    img = np.asarray(img).transpose(2, 0, 1)
    return img


def crop_numpy(path):
    # Decode with PIL, convert the whole image to CHW, then slice.
    img = Image.open(path).convert('RGB')
    img = np.asarray(img).transpose(2, 0, 1)
    img = img[:, :224, :224]
    return img


def crop_cv2(path):
    # cv2.imread returns HWC in BGR order; reversing the channel axis gives RGB.
    img = cv2.imread(path, cv2.IMREAD_COLOR).transpose(2, 0, 1)
    img = img[::-1, :224, :224]
    return img


if __name__ == '__main__':
    # Write a 4000x4000 random-noise JPEG to use as the test image.
    img = np.random.uniform(0, 255, size=(3, 4000, 4000))
    path = 'a.jpg'
    write_image(img, path)  # saves the image to disk; the return value is not needed

    for func in (crop_cv2, crop_pil, crop_numpy):
        times = []
        for i in range(30):
            start = time.time()
            func(path)
            times.append(time.time() - start)
        print('{}   mean={}'.format(func.__name__, np.mean(times)))

Results:

crop_cv2   mean=0.272049093246
crop_pil   mean=0.301412550608
crop_numpy   mean=0.324815416336

Hakuyume commented 7 years ago

If cv2.imread is the fastest, we can simply use it inside read_image. This does not require any API changes. If we want to crop with PIL instead, we have to change the APIs.
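A minimal sketch of what decoding with cv2 inside a loader could look like, assuming OpenCV's Python bindings are optional. read_image_sketch is a hypothetical name, not chainercv's actual read_image; it keeps the idea of returning a CHW RGB array with a configurable dtype and falls back to PIL when cv2 cannot be imported.

```python
import numpy as np
from PIL import Image

try:
    import cv2
    _HAS_CV2 = True
except ImportError:
    _HAS_CV2 = False


def read_image_sketch(path, dtype=np.float32, color=True):
    """Hypothetical loader: return a CHW (RGB or 1-channel) array of `dtype`."""
    if _HAS_CV2:
        flag = cv2.IMREAD_COLOR if color else cv2.IMREAD_GRAYSCALE
        img = cv2.imread(path, flag)
        if img is None:  # cv2.imread signals failure by returning None
            raise IOError('cannot read image: {}'.format(path))
        if img.ndim == 2:
            img = img[np.newaxis]  # HW -> 1HW
        else:
            img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR HWC -> RGB CHW
    else:
        pil_img = Image.open(path).convert('RGB' if color else 'L')
        img = np.asarray(pil_img)
        if img.ndim == 2:
            img = img[np.newaxis]  # HW -> 1HW
        else:
            img = img.transpose(2, 0, 1)  # HWC -> CHW
    return img.astype(dtype)
```

Because the output layout stays CHW RGB, callers would not need to change; only the decoder behind the function differs.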

yuyu2172 commented 7 years ago

Yes. That is my conclusion too. I am guessing that most of the time is spent decoding the JPEG image, and it seems that cv2 has a better decoder.

Hakuyume commented 7 years ago

> I am guessing that most of the time is spent decoding the JPEG image, and it seems that cv2 has a better decoder.

I guess this depends on how OpenCV is built (for example, which JPEG library it uses). I will try your benchmark in my environment.
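One way to test the guess that decoding dominates is to time the decode step on its own and compare it with decode plus crop. A minimal sketch, assuming Pillow and NumPy and an in-memory JPEG so that disk I/O is excluded; the helper names are hypothetical, and the same split could be done for cv2 by decoding bytes with cv2.imdecode.

```python
import io
import time

import numpy as np
from PIL import Image


def make_jpeg(height, width, seed=0):
    # Hypothetical helper: encode random noise as JPEG bytes kept in memory.
    rng = np.random.RandomState(seed)
    pixels = rng.randint(0, 256, size=(height, width, 3)).astype(np.uint8)
    buf = io.BytesIO()
    Image.fromarray(pixels).save(buf, format='JPEG')
    return buf.getvalue()


def decode_only(data):
    # Image.open only parses the header; load() forces the full JPEG decode.
    img = Image.open(io.BytesIO(data))
    img.load()
    return img


def decode_and_crop(data, size=224):
    # Same steps as crop_pil, but reading from memory instead of a path.
    img = Image.open(io.BytesIO(data)).convert('RGB')
    img = img.crop((0, 0, size, size))
    return np.asarray(img).transpose(2, 0, 1)


def mean_time(fn, repeat=10):
    # Average wall-clock time of fn() over `repeat` calls.
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return float(np.mean(times))


if __name__ == '__main__':
    data = make_jpeg(2000, 2000)
    t_decode = mean_time(lambda: decode_only(data))
    t_total = mean_time(lambda: decode_and_crop(data))
    print('decode_only    mean={:.4f}'.format(t_decode))
    print('decode+crop    mean={:.4f}'.format(t_total))
    print('decode share   {:.0%}'.format(t_decode / t_total))
```

Random noise compresses poorly, so the decode share measured this way may differ from that of natural photos such as ImageNet.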

Hakuyume commented 7 years ago

In my environment, cv2 was the fastest, too.

crop_cv2   mean=0.22242753505706786
crop_pil   mean=0.34299739996592205
crop_numpy   mean=0.38132399717966714

chainer / chainercv

Faster image transforms #395