bigcat88 / pillow_heif

Python library for working with HEIF images and plugin for Pillow.
BSD 3-Clause "New" or "Revised" License

Keep heif_opener registered after mp spawn #277

Closed: yit-b closed this issue 3 months ago.

yit-b commented 3 months ago

Describe why it is important and where it will be useful

When decoding images with a torch DataLoader (or, more generally, a multiprocessing pool) and the mp/torch start method set to "spawn", having to register the HEIF opener in every worker process (e.g. in an initializer or worker_init_fn) is a bit of a gotcha. Calling register_heif_opener() once in the parent process (as test_register_heif_once does below) is not enough, because spawned workers start from a fresh interpreter and the registration does not carry over to them.

Repro:

import io
from functools import partial
from typing import List

from PIL import Image
from torch import multiprocessing as mp
from torchvision.transforms import v2
from torchvision.transforms.functional import pil_to_tensor

# bytes -> BytesIO -> PIL image -> RGB image -> uint8 CHW tensor
sample_image_transform = v2.Compose(
    [
        io.BytesIO,
        Image.open,
        partial(Image.Image.convert, mode="RGB"),
        pil_to_tensor,
    ]
)

def heif_init():
    from pillow_heif import register_heif_opener
    register_heif_opener()

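def test_register_heif_once(image_bytes: List[bytes]):
    # Registers the opener in the parent process only. Spawned workers start a
    # fresh interpreter and never call heif_init(), so they cannot open HEIF files.
    heif_init()

    with mp.Pool(1) as pool:
        pool.map(sample_image_transform, image_bytes)

def test_register_heif_once_per_process(image_bytes: List[bytes]):
    # The initializer runs once inside each worker process before it takes tasks.
    with mp.Pool(1, initializer=heif_init) as pool:
        pool.map(sample_image_transform, image_bytes)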

def main():
    heif_paths = [...]
    heif_images = []
    for p in heif_paths:
        with open(p, "rb") as f:
            heif_images.append(f.read())

    try:
        test_register_heif_once(heif_images)
        print("test_register_heif_once() success")
    except Exception as e:
        print(f"test_register_heif_once() failed: {e}")

    try:
        test_register_heif_once_per_process(heif_images)
        print("test_register_heif_once_per_process() success")
    except Exception as e:
        print(f"test_register_heif_once_per_process() failed: {e}")

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    main()

Output:

test_register_heif_once() failed: cannot identify image file <_io.BytesIO object at 0x7f5dc3777d80>
test_register_heif_once_per_process() success

Describe your proposed solution

I'm not sure how you'd do this - open to discussion.

Describe alternatives you've considered, if relevant

If I explicitly call register_heif_opener() in the initializer of my mp pools (or the worker_init_fn of my torch DataLoaders), there's no issue. But that step is easy to forget, and forgetting it causes difficult-to-debug errors.

I'm not sure how to persist imports across a spawn, but I believe some libraries (e.g. torch) manage it somehow.
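A minimal sketch of one way to make the per-process registration harder to forget: wrap the transform so that it runs an initializer lazily, once per process, on first use. `WithInit` and `run_once_per_process` are hypothetical helpers (not part of pillow_heif, Pillow, or torch); in real code `init` would be `heif_init` from the repro above.

```python
from typing import Any, Callable, Set

# Initializers that have already run in *this* process. A spawned worker
# starts a fresh interpreter, so it begins with its own empty set.
_ran_initializers: Set[Callable[[], Any]] = set()


def run_once_per_process(init: Callable[[], Any]) -> None:
    """Call init() unless it has already been called in the current process."""
    if init not in _ran_initializers:
        init()
        _ran_initializers.add(init)


class WithInit:
    """Hypothetical wrapper: make sure init() ran in this process, then call fn.

    init and fn must be picklable (e.g. module-level functions or objects)
    so that instances can be sent to spawned workers through Pool.map.
    """

    def __init__(self, init: Callable[[], Any], fn: Callable[..., Any]) -> None:
        self.init = init
        self.fn = fn

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        run_once_per_process(self.init)
        return self.fn(*args, **kwargs)
```

With the repro's names this would be used as `pool.map(WithInit(heif_init, sample_image_transform), image_bytes)`. It keeps the registration attached to the transform itself at the cost of one set lookup per call; an explicit `initializer=` remains the simpler option.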


bigcat88 commented 3 months ago

Good time of day. You have already found the correct way to do this:

def heif_init():
    from pillow_heif import register_heif_opener
    register_heif_opener()

with mp.Pool(1, initializer=heif_init) as pool:
    pool.map(sample_image_transform, image_bytes)

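This is the same way it is done in FastAPI applications:

from contextlib import asynccontextmanager
from fastapi import FastAPI
from pillow_heif import register_heif_opener

@asynccontextmanager
async def lifespan(app: FastAPI):  # runs at startup in each worker process of the web server
    register_heif_opener()
    yield

app = FastAPI(lifespan=lifespan)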

Pillow itself requires plugins to be registered in each subprocess; it does not register them automatically, for security reasons.
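The same per-process hook exists for the torch DataLoader mentioned in the issue: DataLoader calls its `worker_init_fn` in each worker process with the worker id. A minimal sketch, assuming pillow_heif is installed; `heif_worker_init` is a hypothetical name.

```python
def heif_worker_init(worker_id: int) -> None:
    """Hypothetical worker_init_fn that registers the HEIF opener in one worker.

    DataLoader calls worker_init_fn(worker_id) inside each worker process,
    so the registration exists in the process that actually decodes images.
    Define it at module level so spawned workers can unpickle it.
    """
    # Imported inside the function, mirroring heif_init() from the repro.
    from pillow_heif import register_heif_opener

    register_heif_opener()
```

Pass it as `DataLoader(dataset, num_workers=2, multiprocessing_context="spawn", worker_init_fn=heif_worker_init)`, where `dataset` is your own dataset. With `num_workers=0` no worker processes are started and `worker_init_fn` is not called, so the main process has to register the opener itself.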

yit-b commented 3 months ago

Thanks for the quick response and for clarifying the behavior. I'll proceed with per-subprocess initialization.

bigcat88 commented 3 months ago

You're welcome, always happy to help