Optimize image size - Githubissues

nbeloglazov commented 11 months ago

For books and people we upload 500x500px images, but we don't actually use that size at the moment. On the book/person page we use 275x275px, and on the catalog page we use 150x150px. We still fetch the original 500x500px images, which wastes user bandwidth and slows down the page. For example, the index page fetches 4.5MB of resources, which is kind of a lot.
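To put rough numbers on the waste, here is a small sketch comparing pixel counts of the sizes mentioned above. It assumes square images; compressed file size does not scale exactly with pixel count, so treat the result as an estimate rather than a measurement. The name `pixel_fraction` is made up for this example.

```python
ORIGINAL_SIDE = 500  # px, the size that is uploaded today


def pixel_fraction(side: int, original_side: int = ORIGINAL_SIDE) -> float:
    """Fraction of the original's pixels that a square image of `side` px contains."""
    return (side * side) / (original_side * original_side)


if __name__ == "__main__":
    for side in (275, 150):
        # Print how many pixels each displayed size needs relative to 500x500.
        print(f"{side}x{side}: {pixel_fraction(side):.0%} of the original pixels")
```

By this estimate the catalog thumbnails need less than a tenth of the pixels that are downloaded today, and the book/person page images need less than a third.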

A few ways we could address it:

1. Support on-the-fly resizing. For example, allow requesting images with a ?size=275 param, and the server resizes them on the fly and returns the result. The problem is that it defeats the benefits we get today from using Google's CDN, since images are served as static resources.
2. Produce multiple versions of the image at upload time. The downside is that we still need to implement some job/script that can produce variations of all existing images, for example when we have a use case for a new size.
3. Create a periodic job that "syncs" all images. The job will produce variations of images if they don't exist. We could run the job regularly (hourly) or trigger it after book/person creation, if Django supports it.
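The "sync" idea from the last option could look roughly like the sketch below. It only computes which variants are missing; the actual resizing and upload are left out. The size list, the `name.<size>.<ext>` naming scheme, and the function names are all assumptions made for illustration, not something the project defines.

```python
import posixpath

REQUIRED_SIZES = (150, 275)  # hypothetical list of target sizes in px


def variant_name(original: str, size: int) -> str:
    """Hypothetical naming scheme: 'covers/book.jpg' -> 'covers/book.275.jpg'."""
    root, ext = posixpath.splitext(original)
    return f"{root}.{size}{ext}"


def missing_variants(originals, existing, sizes=REQUIRED_SIZES):
    """Return (original, size, variant) tuples for variants not yet in storage."""
    existing = set(existing)
    todo = []
    for original in sorted(originals):
        for size in sizes:
            name = variant_name(original, size)
            if name not in existing:
                todo.append((original, size, name))
    return todo
```

Because the job only looks at what exists in storage, adding a new entry to `REQUIRED_SIZES` and rerunning it would backfill the new size for every existing image.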

nbeloglazov commented 11 months ago

@frombrest do you have thoughts/preferences? Or maybe better ideas.

frombrest commented 11 months ago

> @frombrest do you have thoughts/preferences? Or maybe better ideas.

I personally like the third option... We can create a configurable script with a list of the required output formats.

I am not familiar with the current underlying storage mechanism and have no experience with Google Cloud... but since most cloud providers work similarly, I expect GCP has something like the triggers in AWS S3, assuming you use GCS (Google's analog of AWS S3). Once you store an object (an image in our case) in the bucket, it can trigger a serverless function that performs the resizing and creates copies of the image, which will be automatically propagated to the CDN.

Or we can use Django signals: https://docs.djangoproject.com/en/4.2/topics/signals/ https://docs.djangoproject.com/en/4.2/ref/signals/#post-save to run the resizing logic right after an object is created or updated, plus a cron job once a day to make sure all required images exist.
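One detail worth sketching for the trigger approach: if the function writes its copies into the same bucket, each copy fires the trigger again, so the handler has to recognize its own output. The snippet below shows only that planning step. It assumes the event payload is a dict with a `name` key holding the object name, and it uses a made-up `name.<size>.<ext>` naming scheme; the resizing and upload themselves are omitted.

```python
import posixpath
import re

TARGET_SIZES = (150, 275)  # hypothetical target sizes in px

# Matches names produced by this function, e.g. 'covers/book.275.jpg'.
# Note: an original that happens to be named like a variant would be skipped.
_VARIANT_RE = re.compile(
    r"\.(?:%s)\.[^./]+$" % "|".join(str(size) for size in TARGET_SIZES)
)


def plan_resizes(event: dict) -> list:
    """Return (source, size, destination) tuples for a newly stored object."""
    name = event["name"]
    if _VARIANT_RE.search(name):
        # The object is one of our own copies: do nothing, otherwise the
        # function would keep retriggering itself.
        return []
    root, ext = posixpath.splitext(name)
    return [(name, size, f"{root}.{size}{ext}") for size in TARGET_SIZES]
```

Writing the copies to a separate bucket or prefix would be another way to avoid the loop, at the cost of a slightly different URL layout.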

nbeloglazov commented 11 months ago

> I am not familiar with the current underlying storage mechanism and have no experience with Google Cloud...

Yes, apparently it's using GCS to store all images. If there is a simple function that can be implemented at the GCS level, then I think it's good enough.

Though we need to think about how to handle the fallback. If we don't create resized variants immediately upon saving, then HTML rendering has to know that an image doesn't have a resized version and use the original image URL. We could check for each image whether its resized version exists and, if so, use it. But I don't know how fast these checks will be. We use the django-storages library, which provides GCS support, but I suspect that it's not very fast and that each check such as "exists" likely goes through an API call.

So maybe it's better to ensure that we resize images on save.

frombrest commented 11 months ago

> Though we need to think how to perform a fallback.

By default the backend can use the raw image. When you store an image, the function should make the resized copies and then hit an internal endpoint, passing the original file name, to notify the backend that the copies are ready to use, or to notify you that something went wrong.

Example:

from django.db import models

class Book(models.Model):
    image = models.ImageField(...)  # field options elided
    image_resized = models.BooleanField(default=False)

    @property
    def image_url(self):
        # Serve the resized copy once the resize function has reported success.
        if self.image_resized:
            # Append a custom suffix to the end of the filename.
            return self.image.url + "9x13.jpg"
        return self.image.url

By adding a computed field to the model, you don't need to put the checking logic in templates.

Additionally, we need to add an internal endpoint that updates the image_resized field once the resize function has successfully created the copies.
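The core of such an internal endpoint could be sketched without any framework as below. Everything here is an assumption for illustration: the JSON payload with a `filename` key, the shared-token check, and the `resized` set standing in for wherever the "copies are ready" state ends up being stored.

```python
import hmac
import json


def handle_resize_notification(
    body: bytes, token: str, expected_token: str, resized: set
) -> int:
    """Record that copies of an image are ready; return an HTTP-style status."""
    # Constant-time comparison, so the token cannot be guessed via timing.
    if not hmac.compare_digest(token.encode(), expected_token.encode()):
        return 403
    try:
        filename = json.loads(body)["filename"]
    except (ValueError, KeyError, TypeError):
        return 400  # not JSON, or not an object with a "filename" key
    if not isinstance(filename, str) or not filename:
        return 400
    resized.add(filename)
    return 204
```

In a Django view the body would come from `request.body` and the token from a request header, with the returned status passed to `HttpResponse(status=...)`.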

nbeloglazov commented 11 months ago

Can we avoid adding a new field to the model, to keep complexity down? If we add a field, we'll need to make sure that it actually matches whether the resized file exists or not. For example, when we upload a new image we need to be careful about clearing the field. Mentally I treat resized images as temporary files that can be removed, and the site should still function (after a restart).

What if we kept the list of resized files in a cache? Django provides an in-memory cache. Upon server startup we can get the list of resized images from storage (there is a listdir method for that), and update/repopulate the cache upon model changes.

It's similar to your proposal, but the field is transient/in-cache rather than stored in the DB.
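A minimal sketch of that cache idea: build an in-memory index from a listing of file names, then look variants up with a fallback to the original. The `name.<size>.<ext>` naming scheme and both function names are assumptions for this example. In Django, the list of names could come from the second element of the tuple returned by the storage's `listdir()`.

```python
import posixpath
from collections import defaultdict


def build_size_index(filenames):
    """Map each original name to {size: variant name}, e.g. from 'book.275.jpg'."""
    index = defaultdict(dict)
    for name in filenames:
        root, ext = posixpath.splitext(name)
        base, dot, size = root.rpartition(".")
        # Only names with a numeric component before the extension are variants.
        if dot and size.isascii() and size.isdigit():
            index[base + ext][int(size)] = name
    return dict(index)


def image_for_size(index, original, size):
    """Return the resized variant if the index knows one, else the original."""
    return index.get(original, {}).get(size, original)
```

Since the index is rebuilt from storage, deleting all resized files only costs bandwidth: lookups fall back to the originals until the copies reappear.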

nbeloglazov commented 8 months ago

This is mostly done: it's already deployed and working in production. What's left is to write a test that verifies that when a scaled-down image is available, it is served instead of the original full-size image.

It would be nice to write a full test that includes modifying an image in the bucket, triggering the cloud function, and getting the push event on the site. But that would be overkill with our current setup.
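The narrower test described above might look something like this sketch. `select_image` is a stand-in written for this example, not the site's actual lookup function, and the index layout is assumed; the real test would call whatever helper the templates use.

```python
import unittest


def select_image(index: dict, original: str, size: int) -> str:
    """Return the scaled-down variant when one is known, else the original."""
    return index.get(original, {}).get(size, original)


class ImageSelectionTest(unittest.TestCase):
    def setUp(self):
        # Pretend the cloud function has produced a 275px copy of one cover.
        self.index = {"covers/book.jpg": {275: "covers/book.275.jpg"}}

    def test_scaled_image_is_served_when_available(self):
        self.assertEqual(
            select_image(self.index, "covers/book.jpg", 275),
            "covers/book.275.jpg",
        )

    def test_original_is_served_when_size_is_missing(self):
        self.assertEqual(
            select_image(self.index, "covers/book.jpg", 150),
            "covers/book.jpg",
        )

    def test_original_is_served_for_unknown_image(self):
        self.assertEqual(
            select_image(self.index, "covers/other.jpg", 275),
            "covers/other.jpg",
        )
```

It can be run with `python -m unittest` and needs neither the bucket nor the cloud function.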

nbeloglazov commented 8 months ago

Another future optimization is to convert images to webp. Given that we have a pipeline now, it won't be much work to extend it to produce webp. The only thing is that we need to detect whether the browser supports webp.

frombrest commented 8 months ago

> The only thing is that we need to detect whether browser supports webp.

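import logging

from django.http import HttpRequest, HttpResponse
from django.shortcuts import render

# Assumed shape: {filename + extension: {size: url}}, populated elsewhere.
sizes: dict[str, dict[int, str]] = {}

def book_detail(request: HttpRequest, slug: str) -> HttpResponse:
    # ...
    context = {
        # ...
        'user_agent': request.META.get('HTTP_USER_AGENT', ''),
        # ...
    }
    return render(request, 'books/book-detail.html', context)

def get_image_for_size(filename: str, size: int, user_agent: str) -> str:
    # ...
    extension = '.webp' if user_agent_supports_webp(user_agent) else '.jpg'
    # ...
    # Fall back to the original file when no variant of this size is known.
    variants = sizes.get(filename + extension, {})
    if size not in variants:
        logging.warning(f'Image {filename}{extension} missing size {size}.')
        return filename
    return variants[size]

def user_agent_supports_webp(user_agent: str) -> bool:
    # TODO: parse the browser and version and check them against
    # https://caniuse.com/webp
    return True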

Something like that?
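As an alternative to parsing the User-Agent string, the server could look at the request's `Accept` header, where browsers can advertise `image/webp`. The sketch below is a simplified parser written for this example, not a full content-negotiation implementation, and `accepts_webp` is a made-up name. One caveat: a browser may support webp without listing it in the `Accept` header of the page request, so a negative answer only means "serve jpg to be safe".

```python
def accepts_webp(accept_header: str) -> bool:
    """Return True if the Accept header lists image/webp with a non-zero q."""
    for part in accept_header.split(","):
        media_type, *params = (piece.strip() for piece in part.split(";"))
        if media_type.lower() != "image/webp":
            continue
        quality = 1.0  # q defaults to 1 when not given
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not accepted"
        return quality > 0
    return False
```

In a Django view this could be called as `accepts_webp(request.headers.get("Accept", ""))`. If the HTML then differs by header, the response should also send `Vary: Accept` so caches keep the variants apart. Another route that avoids server-side detection entirely is a `<picture>` element with a `<source type="image/webp">`, which lets the browser pick the format itself.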

belaudiobooks / website

Optimize image size #108