chaiNNer-org / spandrel

Spandrel gives your project support for various PyTorch architectures meant for AI Super-Resolution, restoration, and inpainting. Based on the model support implemented in chaiNNer.
MIT License

Is it possible to do x2 SR using an x4 model? #143

Closed pietrobolcato closed 5 months ago

pietrobolcato commented 5 months ago

Hey there :) First of all, thanks for the amazing work! It is impressive and super useful 🙏

I am currently using this model:

[image]

It says the model can also be used for x2 and x4. With the following code, I can upscale x4. How can I do x2 instead?

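import numpy as np
import torch
from PIL import Image
from spandrel import ImageModelDescriptor, ModelLoader

def image_to_tensor(image_path: str) -> torch.Tensor: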
    """Converts an image to a tensor.

    Parameters
    ----------
    image_path : str
        Path to the image.

    Returns
    -------
    torch.Tensor
        The image as a tensor.
    """
    # read image
    image = Image.open(image_path).convert("RGB")
    image = np.array(image)

    # scale to float32 in [0, 1]
    image = image.astype(np.float32) / 255.0
    # convert("RGB") always yields an H x W x 3 array, so move channels first (C x H x W)
    image = np.transpose(image, (2, 0, 1))

    tensor = torch.from_numpy(image)
    return tensor.unsqueeze(0)

def tensor_to_image(tensor: torch.Tensor, image_path: str) -> None:
    """Converts a tensor to an image and saves it to disk.

    Parameters
    ----------
    tensor : torch.Tensor
        The tensor to convert.
    image_path : str
        Path to save the image to.
    """
    # drop only the batch dimension; a bare squeeze() would also drop a size-1 height or width
    image = tensor.cpu().squeeze(0).numpy()
    image = np.transpose(image, (1, 2, 0))
    image = np.clip((image * 255.0).round(), 0, 255)
    image = image.astype(np.uint8)

    image = Image.fromarray(image)
    image.save(image_path)

def inference(
    model: ImageModelDescriptor, input_image_path: str, output_image_path: str
) -> None:
    """Performs inference on an input image, and saves the result to an output
    one.

    Parameters
    ----------
    model : ImageModelDescriptor
        The model to use for inference.
    input_image_path : str
        Path to the input image.
    output_image_path : str
        Path to save the output image to.
    """
    # perform inference
    input_tensor = image_to_tensor(input_image_path).to("cuda")
    with torch.no_grad():
        result = model(input_tensor)

    # save result
    tensor_to_image(result, image_path=output_image_path)

def load_model(model_path: str) -> ImageModelDescriptor:
    """Loads a model from disk.

    Parameters
    ----------
    model_path : str
        Path to the model.

    Returns
    -------
    ImageModelDescriptor
        The loaded model.
    """
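    model = ModelLoader().load_from_file(model_path)
    # load_from_file can also return other descriptor types, so check before returning
    assert isinstance(model, ImageModelDescriptor)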
    model.to("cuda")
    model.eval()

    return model

Thanks so much in advance! Greatly appreciated 🙏

joeyballentine commented 5 months ago

It says that because in their original code, they have a scale option that downscales the results after processing with the model.

So, the way to get 2x out of a 4x model is just to downscale it after.

For context, models have fixed scales internally, and you can't just arbitrarily change them. Any scale setting you see in another application is doing some combination of upscaling and downscaling internally to act like it's changing the model's scale, when it really isn't.
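
As a rough illustration of "downscale it after", here is a minimal sketch that resizes the x4 output tensor so the net result is x2. The helper name `downscale_output` and its parameters are hypothetical (they are not part of spandrel), and bicubic resampling with antialiasing is an assumption; the model's original code may use a different filter.

```python
import torch
import torch.nn.functional as F


def downscale_output(
    result: torch.Tensor, model_scale: int = 4, target_scale: int = 2
) -> torch.Tensor:
    """Resize a (N, C, H, W) model output so the net upscale is target_scale.

    Hypothetical helper for illustration; not part of spandrel.
    """
    if target_scale == model_scale:
        return result

    # The output is model_scale times the input size, so recover the input
    # size and multiply by the scale we actually want.
    out_h = result.shape[-2] * target_scale // model_scale
    out_w = result.shape[-1] * target_scale // model_scale

    resized = F.interpolate(
        result.float(),
        size=(out_h, out_w),
        mode="bicubic",  # assumed filter; other resampling modes also work
        align_corners=False,
        antialias=True,  # low-pass filter to reduce aliasing when shrinking
    )
    # Bicubic filtering can overshoot, so clamp back to the [0, 1] image range.
    return resized.clamp(0.0, 1.0)
```

In the `inference` function above, this would sit between the model call and saving, e.g. `result = downscale_output(model(input_tensor))`. If the descriptor exposes its scale (spandrel descriptors appear to provide a `scale` attribute), that value could be passed as `model_scale` instead of hard-coding 4.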

pietrobolcato commented 5 months ago

Super helpful! I had the same suspicion, actually. Thanks so much :) I appreciate it a lot!