TencentARC / Open-MAGVIT2

Open-MAGVIT2: Democratizing Autoregressive Visual Generation
Apache License 2.0

[question] any plans to train higher compression ratio? #5

Closed eisneim closed 2 weeks ago

eisneim commented 2 weeks ago

Great thanks to the authors of this project!

Bytedance's TiTok uses a 1D codebook and compresses a 256x256 image into just 32 tokens. Such a high compression ratio is very useful for long-video multi-modal understanding tasks.

Do you have any plans to train a higher-compression-ratio MAGVIT2, e.g. 256x256 -> 8x8? This might cause rFID to go up and make the tokenizer unusable for image generation, but it would still be very useful for multi-modal LLMs.
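For a rough sense of the token budgets involved, here is a quick back-of-the-envelope calculation (plain arithmetic, not measured results; the TiTok figure is the 32-token setting mentioned above):

```python
# Tokens per 256x256 image for a 2D tokenizer at different spatial
# downsampling factors, compared with TiTok's 1D tokenizer (32 tokens).
resolution = 256
for factor in (16, 32):              # 16x -> 16x16 grid, 32x -> 8x8 grid
    side = resolution // factor
    print(f"f{factor}: {side}x{side} grid = {side * side} tokens")
print("TiTok (1D): 32 tokens")
# f16: 16x16 grid = 256 tokens
# f32: 8x8 grid = 64 tokens
```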

thanks!

yxgeee commented 2 weeks ago

Hi @eisneim ,

Thank you for the suggestion. Yes, as you said, while using more tokens can lead to better reconstruction, it also introduces additional challenges for Transformers to learn. There should be a balance between the number of tokens and the final results.
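To make the trade-off concrete, here is a rough sketch of how token count drives the quadratic self-attention cost in the downstream autoregressive Transformer (simple arithmetic, not a benchmark of this repo):

```python
# Relative per-layer self-attention cost (~ n^2) for different token counts
# of a 256x256 image, normalized to the 8x8 (64-token) grid.
token_counts = {"f32 (8x8)": 64, "f16 (16x16)": 256, "f8 (32x32)": 1024}
base = min(token_counts.values()) ** 2
for name, n in token_counts.items():
    print(f"{name}: {n} tokens, ~{n * n / base:.0f}x attention cost")
# f32 (8x8):   64 tokens,  ~1x attention cost
# f16 (16x16): 256 tokens, ~16x attention cost
# f8 (32x32):  1024 tokens, ~256x attention cost
```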

We will consider expanding the Open-MAGVIT2 tokenizer family with scaled-up training data, larger backbones, higher compression ratios, and so on. However, due to numerous tasks in progress (e.g., finalizing autoregressive training), training a tokenizer with higher compression ratios is not currently a high priority. In fact, the training can be initiated by simply modifying the config files. You are also welcome to contribute to our repository if you decide to try it in the future.
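As a hypothetical illustration of the "just modify the config files" route: assuming a VQGAN/taming-style `ddconfig` where each extra `ch_mult` entry adds one 2x downsampling stage, the change could look like the sketch below. The key names and config paths are assumptions, not verified against the actual Open-MAGVIT2 configs.

```python
# Hypothetical sketch only: config path and key names follow
# taming-transformers conventions and may differ in this repo.
from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/tokenizer_256.yaml")  # placeholder path

# VQGAN-style encoders downsample by 2^(len(ch_mult) - 1):
#   [1, 1, 2, 2, 4]    -> 16x  (256x256 -> 16x16 tokens)
#   [1, 1, 2, 2, 4, 4] -> 32x  (256x256 -> 8x8 tokens)
cfg.model.params.ddconfig.ch_mult = [1, 1, 2, 2, 4, 4]

OmegaConf.save(cfg, "configs/tokenizer_256_f32.yaml")
```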

eisneim commented 2 weeks ago

Thank you @yxgeee. I'll try to train a higher compression ratio on ImageNet, but with just one RTX 4090 it might take a long time.

eisneim commented 1 week ago

@yxgeee here is what I changed:


```python
from typing import Union, Tuple
import os

from torch.utils.data import Dataset
from taming.data.base import ImagePaths


def parse_dir(root_dir):
    """Recursively collect .jpg/.jpeg/.png paths under root_dir, skipping hidden files."""
    items = []
    for root, _, files in os.walk(root_dir):
        for file in files:
            if file.lower().endswith(('.jpg', '.jpeg', '.png')) and not file.startswith('.'):
                items.append(os.path.join(root, file))
    return items


class CustomdirBase(Dataset):
    """Thin Dataset wrapper around a taming ImagePaths instance."""
    def __init__(self):
        self.data = []

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        return self.data[i]


class CustomdirTrain(CustomdirBase):
    def __init__(self, root: str,
                 size: Union[Tuple[int, int], int] = 256) -> None:
        abspaths = parse_dir(root)
        print("----> train images", root, len(abspaths))
        self.data = ImagePaths(abspaths,
                               labels=None,
                               size=size,
                               random_crop=True)


class CustomdirValidation(CustomdirBase):
    def __init__(self, root: str,
                 size: Union[Tuple[int, int], int] = 256) -> None:
        abspaths = parse_dir(root)
        print("----> validate images", root, len(abspaths))
        # note: random_crop=True here as well; random_crop=False (center crop)
        # is the more common choice for reproducible evaluation.
        self.data = ImagePaths(abspaths,
                               labels=None,
                               size=size,
                               random_crop=True)


class CustomdirTest(CustomdirBase):
    def __init__(self, root: str,
                 size: Union[Tuple[int, int], int] = 256) -> None:
        abspaths = parse_dir(root)
        print("----> test images", root, len(abspaths))
        self.data = ImagePaths(abspaths,
                               labels=None,
                               size=size,
                               random_crop=True)
```
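A quick way to sanity-check these classes outside the training config is a small smoke test like the one below. The path is a placeholder; taming's `ImagePaths` typically yields dicts with an `"image"` entry in HWC layout.

```python
# Minimal smoke test for the dataset classes above; "/path/to/images" is a placeholder.
from torch.utils.data import DataLoader

train_set = CustomdirTrain(root="/path/to/images", size=256)
loader = DataLoader(train_set, batch_size=4, shuffle=True, num_workers=2)

batch = next(iter(loader))
# ImagePaths returns dicts like {"image": HxWx3 float array in [-1, 1], ...}
print(batch["image"].shape)  # expected: torch.Size([4, 256, 256, 3])
```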

[Screenshot 2024-06-20 11-14-13]