2022-12-21 EDIT:
You can now use Vimeo90K for image compression training without any further processing. Modify the training script as follows:
from compressai.datasets import Vimeo90kDataset

train_dataset = Vimeo90kDataset(
    args.dataset, split="train", transform=train_transforms
)
test_dataset = Vimeo90kDataset(
    args.dataset, split="valid", transform=test_transforms
)
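For context, train_transforms and test_transforms above come from the example training script (examples/train.py); roughly, assuming the usual args.patch_size argument, they look like this:

from torchvision import transforms

# Random crops for training, center crops for evaluation.
train_transforms = transforms.Compose(
    [transforms.RandomCrop(args.patch_size), transforms.ToTensor()]
)
test_transforms = transforms.Compose(
    [transforms.CenterCrop(args.patch_size), transforms.ToTensor()]
)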
Previous preprocessing technique:
I'm not sure how the maintainers preprocessed their dataset, but here's my methodology:
wget http://data.csail.mit.edu/tofu/dataset/vimeo_triplet.zip
unzip vimeo_triplet.zip
python process_vimeo_triplet.py
where:
# process_vimeo_triplet.py
import os
import shutil
from pathlib import Path


def extract_dataset_split(in_dir: str, out_dir: str, list_filename: str):
    """Flatten the sequences listed in list_filename into a single directory."""
    with open(list_filename) as f:
        lines = f.read().splitlines()

    os.makedirs(out_dir, exist_ok=True)
    in_dir_path = Path(in_dir)
    out_dir_path = Path(out_dir)

    for subdir in lines:
        if subdir == "":
            continue
        subdir_path = in_dir_path / subdir
        # e.g. "00001/0001" becomes the filename prefix "00001_0001_".
        out_prefix = str(out_dir_path / (subdir.replace("/", "_") + "_"))
        in_images = os.listdir(subdir_path)
        for image in in_images:
            src = subdir_path / image
            dst = out_prefix + image
            print(f"{src} -> {dst}")
            shutil.copy2(src, dst)


extract_dataset_split(
    "vimeo_triplet/sequences",
    "vimeo90k_compressai/train",
    "vimeo_triplet/tri_trainlist.txt",
)
extract_dataset_split(
    "vimeo_triplet/sequences",
    "vimeo90k_compressai/test",
    "vimeo_triplet/tri_testlist.txt",
)
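After running the script, a quick sanity check (a minimal sketch; the paths match the script above) is to count the copied frames, since each listed sequence contributes three images (im1/im2/im3):

from pathlib import Path

# The count should be 3x the number of non-empty lines in the
# corresponding tri_trainlist.txt / tri_testlist.txt.
for split in ("train", "test"):
    n_images = len(list(Path("vimeo90k_compressai", split).glob("*.png")))
    print(split, n_images)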
which copies files as follows:
vimeo_triplet/sequences/00001/0001/im1.png -> vimeo90k_compressai/train/00001_0001_im1.png
vimeo_triplet/sequences/00001/0001/im2.png -> vimeo90k_compressai/train/00001_0001_im2.png
vimeo_triplet/sequences/00001/0001/im3.png -> vimeo90k_compressai/train/00001_0001_im3.png
vimeo_triplet/sequences/00001/0002/im1.png -> vimeo90k_compressai/train/00001_0002_im1.png
...
with the resulting directory tree:
vimeo90k_compressai/
    train/
        00001_0001_im1.png
        00001_0001_im2.png
        00001_0001_im3.png
        00001_0002_im1.png
        ...
    test/
        ...
Note that the "test" directory actually contains the validation set, since final testing is done via RD curves on the Kodak test set (768×512 images).
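For reference, CompressAI ships an evaluation utility that produces the RD points on an image folder such as Kodak. A typical invocation looks something like the following (the architecture name and checkpoint path are placeholders; check python -m compressai.utils.eval_model --help for the exact flags in your version):

python -m compressai.utils.eval_model checkpoint /path/to/kodak \
    -a bmshj2018-factorized -p checkpoint_best_loss.pth.tar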
Thank you for kindly sharing! I'll try it out.
Hi, sorry for the late reply. The above solution works perfectly. Another way to speed up training is to generate .npy files containing the train/validation sets. Please note that train.py is a "simple" example training loop, so this would require tweaking the example dataloader; see the sketch below.
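To illustrate the idea, here is a minimal, hypothetical sketch (pack_split and NpyImageDataset are not part of CompressAI): pack each preprocessed split into a single .npy file once, then serve images from a memory-mapped array at training time, so the per-image PNG decode is paid only once.

import numpy as np
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset


def pack_split(in_dir: str, out_file: str):
    # Stack all images of one split into a single uint8 array (N, H, W, 3).
    # Assumes all images share the same resolution, as in Vimeo90K (448x256).
    paths = sorted(Path(in_dir).glob("*.png"))
    arr = np.stack([np.asarray(Image.open(p).convert("RGB")) for p in paths])
    np.save(out_file, arr)


class NpyImageDataset(Dataset):
    # Hypothetical drop-in replacement for the example image-folder dataset.
    def __init__(self, npy_file: str, transform=None):
        # mmap_mode="r" avoids loading the whole array into RAM.
        self.images = np.load(npy_file, mmap_mode="r")
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = Image.fromarray(np.array(self.images[idx]))
        return self.transform(img) if self.transform else img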
Thank you for your outstanding and constructive contributions!
Recently I have been trying to use CompressAI to implement a custom network and compare its performance with other methods already implemented in CompressAI. However, I found that the Vimeo90K dataset has three different subsets, and I was confused about which one to download. I also did not find any further description of how to preprocess the Vimeo90K dataset in either the documentation or the paper. Could you please tell me which subset to download and how to preprocess it? Or does the choice of training dataset have little effect on the final performance?
Thanks again for your excellent work. Looking forward to your reply.