Sorry for the confusion. The structure in the README is correct: in the first stage (VQ-VAE pre-training), only the 3000 training characters of the content font are used for training, and the remaining 500 characters are used to test generalization. In the second stage (training the font-generation model), there is no need to split characters into training and test sets within the training and test fonts; the split is determined by train_unis and val_unis when generating train.json.
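For reference, here is a minimal sketch (not code from the repo) of how such a 3000/500 character split could be written out as train_unis.json and val_unis.json. The input file name and the JSON format (hex codepoint strings such as "4E00") are assumptions; check an existing meta file for the exact format the scripts expect.

```python
# Hypothetical sketch: split 3500 content-font characters into
# 3000 seen / 500 unseen unicodes and dump them as JSON lists.
import json
import random

# Assumed input: a plain-text file containing the 3500 characters.
with open("all_characters.txt", encoding="utf-8") as f:
    chars = [c for c in f.read() if not c.isspace()]

random.seed(0)
random.shuffle(chars)

def to_uni(c):
    # Represent each character as an upper-case hex codepoint string (assumption).
    return hex(ord(c))[2:].upper()

train_unis = sorted(to_uni(c) for c in chars[:3000])
val_unis = sorted(to_uni(c) for c in chars[3000:])

with open("train_unis.json", "w", encoding="utf-8") as f:
    json.dump(train_unis, f)
with open("val_unis.json", "w", encoding="utf-8") as f:
    json.dump(val_unis, f)
```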
So while preparing the lmdb, what should be passed to --content_font: the images generated from train_unis.json, or from train_val_all_characters.json?
```bash
python3 build_meta4train.py \
    --saving_dir ../results/chinese_dataset/ \
    --content_font ../datasets/images/content_font/ \
    --train_font_dir ../datasets/images/train \
    --val_font_dir ../datasets/images/val \
    --seen_unis_file ../meta/train_unis.json \
    --unseen_unis_file ../meta/val_unis.json
```
For the second stage, the content-font directory used to build the lmdb should contain all 3500 (train + val) characters, i.e. ../datasets/images/content_font/train_val/.
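A quick sanity check along these lines can confirm the content-font folder really covers the union of both unicode lists; this is only a sketch under the assumption that the images are named `<unicode>.png`, so adjust the naming rule to your data.

```python
# Hypothetical check: does content_font/train_val/ contain all 3500 characters
# listed in train_unis.json and val_unis.json?
import json
import os

with open("../meta/train_unis.json", encoding="utf-8") as f:
    seen = set(json.load(f))
with open("../meta/val_unis.json", encoding="utf-8") as f:
    unseen = set(json.load(f))

needed = seen | unseen
content_dir = "../datasets/images/content_font/train_val/"
# Assumed naming convention: one image per character, file stem = unicode string.
have = {os.path.splitext(name)[0] for name in os.listdir(content_dir)}

missing = needed - have
print(f"need {len(needed)} characters, {len(missing)} missing from {content_dir}")
```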
I got it. Thank you so much for your time and prompt replies.
Dear author, could you please provide your dataset? Thank you very much.
@Djs-Champion Hello, due to copyright reasons I cannot provide the dataset directly. Please read the Data Preparation section of the Readme carefully. For building the lmdb, you can refer to issue #6.
Hello, should the path in the ipynb file point to the content folder, or separately to the train and val folders?
This is my directory structure:
I am confused about the dataset preparation. In Data Preparation you describe the structure shown in the image, but in one of the issues you mention the following structure. Can you help me with this?