clovaai / fewshot-font-generation

The unified repository for few-shot font generation methods. This repository includes FUNIT (ICCV'19), DM-Font (ECCV'20), LF-Font (AAAI'21) and MX-Font (ICCV'21).

Problems on the second stage of Chinese training #5

Closed zj916716524 closed 2 years ago

zj916716524 commented 2 years ago

Hello. Thank you very much for your help with LF-Font previously. Following your suggestion, I am now using this version of the code. The first stage of LF-Font training achieved good results on Chinese, but my second-stage training is very bad: the network becomes worse and worse, and from 10,000 iterations to 200,000 iterations the results keep degrading.

[images: final result of the second-stage training at 200,000 iterations, and the result at 10,000 iterations]

8uos commented 2 years ago

Hi, we observed that long (> 50,000 iters) phase 2 training spoils the phase-1-trained model, especially when emb_dim is large. We recommend reducing the value of emb_dim in this case. In our experiments, emb_dim=6 or 8 was appropriate for the Chinese dataset (> 3,000,000 images), but it was too large for the Korean dataset (< 150,000 images); so we used emb_dim=3 or 4 for the Korean dataset.
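For illustration only: if emb_dim is exposed through the training YAML config (the key name comes from this thread; its exact location in the config file is an assumption), the change would look like:

```yaml
# assumption: emb_dim is a top-level key in the phase 2 training config
emb_dim: 4   # default 8; 3 or 4 recommended for smaller datasets
```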

zj916716524 commented 2 years ago

I did not change the value of emb_dim and used the default of 8. So do I need to reduce its value?

8uos commented 2 years ago

That's right. How many fonts and characters do you use for the training?

zj916716524 commented 2 years ago

I used 57 TTF files.

8uos commented 2 years ago

OK, it is much less than ours. I recommend reducing the emb_dim to 3 or 4. If still the same problem occurs, please tell us again.

zj916716524 commented 2 years ago

Can I use the PTH file from the first stage of training to make a direct prediction?

zj916716524 commented 2 years ago

> OK, it is much less than ours. I recommend reducing the emb_dim to 3 or 4. If still the same problem occurs, please tell us again.

OK, does this need to be retrained from stage 1?

8uos commented 2 years ago

> OK, it is much less than ours. I recommend reducing the emb_dim to 3 or 4. If still the same problem occurs, please tell us again.
>
> OK, does this need to be retrained from stage 1?

Only phase 2 training is needed. It may not need a long time to train.

zj916716524 commented 2 years ago

> OK, it is much less than ours. I recommend reducing the emb_dim to 3 or 4. If still the same problem occurs, please tell us again.
>
> OK, does this need to be retrained from stage 1?
>
> Only phase 2 training is needed. It may not need a long time to train.

OK, I'll go and retry the second phase of training then. Thanks for your reply.

zj916716524 commented 2 years ago

[image: results at 50,000 iterations] Hi, I have adjusted emb_dim to 4, but the model becomes poor again at the 50,000th iteration. The problem is still not solved.

8uos commented 2 years ago

How about the 10,000 or 20,000 iteration results? You don't have to train the model to 50,000 iterations. It is okay to use the earlier step's model if it is better.

zj916716524 commented 2 years ago

Can I use the results of phase 1 for testing? If so, what should I do?

8uos commented 2 years ago

I'm so sorry, but it is quite difficult to use the results of phase 1 directly, because the phase 1 model cannot generate components (radicals) that are not in the reference characters, as written in our paper.

I am checking the code now and I will notify you when the code is updated. Sorry for the inconvenience.

zj916716524 commented 2 years ago

Thanks for your reply; I look forward to hearing from you after you update the code, much appreciated. I have used the first-stage training results for testing, and the results are very poor, with a lot of missing parts. Another problem is that I don't know how to use the PNG format for testing. Is there a more detailed description of how to use PNG or JPG files for testing?

zj916716524 commented 2 years ago

Hi, I just had a closer look at your code and found that you ran the EMD code. I am interested in this experiment but have been struggling to figure out the exact format of the dataset. I now want to test the performance of the EMD code and wondered if you could share the format of the dataset needed for the EMD experiment.

8uos commented 2 years ago

Sorry, I have closed this issue by my mistake.

8uos commented 2 years ago
  1. Sorry for the late response. I have checked the code but I could not find the reason... I suspect it may be a PyTorch version issue; I used PyTorch 1.1 for the LF-Font training.

  2. To use image files for the test, this may help you. Also, I recommend checking the FTransGAN dataset format.

  3. Building the EMD dataset was tricky... I cannot directly check the dataset because of a disk issue, but I remember that I made a text file listing the whole dataset like:

    {font1} {char1}
    {font1} {char2}
    ....

    Also, two dictionaries were needed:

    fc_dict = {"font1": [char1, char2, char3...], "font2":  [char1, char2, char3...]}
    cf_dict = {"char1": [font1, font2, font3..], "char2": [font1, font2, font3..]}
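For illustration (not from the repo), both lookups can be derived from the same font-char list; `build_dicts` is a hypothetical helper name, and the sketch assumes one `{font} {char}` entry per line:

```python
from collections import defaultdict

def build_dicts(lines):
    """Build the font->chars and char->fonts lookups from '{font} {char}' lines."""
    fc_dict, cf_dict = defaultdict(list), defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # split on the last space, so the char is the final token
        font, char = line.rsplit(" ", 1)
        fc_dict[font].append(char)
        cf_dict[char].append(font)
    return dict(fc_dict), dict(cf_dict)

fc_dict, cf_dict = build_dicts(["font1 char1", "font1 char2", "font2 char1"])
```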

Then, I defined three processing functions in model.py, for the target image, the style stack, and the character stack:

import numpy as np
from PIL import Image

def process(input):
    # target image: decode the "{font} {char}" entry and load the glyph
    input = input.decode("utf8")
    font = input[:-2]
    char = input[-1:]
    images = np.array(Image.open(f"{font}/{char}.png").convert("L").resize((80, 80))).astype(np.float32)
    return images/255.

def process_s(input):
    # style stack: sample other characters from the same font
    input = input.decode("utf8")
    font = input[:-2]
    char = input[-1:]
    chars_ = np.random.choice(sorted(set(fc_dict[font]) - {char}), a.style_sample_n, False)
    images = np.stack([np.array(Image.open(f"{font}/{char_}.png").convert("L").resize((80, 80)))
                    for char_ in chars_]).astype(np.float32)
    return images/255.

def process_c(input):
    # content stack: sample the same character from other fonts
    input = input.decode("utf8")
    font = input[:-2]
    char = input[-1:]
    fonts_ = np.random.choice(sorted(set(cf_dict[char]) - {font}), a.content_sample_n, False)
    images = np.stack([np.array(Image.open(f"{font_}/{char}.png").convert("L").resize((80, 80)))
                    for font_ in fonts_]).astype(np.float32)
    return images/255.
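As a side note (my illustration, not from the repo): the `[:-2]`/`[-1:]` slicing above assumes each entry is `{font} {char}` with a single-character char, so font directory names must not contain spaces:

```python
entry = "myfont 永".encode("utf8")  # one line of the target list, as bytes

s = entry.decode("utf8")
font, char = s[:-2], s[-1:]  # "myfont", "永"
```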

Finally, the relevant part of train.py was modified to:

Line 6-14

    with open(os.path.join(a.input_dir, 'target.txt'), 'r') as f:
        targets = f.readlines()
        targets = [line.strip() for line in targets]
    random.seed(a.seed)
    random.shuffle(targets)

    ####################### network #######################
    batch_inputsS_holder = tf.placeholder(tf.float32, [a.style_num*a.style_sample_n, 80, 80, 1],name='inputsS')
    batch_inputsC_holder = tf.placeholder(tf.float32, [a.content_num*a.content_sample_n, 80, 80, 1],name='inputsC')
    batch_targets_holder = tf.placeholder(tf.float32, [a.target_batch_size, 80, 80, 1],name='targets')

Line 33-51

    ####################### preparing data #######################
    targets_holder = tf.placeholder(tf.string)

    dataset1 = tf.data.Dataset.from_tensor_slices(targets_holder)
    dataset1 = dataset1.map(lambda x: tf.py_func(process_s, [x], [tf.float32]), num_parallel_calls=a.num_parallel_prefetch)
    dataset1 = dataset1.prefetch(a.style_num * a.num_parallel_prefetch)
    dataset1 = dataset1.batch(a.style_num).repeat(a.max_epochs)

    dataset2 = tf.data.Dataset.from_tensor_slices(targets_holder)
    dataset2 = dataset2.map(lambda x: tf.py_func(process_c, [x], [tf.float32]), num_parallel_calls=a.num_parallel_prefetch)
    dataset2 = dataset2.prefetch(a.content_num * a.num_parallel_prefetch)
    dataset2 = dataset2.batch(a.content_num).repeat(a.max_epochs)

    dataset3 = tf.data.Dataset.from_tensor_slices(targets_holder)
    dataset3 = dataset3.map(lambda x: tf.py_func(process, [x], [tf.float32]), num_parallel_calls=a.num_parallel_prefetch)
    dataset3 = dataset3.prefetch(a.target_batch_size * a.num_parallel_prefetch)
    dataset3 = dataset3.batch(a.target_batch_size).repeat(a.max_epochs)

Line 96-106

    sess.run(iterator1.initializer, feed_dict={targets_holder: targets})
    sess.run(iterator2.initializer, feed_dict={targets_holder: targets})
    sess.run(iterator3.initializer, feed_dict={targets_holder: targets})

    for step in range(max_steps):
        def should(freq):
            return freq > 0 and ((step + 1) % freq == 0 or step == max_steps - 1)

        batch_inputsS = sess.run(one_element1).reshape((-1, 80, 80, 1))
        batch_inputsC = sess.run(one_element2).reshape((-1, 80, 80, 1))
        batch_targets = sess.run(one_element3).reshape((-1, 80, 80, 1))

Note that this is just for reference; it will not work as-is for your dataset. The image loading part (Image.open(...)) may need some modification. Good luck!

8uos commented 2 years ago

In this case, you need a text file with the font-char list. The image files should be placed in this structure:

 |-- font1
 |   |-- char1.png
 |   |-- char2.png
 |   |-- char3.png
 |-- font2
 |   |-- char1.png
 |   |-- char2.png
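A small helper (mine, not from the repo) can generate that font-char list from such a directory tree, assuming one PNG per character under each font directory:

```python
import os

def write_target_list(root, out_path):
    """Write one '{font} {char}' line per root/{font}/{char}.png image."""
    with open(out_path, "w", encoding="utf8") as out:
        for font in sorted(os.listdir(root)):
            font_dir = os.path.join(root, font)
            if not os.path.isdir(font_dir):
                continue  # skip stray files at the top level
            for fname in sorted(os.listdir(font_dir)):
                if fname.endswith(".png"):
                    out.write(f"{font} {fname[:-4]}\n")  # strip the .png extension
```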
zj916716524 commented 2 years ago

Thank you for your kind reply. I will revise the LF-Font code and remake the EMD dataset according to your suggestions.