ecnuycxie / DG-Font

The pytorch implementation of DG-Font: Deformable Generative Networks for Unsupervised Font Generation

how to split train and test dataset #16

Open ZYJ-JMF opened 3 years ago

ZYJ-JMF commented 3 years ago

Hi, I am wondering how to split the train and test datasets. In your paper, it says that "The dataset is randomly partitioned into a training set and testing set". Does "randomly" mean that the content of the test set is different for each font style?

ecnuycxie commented 3 years ago

Thank you for your question. To ensure that the characters in the test set are never seen during training, the test set of every style contains the same characters.
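A minimal sketch of such a split, assuming the data is indexed by (style, character) pairs; `split_characters` is a hypothetical helper, not the repository's own code:

```python
import random

# Hypothetical helper (not the authors' script): hold out the same
# randomly sampled characters from every style, so no test character
# is ever seen during training for any font.
def split_characters(all_chars, styles, num_test, seed=0):
    rng = random.Random(seed)
    test_chars = set(rng.sample(sorted(all_chars), num_test))
    train = [(style, ch) for style in styles for ch in all_chars if ch not in test_chars]
    test = [(style, ch) for style in styles for ch in test_chars]
    return train, test

# Example with the numbers mentioned later in this thread:
# 990 characters in total, 190 held out, shared across all styles.
# train, test = split_characters(chars, styles, num_test=190)
```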

ZYJ-JMF commented 3 years ago

Thanks for your answer. I have another question: how do you evaluate the performance of the results and get the quantitative scores? In validation, every font (8 in total) can generate a series of styled fonts. Which one is used for the evaluation, or do you calculate the average score? Thanks

ecnuycxie commented 3 years ago

I used the font KAI as the content during the testing process. You can also use other fonts as content, such as SONG.

ZYJ-JMF commented 3 years ago

Oh, thanks for your answer. I want to make sure that I understand it correctly. Do you mean that you use another style font as the content to generate the KAI style and compare the result with the ground-truth KAI font?

ecnuycxie commented 3 years ago

In the test process, we use KAI as the source font (content) to generate other styles and compare these generated images with their ground-truth.
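As an illustration of this protocol, here is a hedged sketch of an evaluation loop. It assumes the trained model exposes a `generate(content, style_ref)` call (a hypothetical name, not DG-Font's actual API) and uses pixel-wise L1 as an example metric:

```python
import torch

@torch.no_grad()
def evaluate(model, kai_loader, style_sets):
    """For each target style, render every KAI (content) glyph in that
    style and compare the output with the ground-truth glyph."""
    scores = {}
    for style_name, (style_ref, gt_loader) in style_sets.items():
        total, n = 0.0, 0
        for content_img, gt_img in zip(kai_loader, gt_loader):
            # generate() is a hypothetical wrapper around the trained generator
            fake = model.generate(content_img, style_ref)
            total += torch.mean(torch.abs(fake - gt_img)).item()  # pixel-wise L1
            n += 1
        scores[style_name] = total / n  # average L1 over the test characters
    return scores
```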

ZYJ-JMF commented 3 years ago

"In the test process, we use KAI as the source font (content) to generate other styles and compare these generated images with their ground-truth." " other styles" here are choosed from the total 990 word or from the valid 190 words as the reference style? Do you random choose 50 words as the reference and generated images? Will you share the eval code? Thanks

dagongji10 commented 3 years ago

@ecnuycxie I also have some questions about the dataset split.

Am I right? How should I understand this difference? Does it make any difference for DG-Font?

Another question: validation seems to only save generated images and produces nothing useful for training. So can I comment out this line to speed up training?
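One common alternative to deleting the call outright is to gate validation behind a frequency flag; a hedged sketch, where `args.val_freq` and `validate()` are hypothetical names rather than DG-Font's actual ones:

```python
# Hypothetical training-loop snippet: run the image-saving validation
# only every `val_freq` epochs (or never, when val_freq <= 0), instead
# of removing the call entirely.
if args.val_freq > 0 and (epoch + 1) % args.val_freq == 0:
    validate(model, val_loader, save_dir=args.save_dir)  # only saves sample images
```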