This is a PyTorch implementation of the paper "ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation".
To create and activate the conda environment:

```
conda env create --name pytorch1.2 --file=environmentPytorch12.yml
source activate pytorch1.2
```
To train a model, start a visdom server and run training:

```
visdom -port 8192
python train.py --name_prefix demo --dataname RIMEScharH32W16 --capitalize --display_port 8192
```
- `--name`: unless specified in the arguments, the experiment name is determined by `name_prefix`, the dataset name, and any parameters that differ from the defaults (see the code in `options/base_options.py`).
- `--name_prefix`: the prefix of the automatically generated experiment name.
- `--dataname`: name of the dataset, which determines the dataroot path according to `data/dataset_catalog.py`.
- `--lex`: the lexicon used to generate the fake images. A default lexicon for English/French data is specified in `options/base_options.py`.
- `--capitalize`: randomly capitalize the first letter of words in the lexicon used.
- `--display_port`: visdom display port.
- `--checkpoints_dir`: the network weights and sample images are saved to `checkpoints_dir/experiment_name`.
- `--use_rnn`: whether to use an LSTM.
- `--seed`: set the seed for numpy and pytorch instead of using a random one.
- `--gb_alpha`: the balance between the recognizer and discriminator losses. A higher alpha means a larger weight for the recognizer.

Other arguments are explained in the files `options/base_options.py` and `options/train_options.py`.
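The effect of `--gb_alpha` can be illustrated with a short sketch. This is a hypothetical helper, not the repository's implementation: it rescales the recognizer gradient so its spread matches the discriminator gradient's before weighting it by alpha, so a higher alpha gives the recognizer a larger say in the generator update (the exact balancing in the code may differ):

```python
from statistics import pstdev

def balance_gradients(grad_D, grad_R, alpha):
    """Blend discriminator and recognizer gradients for the generator.

    Hypothetical sketch: the recognizer gradient is rescaled so that
    its standard deviation matches the discriminator gradient's, then
    weighted by alpha before the two are summed.
    """
    scale = pstdev(grad_D) / (pstdev(grad_R) + 1e-8)
    return [d + alpha * scale * r for d, r in zip(grad_D, grad_R)]

# grad_R here is 100x grad_D, but the rescaling neutralizes that.
g = balance_gradients([0.1, -0.2, 0.05], [10.0, -20.0, 5.0], alpha=1.0)
```

Doubling alpha doubles only the recognizer contribution, which is the knob the flag exposes.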
To run semi-supervised training:

```
python train_semi_supervised.py --dataname IAMcharH32W16rmPunct --unlabeled_dataname CVLtrH32 --disjoint
```
Main arguments:
- `--dataname`: name of the dataset, which determines the labeled dataroot path according to `data/dataset_catalog.py`. This data is used to train only the Recognizer (in the disjoint case) or both the Recognizer and the Discriminator networks.
- `--unlabeled_dataname`: name of the dataset, which determines the unlabeled dataroot path according to `data/dataset_catalog.py`. This data is used to train only the Discriminator network.
- `--disjoint`: disjoint training of the discriminator and the recognizer (each sees only the unlabeled/labeled data, respectively).

Other arguments are explained in the files `options/base_options.py` and `options/train_options.py`.
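The data routing implied by `--disjoint` can be sketched in a few lines. This is a hypothetical illustration of the logic described above, not the repository's training loop:

```python
def training_batches(labeled, unlabeled, disjoint):
    """Yield (batch, train_recognizer, train_discriminator) flags.

    Hypothetical sketch of --disjoint: when disjoint is True, the
    recognizer sees only labeled data and the discriminator sees only
    unlabeled data; otherwise labeled data also feeds the discriminator.
    """
    for batch in labeled:
        yield batch, True, not disjoint
    for batch in unlabeled:
        yield batch, False, True
```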
Before generating an LMDB, download the desired dataset into `Datasets`.
The structure of the directories should be:
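The original tree listing appears to be missing here. Based on `top_dir = 'Datasets'` and the dataset names used below, a plausible top-level layout is the following (any deeper subfolder names are an assumption and depend on how each dataset is distributed):

```
Datasets/
├── CVL/
├── IAM/
├── RIMES/
└── GW/
```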
To generate an LMDB file of one of the datasets CVL/IAM/RIMES/GW for training, use:

```
cd data
python create_text_data.py
```
The main parameters:

- `create_Dict = False`: create a dictionary of the generated dataset.
- `dataset = 'IAM'`: CVL/IAM/RIMES/gw.
- `mode = 'va2'`: tr/te/va1/va2/all.
- `labeled = True`: whether to save the labels of the images.
- `top_dir = 'Datasets'`: the directory containing the folders with the different datasets.
- `words = False`: relevant for IAM/RIMES; use word images, otherwise use line images.
- `offline = True`: use offline images.
- `author_number = -1`: use only images of a specific writer. If the value is -1, use all writers; otherwise use the index of the specific writer.
- `remove_punc = True`: remove images which consist of only one punctuation mark from the list `['.', '', ',', '"', "'", '(', ')', ':', ';', '!']`.
- `resize = 'noResize'`: type of resize, one of `charResize|keepRatio|noResize`. `charResize` resizes so that each character's width falls within a specific range (inside this range the width is chosen randomly); `keepRatio` resizes to a specific image height while keeping the height-width aspect ratio; `noResize` does not resize the image.
- `imgH = 32`: height of the resized image.
- `init_gap = 0`: number of pixels of gap inserted before the beginning of the text.
- `charmaxW = 18`: the maximum character width.
- `charminW = 10`: the minimum character width.
- `h_gap = 0`: gap in pixels inserted below and above the text.
- `discard_wide = True`: discard images whose character width is more than 3 times the maximum allowed character width (instead of resizing them); this helps discard outlier images.
- `discard_narr = True`: discard images whose character width is more than 3 times smaller than the minimum allowed character width.

The generated LMDB will be saved in the relevant dataset folder, and the dictionary will be saved in the `Lexicon` folder.
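The `charResize` and discard rules above can be sketched as a small dimension-picking helper. This is a hypothetical function, not the repository's code: it scales the image to height `imgH`, discards outliers, then draws a random per-character width in `[charminW, charmaxW]`:

```python
import random

def char_resize_dims(orig_w, orig_h, n_chars, imgH=32,
                     charminW=10, charmaxW=18,
                     discard_wide=True, discard_narr=True):
    """Return (new_w, new_h) for an image of n_chars characters,
    or None if the image should be discarded as an outlier.

    Hypothetical sketch of the charResize option: the image is first
    scaled to height imgH; if the average character width is then 3x
    wider than charmaxW or 3x narrower than charminW, the image is
    discarded. Otherwise each character's target width is drawn
    uniformly from [charminW, charmaxW].
    """
    char_w = orig_w * imgH / orig_h / n_chars  # avg char width at height imgH
    if discard_wide and char_w > 3 * charmaxW:
        return None
    if discard_narr and char_w < charminW / 3:
        return None
    target_char_w = random.uniform(charminW, charmaxW)
    return round(target_char_w * n_chars), imgH
```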
To generate an LMDB of images from a trained model:

```
python generate_wordsLMDB.py --dataname IAMcharH32rmPunct --results_dir ./lmdb_files/IAM_concat --n_synth 100,200 --name model_name
```
- `--dataname`: name of the dataset, which determines the dataroot path according to `data/dataset_catalog.py`. Note that this data will be concatenated to the generated images.
- `--no_concat_dataset`: ignore `dataname` (the previous parameter) and do not concatenate.
- `--results_dir`: path to the results; it will be concatenated with `n_synth`.
- `--n_synth`: number of examples to generate, in thousands.
- `--name`: name of the model used to generate the images.
- `--lex`: lexicon used to generate the images.

The structure of the code is based on the structure of the CycleGAN code.
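The interplay of `--lex` and `--capitalize` can be illustrated with a small sketch. This is a hypothetical helper, not the repository's code; the 0.5 capitalization probability is an assumption:

```python
import random

def sample_words(lexicon, n, capitalize=False, rng=None):
    """Sample n words from the lexicon to render as fake images.

    Hypothetical sketch: with capitalize=True, each sampled word's
    first letter is upper-cased with probability 0.5 (the actual
    probability used by the code is an assumption here).
    """
    rng = rng or random.Random()
    words = [rng.choice(lexicon) for _ in range(n)]
    if capitalize:
        words = [w.capitalize() if rng.random() < 0.5 else w
                 for w in words]
    return words
```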
If you use this code for your research, please cite our paper:

```
@inproceedings{fogel2020scrabblegan,
  title={ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation},
  author={Sharon Fogel and Hadar Averbuch-Elor and Sarel Cohen and Shai Mazor and Roee Litman},
  booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month={June},
  year={2020}
}
```
ScrabbleGAN is released under the MIT license. See the LICENSE and THIRD-PARTY-NOTICES.txt files for more information.
Your contributions are welcome!
See CONTRIBUTING.md and CODE_OF_CONDUCT.md for more info.