X-rayLaser / pytorch-handwriting-synthesis-toolkit

Handwriting generation and handwriting synthesis as described in Alex Graves's paper https://arxiv.org/abs/1308.0850. Pytorch implementation.
MIT License
66 stars 11 forks

How to train it on other languages? #7

Open TUNA-NOPE opened 1 year ago

TUNA-NOPE commented 1 year ago

Hello 👋 I would like to know if it is possible to train a model on other languages like Hebrew. If you can help me with that I will be very happy 😊 THX 🙏

X-rayLaser commented 1 year ago

Yes, I think it should be possible. Securing a large dataset continues to be the primary challenge. Your dataset must contain diverse and well-structured handwriting samples, presented as stroke sequences, not just pictures of written text. Follow the guidelines in the Readme section to train your own model using appropriate data. For details on the required data structure, consult the IAM Online Handwriting Database. Also, to acquire additional information about the precise representation of handwriting samples, please refer to the section titled "Implementing Custom Data Provider."

X-rayLaser commented 1 year ago

If you already possess data with the necessary representation, implementing a custom data provider class should suffice. Essentially, this class must have two methods - get_training_data and get_validation_data. These methods serve as Python generators, yielding (handwriting, transcription) pairs. You are free to design the implementation however you prefer. Once you complete your data provider class, you may employ readily available Python scripts to proceed with the remaining steps.
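To make that concrete, here is a minimal sketch of what such a class might look like. The class name `InMemoryDataProvider`, its constructor arguments, and the sample data are all made up for illustration. The stroke layout (a list of strokes, each a list of (x, y) points) is an assumption too, so check the "Implementing Custom Data Provider" section of the Readme for the exact representation expected.

```python
import random


class InMemoryDataProvider:
    """Hypothetical data provider that serves samples held in memory.

    Each sample is a (handwriting, transcription) pair. Here handwriting
    is assumed to be a list of strokes, each stroke a list of (x, y)
    points; verify this against the Readme before using real data.
    """

    def __init__(self, samples, n_validation=1, seed=0):
        shuffled = list(samples)
        # Shuffle with a fixed seed so the split is reproducible.
        random.Random(seed).shuffle(shuffled)
        self._validation = shuffled[:n_validation]
        self._training = shuffled[n_validation:]

    def get_training_data(self):
        # Generator yielding (handwriting, transcription) pairs.
        for handwriting, transcription in self._training:
            yield handwriting, transcription

    def get_validation_data(self):
        # Same contract as get_training_data, over the held-out samples.
        for handwriting, transcription in self._validation:
            yield handwriting, transcription


# Toy samples, invented for illustration only.
samples = [
    ([[(0, 0), (1, 2)], [(3, 1), (4, 4)]], "שלום"),
    ([[(0, 1), (2, 2), (3, 0)]], "תודה"),
    ([[(1, 1), (1, 3)], [(2, 0), (4, 2)]], "כן"),
    ([[(0, 0), (2, 1)]], "לא"),
    ([[(0, 2), (1, 0), (2, 2)]], "בית"),
]

provider = InMemoryDataProvider(samples, n_validation=1)
training = list(provider.get_training_data())
validation = list(provider.get_validation_data())
```

A real provider would most likely read stroke files from disk inside the generators instead of holding everything in memory; the only part the scripts rely on, as far as described above, is the pair of generator methods.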