Grzego / handwriting-generation

Implementation of handwriting generation using recurrent neural networks in TensorFlow, based on Alex Graves' paper (https://arxiv.org/abs/1308.0850).
MIT License

Training numbers #2

Open impactcolor opened 6 years ago

impactcolor commented 6 years ago

This is probably outside the scope of "issues", but I figured I'd ask.
I noticed it doesn't handle numbers. Is there a way to add numbers to the XML datasets so it can also generate numbers?

Grzego commented 6 years ago

You should be able to generate numbers like:

python generate.py --text="1 2 3 4 5 " --noinfo --bias=4.

although the quality will probably be quite poor (there are too few examples in the dataset).
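A minimal sketch for producing random digit strings in the same space-separated style as the command above and assembling the matching `generate.py` call. It uses only the `--text`, `--noinfo` and `--bias` flags shown in this thread; the helper names and the choice to space out the digits are illustrative assumptions, not part of the repository.

```python
import random
import shlex
from typing import List, Optional


def random_digit_text(n_digits: int = 5, seed: Optional[int] = None) -> str:
    """Return n random digits separated by spaces, with a trailing space,
    mimicking the "1 2 3 4 5 " example (an assumption, not a requirement)."""
    rng = random.Random(seed)
    digits = [str(rng.randint(0, 9)) for _ in range(n_digits)]
    return " ".join(digits) + " "


def build_generate_command(text: str, bias: float = 4.0) -> List[str]:
    """Assemble the argument list for generate.py using only flags from the thread."""
    return ["python", "generate.py", f"--text={text}", "--noinfo", f"--bias={bias}"]


if __name__ == "__main__":
    cmd = build_generate_command(random_digit_text(5))
    # Print a shell-quoted version; pass `cmd` to subprocess.run(cmd) to actually run it.
    print(shlex.join(cmd))
```

Passing the list straight to `subprocess.run` avoids shell-quoting problems with the spaces inside `--text`.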

You can add your own examples in .xml format, but you will have to match the format of the files already in the dataset (the content should contain tags like <Transcription>, <Text> and <StrokeSet>, structured as in the dataset).
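A minimal sketch of writing one labelled sample with Python's standard `xml.etree.ElementTree`. Only the tag names `<Transcription>`, `<Text>` and `<StrokeSet>` come from the comment above; the root element name, the `<Stroke>`/`<Point>` children and the `x`/`y` attributes are assumptions, so compare the output against an existing dataset file (real files may carry extra attributes, such as timing information) before adding samples.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# One pen-down trace, as absolute (x, y) points.
Stroke = List[Tuple[int, int]]


def make_sample_xml(label: str, strokes: List[Stroke]) -> str:
    """Build an XML string with <Transcription>/<Text> and <StrokeSet>.

    ASSUMPTION: the root tag and exact nesting are illustrative; copy the real
    layout (and any extra attributes) from a file already in the dataset.
    """
    root = ET.Element("WhiteboardCaptureSession")  # hypothetical root tag
    transcription = ET.SubElement(root, "Transcription")
    ET.SubElement(transcription, "Text").text = label
    stroke_set = ET.SubElement(root, "StrokeSet")
    for stroke in strokes:
        stroke_el = ET.SubElement(stroke_set, "Stroke")
        for x, y in stroke:
            ET.SubElement(stroke_el, "Point", x=str(x), y=str(y))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "1" drawn as a single vertical stroke (made-up coordinates).
    print(make_sample_xml("1", [[(100, 100), (100, 160)]]))
```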

Alternatively, if you have data made of consecutive points showing how the numbers are drawn (with labels), you could create your own dataset.
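If your data is a list of strokes, each an ordered list of (x, y) pen positions, one common way to prepare it for this kind of model is the representation from Graves' paper: each point becomes an offset from the previous point plus a flag marking the end of a stroke. The sketch below assumes that input layout; how this repository actually preprocesses its data is not confirmed here, and the function name is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def strokes_to_offsets(strokes: List[List[Point]]) -> List[Tuple[float, float, int]]:
    """Flatten strokes into (dx, dy, end_of_stroke) rows.

    dx/dy are offsets from the previous point (across stroke boundaries too);
    the very first point gets (0, 0). end_of_stroke is 1 on a stroke's last point.
    """
    rows = []
    prev = None
    for stroke in strokes:
        for i, (x, y) in enumerate(stroke):
            dx, dy = (0.0, 0.0) if prev is None else (x - prev[0], y - prev[1])
            rows.append((dx, dy, 1 if i == len(stroke) - 1 else 0))
            prev = (x, y)
    return rows


if __name__ == "__main__":
    # A made-up two-stroke "T": a horizontal bar, then a vertical line.
    for row in strokes_to_offsets([[(0, 0), (30, 0)], [(15, 0), (15, 50)]]):
        print(row)
```

Each resulting row sequence would then be stored together with its text label (for example the digit string) to form one training pair.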

So, depending on the format of your dataset, this might be easier or harder. :)

impactcolor commented 6 years ago

I'm really new to this so I'm not sure how to go about creating a dataset. Do you have any articles or direction you can point me to?

Grzego commented 6 years ago

Sorry for the delay. I get the feeling you have no data, which is problematic. Could you please elaborate a little bit more on what you are trying to achieve? :)

impactcolor commented 6 years ago

It's no problem, thank you for taking the time to even discuss this with me. I found a dataset of handwritten digits; however, it isn't set up like the IAM dataset's XML files that the code currently uses. What I'm trying to accomplish is to use the handwriting generation, but the output also has to include numbers, and currently the numbers don't come out well.

(In reply to Grzegorz Opoka's comment of Fri, Oct 20, 2017, 6:06 AM.)

Grzego commented 6 years ago

Ok, is this dataset publicly available? I can look into it to see if there is a way to make it compatible with my code. :)

impactcolor commented 6 years ago

Awesome! Here goes:

http://yann.lecun.com/exdb/mnist/

http://archive.ics.uci.edu/ml/machine-learning-databases/semeion/

I found these two.

(In reply to Grzegorz Opoka's comment of Oct 21, 2017, 3:05 AM.)

Grzego commented 6 years ago

Unfortunately, those datasets represent numbers as images. For handwriting generation you would need a list of consecutive points showing how each digit is written, so those datasets cannot be used here.

impactcolor commented 6 years ago

Would this one work? This has the stroke data: https://github.com/edwin-de-jong/mnist-digits-stroke-sequence-data/wiki/MNIST-digits-stroke-sequence-data

(In reply to Grzegorz Opoka's comment of Mon, Oct 23, 2017, 2:36 PM.)

Grzego commented 6 years ago

This one might work. :) Can you give some examples of sequences you want to generate? I just want to figure out what kind of augmentation the dataset might need.

impactcolor commented 6 years ago

About 5-digit random sequences, for example 11445, 8013, 1507, etc.

(In reply to Grzegorz Opoka's comment of Mon, Oct 23, 2017, 4:30 PM.)

Grzego commented 6 years ago

Sorry for the very late response. I tried this dataset and unfortunately it doesn't work well :/ The results are even worse than with the original IAM dataset. If by any chance I find a better dataset for this task, I will post it here.

impactcolor commented 6 years ago

THANK YOU!!!!

(In reply to Grzegorz Opoka's comment of Wed, Nov 8, 2017, 12:50 PM.)

Grzego commented 6 years ago

Well, it's been a while, but I was kind of interested in this problem and created an MNIST handwriting dataset. If you still need to generate numbers, you may find it useful. One simple solution is to just pick the needed digits from this dataset and concatenate them. :)
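The "pick digits and concatenate them" approach can be sketched as follows. It assumes each digit sample is a list of strokes in absolute (x, y) coordinates, grouped by label in a dictionary; the layout of the actual MNIST handwriting dataset is not described in this thread, so `samples_by_digit` and the fixed horizontal `gap` are hypothetical. Each chosen digit is shifted right so that it starts just after the previous one.

```python
import random
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Strokes = List[List[Point]]


def concat_digits(samples_by_digit: Dict[str, List[Strokes]], text: str,
                  gap: float = 10.0, seed: Optional[int] = None) -> Strokes:
    """Pick one random sample per character of `text` and lay them out left to right."""
    rng = random.Random(seed)
    out: Strokes = []
    cursor = 0.0  # x position where the next digit should start
    for ch in text:
        if ch == " ":
            cursor += gap  # extra space between digit groups
            continue
        strokes = rng.choice(samples_by_digit[ch])
        xs = [x for stroke in strokes for x, _ in stroke]
        shift = cursor - min(xs)  # move the digit's left edge to the cursor
        out.extend([(x + shift, y) for x, y in stroke] for stroke in strokes)
        cursor += (max(xs) - min(xs)) + gap
    return out
```

This ignores vertical alignment and size differences between samples, so in practice you may also want to normalise each digit's height and baseline before joining them.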

impactcolor commented 6 years ago

@Grzego THANK YOU!