acmattson3 / handwriting-data

A way to gather and provision handwriting data for various uses.
MIT License

Order of characters and styles with a trained model. #2

Closed. DrakeHooks closed this issue 7 months ago.

DrakeHooks commented 7 months ago

I was able to get a model trained on 800 prompts. I will likely create more prompts to fine-tune the model, but I am not sure why the characters are not in the correct order. I am not using any styles for this result, but that might be the issue. I understand this is more of a problem with handwriting-synthesis, but I was wondering if you encountered this issue as well. I used that one script for creating a custom style, but it only does the strokes. This is the output I got for the All Star lyrics. Thanks for all the help again.

[attached image: allstar (generated handwriting sample)]

acmattson3 commented 7 months ago

> I was able to get a model trained on 800 prompts. [...] I am not sure why the characters are not in the correct order.

It looks like it might be an overfitting issue. When you're training your model, do you notice the training loss and validation loss diverging a lot? I never got this far; I only used ~100 prompts at most. I'm very glad you're finding my program useful!
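A minimal sketch of what that train-versus-validation check can look like, assuming the per-step losses printed during training have been collected into two Python lists (the function and variable names here are illustrative, not part of handwriting-synthesis):

```python
# Hypothetical illustration: flag overfitting when validation loss keeps
# rising while training loss keeps falling over a trailing window.
def diverging(train_losses, val_losses, window=200):
    if len(train_losses) < 2 * window or len(val_losses) < 2 * window:
        return False  # not enough history to judge yet
    recent_train = sum(train_losses[-window:]) / window
    earlier_train = sum(train_losses[-2 * window:-window]) / window
    recent_val = sum(val_losses[-window:]) / window
    earlier_val = sum(val_losses[-2 * window:-window]) / window
    # Training loss still improving while validation loss gets worse.
    return recent_train < earlier_train and recent_val > earlier_val

# Example usage, assuming you logged the losses yourself:
# if diverging(train_history, val_history):
#     print("train/val loss diverging -- likely overfitting")
```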

DrakeHooks commented 7 months ago

> It looks like it might be an overfitting issue. When you're training your model, do you notice the training loss and validation loss diverging a lot?

Yeah, I feel like this is an error in the training. It starts with a loss of 4.0 and is already at a loss of 0 by around step 380. It only takes a couple of hours to train on CPU; I feel like it should take longer. Also, by the time it saves a checkpoint, the loss is below 0.
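For scale, a rough back-of-the-envelope check of how far into the data 380 steps gets. Both the one-example-per-prompt mapping and the batch size of 32 are assumptions about this particular setup:

```python
# Hypothetical back-of-the-envelope check: how many passes over the data
# fit into 380 steps? The numbers are assumptions, not measured values.
num_examples = 800          # ~one training example per prompt (assumption)
batch_size = 32             # first-stage batch size in the default config (assumption)

steps_per_epoch = num_examples / batch_size
epochs_by_step_380 = 380 / steps_per_epoch

print(f"~{steps_per_epoch:.0f} steps per epoch, so step 380 is ~{epochs_by_step_380:.0f} epochs in")
# With only ~800 examples, the model revisits each one many times very quickly,
# which is consistent with the loss collapsing toward (and below) zero early.
```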

acmattson3 commented 7 months ago

> Yeah, I feel like this is an error in the training. It starts with a loss of 4.0 and is already at a loss of 0 by around step 380. [...]

I'm no expert on the model itself, but some tweaking definitely needs to be done. In my experience training a model, I got negative loss values (which, as it turns out, is normal). Has that happened for you? You can adjust the model's settings, specifically:

```python
nn = rnn(
    reader=dr,
    log_dir='logs',
    checkpoint_dir='checkpoints',
    prediction_dir='predictions',
    # learning_rates=[.0001, .00005, .00002],  # This was the original set of learning rates.
    learning_rates=[.001, .0005, .0002],  # These learning rates are WAY faster.
    # batch_sizes=[32, 64, 64],  # Much larger batch sizes mean less chance of overfitting, but:
    batch_sizes=[8, 16, 16],  # Smaller batch sizes make it easier to train on less data.
    # patiences=[1500, 1000, 500],  # Larger patience values let the model train for longer, even through prolonged increases in loss.
    patiences=[1000, 500, 250],
    beta1_decays=[.9, .9, .9],
    validation_batch_size=32,  # THIS might have a large impact on overfitting. A larger value means more extensive model performance checking.
    optimizer='rms',
    # num_training_steps=100000,  # A larger maximum step count lets the model train for longer (and maybe that's better).
    num_training_steps=2500,  # This is a VERY small value.
    warm_start_init_step=0,
    regularization_constant=0.0,
    keep_prob=1.0,
    enable_parameter_averaging=False,
    # min_steps_to_checkpoint=2000,  # A larger value ensures poor-performing early models are not in the running.
    min_steps_to_checkpoint=1000,
    log_interval=20,
    grad_clip=10,
    lstm_size=400,
    output_mixture_components=20,
    attention_mixture_components=10
)
```

Hope this gives some good pointers as to where to start, but like I said, I never got this far with my own training.
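One note on why several of those settings are three-element lists: in the upstream TFBaseModel that handwriting-synthesis builds on, each index appears to correspond to one training stage, with training moving on to the next stage once the current patience is exhausted. That staged reading is an assumption about the training loop; the snippet below only illustrates the idea and is not the library's code.

```python
# Illustrative only: how three-element hyperparameter lists are commonly
# consumed as successive training stages. This mirrors the assumed behavior
# of the upstream training loop; it is not the library's actual code.
learning_rates = [.001, .0005, .0002]
batch_sizes = [8, 16, 16]
patiences = [1000, 500, 250]

for stage, (lr, bs, patience) in enumerate(zip(learning_rates, batch_sizes, patiences), start=1):
    print(f"stage {stage}: learning rate {lr}, batch size {bs}, "
          f"early-stopping patience {patience} steps")
```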

DrakeHooks commented 7 months ago

> I'm no expert on the model itself, but some tweaking definitely needs to be done. In my experience training a model, I got negative loss values (which, as it turns out, is normal). Has that happened for you? You can adjust the model's settings [...]

Yeah, I'm no expert on the training either 💀 I just read up on overfitting and that sounds about right. The loss values are negative by the time it checkpoints (around -0.50), but like you said, that is normal. I'll mess around with the model settings and see if there is a fix in that. I feel like it should be training slower, so maybe I need to set a higher number of epochs?
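For anyone reading along, a negative loss is expected here: the Graves-style handwriting model that handwriting-synthesis is based on trains on a negative log-likelihood over continuous pen offsets, and a probability density can exceed 1, so its negative log can dip below zero. A minimal illustration with made-up numbers:

```python
import math

# Minimal illustration (made-up numbers): a probability *density* can exceed 1,
# so a negative log-likelihood loss can legitimately go below zero.
def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

density = gaussian_pdf(0.0, mu=0.0, sigma=0.1)  # sharp Gaussian around the target offset
nll = -math.log(density)                        # the corresponding per-sample loss term

print(f"density = {density:.2f}")                  # ~3.99, greater than 1
print(f"negative log-likelihood = {nll:.2f}")      # ~-1.38, below zero
```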

DrakeHooks commented 7 months ago

@acmattson3 Good news! Changing those values seems to produce a better model. Over the next few days, I will add more prompt data and train with a slower learning rate. Also, validation_batch_size seems to be one of the more important values to modify to avoid overfitting, like you said. These are the values I trained with:
```python
nn = rnn(
    reader=dr,
    log_dir='logs',
    checkpoint_dir='checkpoints',
    prediction_dir='predictions',
    # learning_rates=[.0001, .00005, .00002],
    learning_rates=[.001, .0005, .0002],
    batch_sizes=[32, 64, 64],
    # batch_sizes=[8, 16, 16],
    patiences=[1500, 1000, 500],
    # patiences=[1000, 500, 250],
    beta1_decays=[.9, .9, .9],
    validation_batch_size=64,
    optimizer='rms',
    num_training_steps=100000,
    # num_training_steps=2500,
    warm_start_init_step=0,
    regularization_constant=0.0,
    keep_prob=1.0,
    enable_parameter_averaging=False,
    min_steps_to_checkpoint=360,
    # min_steps_to_checkpoint=380,
    log_interval=20,
    grad_clip=10,
    lstm_size=400,
    output_mixture_components=20,
    attention_mixture_components=10
)
nn.fit()
```

[attached image: allstar2 (improved handwriting output)]
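If your checkout still includes the upstream handwriting-synthesis demo.py, one quick way to eyeball whether character order is fixed after retraining is to render a couple of lines with the new checkpoint. The Hand interface and its arguments below are assumptions about that upstream script, not something defined in this repo:

```python
# Hypothetical quick check, assuming the upstream handwriting-synthesis
# demo.Hand interface is available and pointed at your new checkpoint.
from demo import Hand

lines = [
    "somebody once told me",
    "the world is gonna roll me",
]

hand = Hand()  # assumed to load the trained model from checkpoints/
hand.write(
    filename='img/allstar_check.svg',
    lines=lines,
    biases=[0.75 for _ in lines],  # higher bias -> cleaner, more legible strokes
)
```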

acmattson3 commented 7 months ago

> Good news! Changing those values seems to produce a better model. [...] Also, validation_batch_size seems to be one of the more important values to modify to avoid overfitting, like you said. These are the values I trained with. [...]

Those are much improved results! Keep me updated on your progress as you add more data and tweak your values. It'll be interesting to see how your changes affect the model.

techno-yogi commented 3 months ago

Also interested in this, any update?

ImNotOssy commented 2 months ago

> Good news! Changing those values seems to produce a better model. [...] These are the values I trained with. [...]

Any update?