festvox / flite

A small fast portable speech synthesis system

Why do pre-trained models have different sizes? #81

Closed rzy6461 closed 2 years ago

rzy6461 commented 2 years ago
For different speakers, I found that the sizes of the models were different, as shown below:

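[image: image] https://user-images.githubusercontent.com/43977111/183359752-c8c99c29-1457-435d-a341-190937fbc55b.png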

It is generally agreed that, if the model structure is fixed, the model size should be independent of the training data size. However, I found that the model size is proportional to the data size. For example, the data size of slt is about 1 hour, and the model size is 11MB. The data size of ljm is about 0.5 hour, and the model size is 5.5MB. Does this mean that the more training data there is, the larger the model?

awbcmu commented 2 years ago

Yes, the model is optimised on the data size rather than on the "result". It would be possible to optimise on the result, but we'd probably do that by selecting a subset of the training data rather than by optimizing the stopping criteria at training time.

Note that different speakers spoke at slightly different rates, and that some speakers are more consistent than others, which can lead to smaller models with the same amount of data. At the larger sizes, the current techniques don't really take advantage of data sizes greater than a few hours.

Alan
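
The scaling described in the reply can be illustrated with a toy sketch. It assumes the model behaves like a decision tree grown until each leaf would hold fewer than a fixed minimum number of samples. The thread only mentions "stopping criteria", so the tree structure, the `count_leaves` helper, and the `stop` parameter are hypothetical and are not Flite's actual training code. Under that assumption, the leaf count (a rough proxy for model size) roughly doubles when the data doubles, and it can be reduced either by training on a subset or by raising the stop value.

```python
def count_leaves(n_samples: int, stop: int) -> int:
    """Count the leaves of a toy tree that halves its data at every node.

    Assumption (not taken from Flite): a node is split only if both
    halves can still hold at least `stop` samples, i.e. n_samples >= 2 * stop.
    """
    if n_samples < 2 * stop:
        return 1  # too few samples to split further: this node is a leaf
    left = n_samples // 2
    right = n_samples - left
    return count_leaves(left, stop) + count_leaves(right, stop)


if __name__ == "__main__":
    # Same stop criterion, twice the data: about twice as many leaves.
    print(count_leaves(1000, stop=50))
    print(count_leaves(2000, stop=50))
    # Two ways to shrink the larger model: use half the data,
    # or keep all the data and double the stop value.
    print(count_leaves(2000 // 2, stop=50))
    print(count_leaves(2000, stop=100))
```

In this sketch both shrinking strategies give the same leaf count, but they are not equivalent in practice: a subset discards data, while a larger stop value averages more samples into each leaf.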
