Closed: theseventhflow closed this issue 4 years ago
Yes, the provided dataset is the entire dataset I used to train the shared checkpoint. The test-set loss in my experiment is 0.25.
Thanks for your reply! My test loss is 1.24 with the "chord" words. Did you get 0.25 without "chord"?
The loss is 0.25 for both shared checkpoints.
Got it. Thank you!
Hello,
I have tried to train Transformer-XL from scratch. However, the generated results are not as good as yours. The theme within a single generated MIDI file is not consistent: the last slice of the sequence sounds different from the prompt. Could you tell me whether the dataset you provide is the whole dataset? Or did you use another, bigger dataset for pre-training and the provided one only for fine-tuning?
My experiment settings are: n_layers: 12, x_len: 512, m_len: 512, ff: 2048
Could you also tell me the test-set cross-entropy loss you got in your experiment?
Thank you!
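For anyone comparing numbers in this thread, here is a minimal sketch of how a test-set cross-entropy loss could be computed, assuming it is the mean token-level negative log-likelihood in nats. That is an assumption, not something confirmed for this repository, and the function name `mean_cross_entropy` and the toy logits are hypothetical.

```python
import math

def mean_cross_entropy(logits, targets):
    """Mean token-level cross-entropy in nats (assumed definition).

    logits:  one list of unnormalized scores per position (one score per vocabulary entry)
    targets: the index of the correct token at each position
    """
    total = 0.0
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        # Negative log-probability of the correct token.
        total += log_z - row[target]
    return total / len(targets)

# Toy example: a vocabulary of 4 tokens and a sequence of 2 positions.
logits = [[0.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0]]
targets = [1, 0]
loss = mean_cross_entropy(logits, targets)
perplexity = math.exp(loss)  # per-token perplexity, if the loss is in nats
print(loss, perplexity)
```

Losses are only comparable if they are averaged the same way and over the same vocabulary, so it may be worth checking both before comparing values between runs.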
Hello, excuse me, could you tell me how you trained the model from scratch?