Closed: khalidhuseynov closed this issue 2 years ago.

First of all, thanks for the paper and for opening up the code!

The code seems to reinitialize the wav2vec2 weights here and train the model from scratch, so I was wondering about the authors' opinion on why not initialize the model with the pretrained wav2vec2 weights and train/tune from there. I believe it could generalize better to other datasets and converge faster. I may try it myself, but I was wondering whether it had been tried/tested before.

I've checked the transformers library a bit more closely, and it seems the weights are overridden afterwards inside the `from_pretrained` function, so you can ignore my question above. I had a couple of follow-up questions though: did you try the pretrained wav2vec2-base-960h weights, and were the results any better? I didn't see anything related to this in the paper. Thanks in advance!
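For reference, a minimal sketch of the difference I mean, using the Hugging Face transformers API (illustrative only, not this repo's actual training code):

```python
from transformers import Wav2Vec2Config, Wav2Vec2Model

# Building from a config gives randomly initialized weights,
# i.e. training from scratch.
scratch_model = Wav2Vec2Model(Wav2Vec2Config())

# from_pretrained loads the checkpoint and overrides the random
# initialization with the pretrained weights.
pretrained_model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
```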
Thank you for your interest in this work.
Thanks for your response. I've tested wav2vec2-base-960h: initializing from the pretrained weights helped model generalizability and gave better performance on random speech samples (not from the IEMOCAP dataset). However, it isn't of much use if the train/test set is limited to IEMOCAP. I'll close this issue then.
@khalidhuseynov and @TideDancer, I'm trying to do something different that would involve building a more sophisticated architecture. Do you have any idea whether adding more layers before and after the pooling layer in the `cls` task would improve performance? Also, do you think trying several alpha values between 0.1 and 0.01 might improve performance?
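Concretely, something like this hypothetical head in PyTorch (the class name and layer choices are made up for illustration, not taken from the repo):

```python
import torch
import torch.nn as nn

class ClsHead(nn.Module):
    """Hypothetical classification head: mean-pool the encoder outputs over
    time, then apply extra FC blocks instead of a single linear layer."""

    def __init__(self, hidden_size: int, num_classes: int):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),  # extra layer after pooling
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_size, num_classes),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, time, hidden_size) from the encoder
        pooled = hidden_states.mean(dim=1)  # simple mean pooling over time
        return self.blocks(pooled)

# e.g. head = ClsHead(hidden_size=768, num_classes=4) for wav2vec2-base
```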
@owos, thanks for your interest.

1. What kind of blocks would you add before and after the pooling layer? After the pooling I use a simple FC layer, which could be improved. Before pooling, I think you could use fancier transformer structures, e.g. Branchformer, for the speech task.
2. Feel free to adjust the alpha values if resources permit. I didn't test values in between, but it's worth a try.
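If you do sweep alpha, a log-spaced grid between the two endpoints is a common choice; a tiny sketch (the training call is a hypothetical placeholder):

```python
import numpy as np

# Log-spaced candidates between the endpoints mentioned above.
for alpha in np.logspace(-2, -1, num=5):  # 0.01 ... 0.1
    print(f"training with alpha={alpha:.4f}")
    # train_and_evaluate(alpha)  # hypothetical placeholder for the training run
```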