Closed: SteveTanggithub closed this issue 3 years ago
2021 ICASSP paper: "A Novel End-to-End Speech Emotion Recognition Network with Stacked Transformer Layers"
@SteveTanggithub Thanks. I don't think the author will reply in the near future. Such a huge improvement in accuracy is not that easy: audio-only accuracy on this dataset is around 66% over the whole 5531 utterances for 4 emotion classes. Anybody claiming more should provide the code; otherwise it is difficult to verify.
To be honest, I don't believe the claim in this paper; the authors should either prove this unusual improvement or withdraw the paper.
Please read our ReadMe
What is multi_branch? Where is the code...
Looking forward to your open-source code... To be honest, I don't believe such high performance can come simply from introducing the STLs (stacked transformer layers).
I noticed that you have provided the data split code, but you should probably also provide the EXACT CSV files containing the train, validation, and test data, respectively, to guarantee reproduction.
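(For illustration only: a minimal sketch of how exact split files could be published, assuming a hypothetical metadata file iemocap_meta.csv with utterance_id, session, speaker, and label columns. The filenames and the session-based protocol here are placeholders, not this repository's actual code.)

```python
# Sketch: write out the exact train/valid/test partitions as CSV files so that
# others can reproduce the same split. Assumes a metadata file
# "iemocap_meta.csv" with columns: utterance_id, session, speaker, label.
import pandas as pd

meta = pd.read_csv("iemocap_meta.csv")

# Speaker-independent split: hold out Session 5 for testing, Session 4 for
# validation, and train on Sessions 1-3 (one common IEMOCAP protocol).
test  = meta[meta["session"] == 5]
valid = meta[meta["session"] == 4]
train = meta[meta["session"].isin([1, 2, 3])]

# Publishing the exact utterance lists removes any ambiguity about the split.
train.to_csv("train.csv", index=False)
valid.to_csv("valid.csv", index=False)
test.to_csv("test.csv", index=False)
```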
It's normal that speaker-inclusive experiments yield higher performance, but obtaining over 90% UA and WA using only acoustic information seems pretty exaggerated.
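(For reference, a minimal sketch of how UA and WA are typically computed in SER evaluation, on toy labels: WA is overall accuracy, UA is per-class recall averaged over classes. The label names and arrays below are made up for illustration.)

```python
# UA (unweighted accuracy) = macro-averaged recall over the emotion classes;
# WA (weighted accuracy) = plain overall accuracy across all utterances.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

y_true = np.array(["ang", "hap", "neu", "sad", "neu", "ang"])  # toy labels
y_pred = np.array(["ang", "neu", "neu", "sad", "neu", "hap"])

wa = accuracy_score(y_true, y_pred)                 # weighted accuracy
ua = recall_score(y_true, y_pred, average="macro")  # unweighted accuracy

print(f"WA = {wa:.3f}, UA = {ua:.3f}")
```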
Thank you for your query; we will update the rest of the code as soon as possible.
Which paper?