gmftbyGMFTBY / MultiTurnDialogZoo

Multi-turn dialogue baselines written in PyTorch
MIT License

An inquiry about ReCoSa model #2

Closed. sonyawong closed this issue 4 years ago

sonyawong commented 4 years ago

Hi~ Thank you for sharing such a helpful repo. I'm a little confused about the ReCoSa model, though. In the original paper, ReCoSa adds a GRU word-level encoder on top of a transformer, but in the code you provide the main architecture still looks like an RNN-based model (the decoder is a GRU).

gmftbyGMFTBY commented 4 years ago

Yes, in my implementation the word-level encoder and the decoder are GRUs, and the utterance-level encoder is multi-head self-attention. I designed the architecture carefully after referring to the original paper. In the author's implementation the decoder is transformer-based, which I think is the only difference between my model and theirs.
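For illustration, here is a minimal PyTorch sketch of the architecture described above (GRU word-level encoder, multi-head self-attention over utterance states, GRU decoder). The class and parameter names are my own shorthand, not the identifiers used in this repo, and the decoder here simply conditions on a mean-pooled context vector instead of full decoder-side attention:

```python
import torch
import torch.nn as nn


class MReCoSaSketch(nn.Module):
    """Hypothetical sketch: GRU word encoder + self-attentive utterance encoder + GRU decoder."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Word-level encoder: a GRU run over each utterance in the context
        self.word_encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Utterance-level encoder: multi-head self-attention over the utterance states
        self.utter_attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)
        # Decoder: a GRU conditioned on the (mean-pooled) context representation
        self.decoder = nn.GRU(embed_dim + hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def encode(self, context):
        # context: (batch, n_turns, seq_len) token ids
        b, t, l = context.shape
        emb = self.embed(context.view(b * t, l))           # (b*t, l, embed)
        _, h = self.word_encoder(emb)                      # h: (1, b*t, hidden)
        utter_states = h.squeeze(0).view(b, t, -1)         # (b, t, hidden)
        ctx, _ = self.utter_attn(utter_states, utter_states, utter_states)
        return ctx                                         # (b, t, hidden)

    def forward(self, context, response):
        # response: (batch, seq_len) token ids, used with teacher forcing
        ctx = self.encode(context)
        emb = self.embed(response)                          # (b, l, embed)
        # Simplification: mean-pool the context states and feed them at every step
        ctx_vec = ctx.mean(dim=1, keepdim=True).expand(-1, emb.size(1), -1)
        dec_out, _ = self.decoder(torch.cat([emb, ctx_vec], dim=-1))
        return self.out(dec_out)                            # (b, l, vocab)


# quick shape check
model = MReCoSaSketch(vocab_size=1000)
logits = model(torch.randint(0, 1000, (2, 3, 10)), torch.randint(0, 1000, (2, 8)))
print(logits.shape)  # torch.Size([2, 8, 1000])
```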

However, after running the code released by the author (the link above), I found that it performs very poorly and cannot generate any normal responses (it always produces things like "i like i like i like" or "yes ... yes ..."). I already raised this in an issue with the author, but they never gave a proper explanation; at least the author acknowledged my architecture and the changes I made to ReCoSa. Besides, my ReCoSa model can generate normal responses, which is much better than the original ReCoSa code. In my view, the transformer-based decoder is the fatal weakness of ReCoSa, and it makes the performance very unstable and unsatisfactory.

So, to distinguish it from the original ReCoSa architecture, I call it MReCoSa (modified ReCoSa) in this repo.

I'm also still struggling to finish a pure transformer model for multi-turn dialogue generation, but so far its performance is much worse than the vanilla Seq2Seq and the other multi-turn dialogue baselines.

sonyawong commented 4 years ago

Okay, thank you for your detailed explanation. I ran into the same issue as you after running the code released by the author, and I was wondering whether there are some errors in her code. It is surprising that the different decoders make such a big difference. I will reproduce the ReCoSa model and see the results. Thank you~

gmftbyGMFTBY commented 4 years ago

Okay. If you run into any problems while implementing ReCoSa, feel free to contact me in this issue or elsewhere.

Hope you have a good day.