shimengfeng opened this issue 4 years ago
This would be great!
Hi all, I am planning on working on this and will update the issue again when a PR is ready for review. Thanks!
Hi @JohnGiorgi, I was wondering if you could help me understand the copy score functionality changes when there is more than one decoder layer. Would it be correct to grab the last layer from `state["decoder_hidden"]`?
@annajung Sorry I am getting back so late! Are the decoder layers all RNNs? If so I think it makes the most sense to use the last hidden vector output from the last decoder layer.
If the decoder can be something else (like a transformer), I think the answer is a little more complicated.
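For a stacked RNN decoder, "grab the last layer" can be sketched in a few lines of PyTorch (the shapes here are assumptions based on `nn.LSTM`'s layout, not taken from the PR):

```python
import torch

# Assumed shape: for an n-layer LSTM, the hidden state tensor is
# (num_layers, batch_size, hidden_size); the copy scores would then
# be computed from the top layer only.
num_layers, batch_size, hidden_size = 2, 4, 8
decoder_hidden = torch.randn(num_layers, batch_size, hidden_size)

# Last (top) layer of the stack, shape (batch_size, hidden_size)
top_layer = decoder_hidden[-1]
assert top_layer.shape == (batch_size, hidden_size)
```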
No worries! @JohnGiorgi, @epwalsh gave the same feedback on the open PR. PTAL when you get a chance :) https://github.com/allenai/allennlp-models/pull/245
@annajung Looks good :) I left a comment!
I think I originally thought the goal was to add a composable decoder to the CopyNet model, similar to what already exists for `ComposedSeq2Seq`, so that you could easily experiment with the different decoders implemented in AllenNLP.
But re-reading #4451 and #4766 it seems I might have misunderstood, and what I was thinking is probably trickier to implement.
oh interesting! Do you still want to merge the open PR or look into a composable decoder?
@JohnGiorgi @annajung I like this idea of adding a composable decoder to CopyNet, which could also support multi-layer LSTMs. If either of you would like to look into this, I think we should hold off on https://github.com/allenai/allennlp-models/pull/245.
@annajung What do you think? I would be game to try and implement it. I am excited to try out a transformer decoder layer within CopyNet for my own project. I should be able to start next week!
sure that works, no worries! I can close my PR in favor of a composable decoder. thanks both for the feedback!
Okay, I have made a decent amount of progress on this. I have replaced `self._decoder_cell` with `self.decoder`, which is a `SeqDecoder` subclass. Now I am trying to figure out how to replace the following:
```python
state["decoder_hidden"], state["decoder_context"] = self._decoder_cell(
    projected_decoder_input.float(),
    (state["decoder_hidden"].float(), state["decoder_context"].float()),
)
```
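For context, `self._decoder_cell` in the current CopyNet is a single `torch.nn.LSTMCell`. A standalone sketch of the call above, with made-up dimensions:

```python
import torch
import torch.nn as nn

batch_size, input_dim, hidden_dim = 2, 6, 8
decoder_cell = nn.LSTMCell(input_dim, hidden_dim)

# Stand-ins for the projected input and the previous (hidden, context) state
projected_decoder_input = torch.randn(batch_size, input_dim)
decoder_hidden = torch.zeros(batch_size, hidden_dim)
decoder_context = torch.zeros(batch_size, hidden_dim)

# One decoding step: the cell consumes and produces a (hidden, context) pair
decoder_hidden, decoder_context = decoder_cell(
    projected_decoder_input, (decoder_hidden, decoder_context)
)
assert decoder_hidden.shape == (batch_size, hidden_dim)
assert decoder_context.shape == (batch_size, hidden_dim)
```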
It looks like I will have to drop down to the `forward` method of `self._decoder._decoder_net`, something like:
```python
new_state, state["decoder_hidden"] = self._decoder._decoder_net(
    previous_state=state,
    encoder_outputs=state["encoder_outputs"],
    source_mask=state["source_mask"],
    previous_steps_predictions=last_predictions,
)
state["decoder_context"] = new_state.get("decoder_context", None)
```
I am unsure about this approach for a couple of reasons. Partly because it looks like there are differences in how each decoder (`LstmCellDecoderNet`, `StackedSelfAttentionDecoderNet`) implements this function, but also because I don't know how `projected_decoder_input` fits in.
Any help is welcome! @epwalsh @annajung
@JohnGiorgi one thing to keep in mind is that we want to keep backwards compatibility. So a CopyNet model trained before with a `_decoder_cell` should still work.

And if the `DecoderNet` API doesn't really fit, then I think it's okay to create a new abstraction that works better here. Something like `CopyNetDecoder`.
> @JohnGiorgi one thing to keep in mind is that we want to keep backwards compatibility. So a CopyNet model trained before with a `_decoder_cell` should still work.

I see. So the main considerations here are:

1. `CopyNet` configs can be loaded in the "new" `CopyNet`.
2. The `state_dict` of the old `CopyNet` can be loaded into this new `CopyNet`. I think the only place the `state_dict`s would differ is that the old CopyNet will have these parameters, and the new CopyNet won't:

```
0 | 2021-04-13 06:57:14,602 - INFO - allennlp.common.util - _decoder_cell.weight_ih
0 | 2021-04-13 06:57:14,602 - INFO - allennlp.common.util - _decoder_cell.weight_hh
0 | 2021-04-13 06:57:14,602 - INFO - allennlp.common.util - _decoder_cell.bias_ih
0 | 2021-04-13 06:57:14,603 - INFO - allennlp.common.util - _decoder_cell.bias_hh
```
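One way to see that key mismatch concretely: PyTorch's `load_state_dict(..., strict=False)` reports differing keys instead of raising. A toy sketch with stand-in modules (not the real CopyNet classes):

```python
import torch.nn as nn

class OldCopyNet(nn.Module):
    """Stand-in for the current model, with its hard-coded decoder cell."""
    def __init__(self):
        super().__init__()
        self._decoder_cell = nn.LSTMCell(4, 4)

class NewCopyNet(nn.Module):
    """Stand-in for a model with a composable `decoder` attribute."""
    def __init__(self):
        super().__init__()
        self.decoder = nn.LSTMCell(4, 4)  # hypothetical composable decoder

old_state = OldCopyNet().state_dict()
result = NewCopyNet().load_state_dict(old_state, strict=False)

# The old checkpoint's _decoder_cell.* params are "unexpected" to the
# new model, and the new decoder's params are "missing" from the checkpoint.
assert "_decoder_cell.weight_ih" in result.unexpected_keys
assert "decoder.weight_ih" in result.missing_keys
```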
One solution I can think of:

1. Keep the existing arguments of `CopyNet` (e.g. `attention`, `beam_size`, etc.) so old configs can be loaded.
2. Add an optional `decoder` argument (like `ComposedSeq2Seq`) that is a `DecoderNet`.
3. If `decoder` is not None, init it with the params passed to `CopyNet` (e.g. `attention`, `beam_size`, etc.). Otherwise, default to an `LSTMCell` as is currently done.

I think this would achieve backwards compatibility. Let me know if you spot a mistake or have a better solution, though!
> And if the `DecoderNet` API doesn't really fit, then I think it's okay to create a new abstraction that works better here. Something like `CopyNetDecoder`.
Okay, I think we may have to (or do some subclassing). I don't see a way for me to use the two currently implemented `DecoderNet`s without modification. Will work on this.
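A new abstraction along those lines might be sketched as an abstract base class. The name `CopyNetDecoder` comes from the discussion above, but the method names here are invented for illustration:

```python
from abc import ABC, abstractmethod

class CopyNetDecoder(ABC):
    """Hypothetical interface a CopyNet-specific decoder could implement."""

    @abstractmethod
    def init_decoder_state(self, encoder_state: dict) -> dict:
        """Build the initial decoder state from the encoder outputs."""

    @abstractmethod
    def take_step(self, state: dict, last_predictions) -> tuple:
        """Advance one decoding step; return (new_state, decoder_hidden)."""

class DummyDecoder(CopyNetDecoder):
    """Minimal concrete subclass just to show the contract."""

    def init_decoder_state(self, encoder_state):
        return {"decoder_hidden": None, "decoder_context": None}

    def take_step(self, state, last_predictions):
        return state, state["decoder_hidden"]

state = DummyDecoder().init_decoder_state({})
assert "decoder_hidden" in state
```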
**Is your feature request related to a problem? Please describe.** The current CopyNet architecture only allows one LSTM decoder layer. This is similar to #4451.

**Describe the solution you'd like** Give users the choice of having multiple decoder layers, for a more comprehensive CopyNet encoder-decoder architecture.

**Describe alternatives you've considered** N/A

**Additional context** N/A