Closed: bonham79 closed this 2 weeks ago
This is pretty cool! Right now my comments are mostly about naming and style, and all quite minor. I would like Adam to weigh in on how it interacts with the other modules/models (like, should it be in `models/modules/lstm.py`? Is it DRY?) next, then we can all test. Before this is done we'll want to update the README too.
It shouldn't be another module (we have enough). It's just an expansion of LSTM behavior. If we're interested in expanding it to general transformer behavior then we can generalize it as a mixin. But for a first PR we can keep it in LSTM.
Broader question: given that Wu & Cotterell didn't find an improvement for going from zeroth to first order, would we want to consider getting rid of the contextual form? I think Adam has had the philosophy that it's nice to have stuff around that doesn't work. I am not sure (it adds a maintenance burden) but I haven't actively disagreed. I don't feel strongly either way but I wanted to start the conversation...
We should keep it. They see results everywhere but G2P, if I recall the paper correctly. It's also capable of more expressive graphs, which can come in handy for other experiments.
I guess since Travis implemented the refactor for distinguishing models and modules I can defer to where he thinks things should go. That being said, I do not agree that "we have enough" is a good argument to keep things out of modules. It feels a bit weird to have a new file for HMM models, but no file for HMM modules, given that some new HMM modules have been added. I suppose the thing that is a bit weird is that most models really refer to a specific decoder, and are thus coupled to a decoder Module, which is then not actually very general, right?
I want to limit the addition of new modules and file names to a conservative, "we have to" basis. I've lost too much time having to scan ESPnet and fairseq libraries to figure out which parts are going where. Modularity is good, but too much modularity is a problem for me.
My general approach for models is they contain everything necessary for that model. For modules, those should just contain the lego block pieces to customize models. If I have an lstm file, the design idea is that I can use LSTMs wherever I want. If we create a new file for hard_attention, that's implying, to me, that I can apply hard_attention modules wherever I want. Since that is not the case, I'd rather stick them in my lstm box.
I am not sure what generalizing it as a mixin means. It sounds like it could mean introducing yet another abstraction pattern, which I am skeptical of. I guess there are enough inherent differences in the decoders that I would want to approach it by just writing a new model class entirely.
Yeah, that's what I mean. I'd be okay with separating into another file IF we want to generalize hard_attention module decoding for variations on encoders and decoders. (e.g. swap out lstm, transformer, gru, ssm).
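For concreteness, a minimal sketch of what that mixin generalization could look like. Everything here is hypothetical (class and method names are stand-ins, and plain Python replaces the actual PyTorch modules), just to illustrate the composition pattern being discussed:

```python
import math
from typing import List


class HardAttentionMixin:
    """Hypothetical mixin: hard monotonic attention over source positions.

    Nothing here is tied to an LSTM; any decoder backbone producing
    per-position scores could mix this in (transformer, GRU, SSM, ...).
    """

    def transition_log_probs(self, scores: List[float]) -> List[float]:
        # Normalize raw scores into log probabilities (a log-softmax).
        m = max(scores)
        z = m + math.log(sum(math.exp(s - m) for s in scores))
        return [s - z for s in scores]


class LSTMDecoder:
    """Stand-in for an LSTM decoder producing per-position scores."""

    def scores(self, step: int) -> List[float]:
        return [0.0, 1.0, 2.0]  # dummy scores for illustration


class HardAttentionLSTMDecoder(HardAttentionMixin, LSTMDecoder):
    """The composition: attention behavior mixed into a decoder backbone."""
```

Swapping `LSTMDecoder` for another backbone would then not require touching the attention logic, which is the whole appeal of the mixin route (and also the extra abstraction being weighed against above).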
This all basically looks fine to me, but I'll defer to Adam for final approval.
I would recommend defining `1e7` as a constant and explaining what it does when you use it, since future-us will have no context for why that's done. It's okay if the comment just says it is following practices elsewhere.
Modulo method-level docs that is.
Gonna define the log variant, since it's technically an expensive operation and we only need the value once. Else can do.
Right, agreed. And tbh we already have more modularity than I like :D, I think it is an issue in almost every modern NLP library/toolkit I have used.
yeahhhh, i just accept feature creep as an inevitability and try to plan around it. que sera sera
@kylebgorman Replaced the `math` with `log`. Found a few redundant uses of `inf`, so just created a `defaults.INF` and `defaults.NEG_INF` set of arguments. While at it I also offloaded epsilon to defaults anyhow.
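A sketch of what that constants module could look like, assuming the shape of the `defaults` module and the `LOG_EPSILON` name (the exact contents and values are not shown in this thread):

```python
# defaults.py -- hypothetical sketch of the shared constants discussed above.
import math

# Large finite stand-ins for +/- infinity; a common convention for masking
# logits without producing NaNs under softmax.
INF = 1e7
NEG_INF = -1e7

# Small constant guarding against log(0). Defining the log variant at module
# level means the (comparatively expensive) log is evaluated exactly once,
# rather than on every call site.
EPSILON = 1e-7
LOG_EPSILON = math.log(EPSILON)
```

Call sites would then import `defaults.NEG_INF` etc. instead of scattering magic numbers, which is the "define it as a constant and explain it" recommendation above.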
What you did there looks good. I'll let you and Adam work out the remaining details---ping me when you're ready to merge.
@Adamits Anything major that I haven't commented on yet?
@kylebgorman reverted the fly-by comment to `width`. `Size` should only refer to `tensor.dim`.
done
Wu and Cotterell's hard monotonic transducer.
Closes https://github.com/CUNY-CL/yoyodyne/issues/165.
It builds off the lstm module. To call it you pass `hmm_lstm` for `--arch`. The hard monotonic constraint is enforced with `--enable_monotonic`, while the context window is enabled with `--hmm_context`.

While y'all are merging, could you take a peek at how attention and log_prob contexts are preserved? I hate having that large a tensor filling up memory, but when I try to be clever about it, it causes model collapse.
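On the memory question: for the zeroth-order case, the forward recursion can in principle carry only the previous step's per-position log weights rather than the full attention history. A minimal pure-Python sketch of that log-space recursion (names and the monotone-masking convention are illustrative, not the actual implementation in this PR):

```python
import math
from typing import List


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log-sum-exp over a list of log values."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def hard_monotonic_log_likelihood(
    emission: List[List[float]],    # [T][S]: log P(y_t | source position j)
    transition: List[List[float]],  # [S][S]: log P(j' | j), monotone mask
) -> float:                         # assumed already applied by the caller
    """Forward algorithm in log space for zeroth-order hard attention.

    Only `alpha`, a length-S vector, is carried between target steps, so
    the full T x S attention history never has to live in memory at once.
    (Gradient checkpointing aside -- autograd will still retain
    intermediates unless told otherwise, which may be the real culprit.)
    """
    S = len(emission[0])
    # Start with all probability mass on the first source position.
    alpha = [0.0 if j == 0 else -math.inf for j in range(S)]
    alpha = [a + e for a, e in zip(alpha, emission[0])]
    for t in range(1, len(emission)):
        alpha = [
            logsumexp([alpha[i] + transition[i][j] for i in range(S)])
            + emission[t][j]
            for j in range(S)
        ]
    return logsumexp(alpha)
```

If the collapse appears only when discarding intermediates, the interaction with autograd's saved tensors (rather than the recursion itself) might be worth ruling out first.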