Made embedding_dim configurable based on vocab

allenai / deep_qa

A deep NLP library, based on Keras / tf, focused on question answering (but useful for other NLP too)

Apache License 2.0

404 stars 133 forks source link

Made embedding_dim configurable based on vocab #262

Closed matt-gardner closed 7 years ago

matt-gardner commented 7 years ago

I also changed embedding_size to embedding_dim, which is why this PR touched so many files. The main difference is in text_trainer.py. And while I was at it, I updated a couple of tests to add a new convenience method I added a few PRs ago, and that revealed some problems in _set_max_lengths_from_model, which I also fixed.

And I removed a test that I didn't think was necessary anymore, because we don't really use the multiple choice memory network - I didn't want to do the work necessary to make the test pass with this new set up.

nelson-liu commented 7 years ago

And I removed a test that I didn't think was necessary anymore, because we don't really use the multiple choice memory network - I didn't want to do the work necessary to make the test pass with this new set up.

So the multiple choice memory network is broken, or just the test?

matt-gardner commented 7 years ago

The memory network code is at the moment incompatible with the words and characters tokenizer, if you use a bag of words encoder, because of encoding dimension stuff. It still works with other tokenizers, as the tests I left in show. I figured it wasn't worth the time to make that old code compatible with words and characters tokenizers.

nelson-liu commented 7 years ago

The memory network code is at the moment incompatible with the words and characters tokenizer, if you use a bag of words encoder, because of encoding dimension stuff.

ok, that should be mentioned somewhere appropriate...not sure where though

matt-gardner commented 7 years ago

I also just realized I forgot to update all of the example experiments to use the new parameter specification. I'll do that in another PR, in which I'm also updating bidaf_squad.json to use more sane parameters.