allenai / deep_qa

A deep NLP library, based on Keras / tf, focused on question answering (but useful for other NLP too)
Apache License 2.0

MCTupleWithBackgroundInstance #213

Closed · BeckySharp closed 7 years ago

codecov[bot] commented 7 years ago

Codecov Report

Merging #213 into master will increase coverage by 0.1%. The diff coverage is 80.1%.

@@            Coverage Diff            @@
##           master     #213     +/-   ##
=========================================
+ Coverage   80.82%   80.92%   +0.1%     
=========================================
  Files         105      109      +4     
  Lines        5334     5573    +239     
=========================================
+ Hits         4311     4510    +199     
- Misses       1023     1063     +40
| Impacted Files | Coverage Δ |
|---|---|
| ...ta/instances/mc_tuples_with_background_instance.py | 80.1% <80.1%> (ø) |
| deep_qa/data/tokenizers/word_splitter.py | 93.47% <ø> (-0.14%) :x: |
| ...qa/data/tokenizers/word_and_character_tokenizer.py | 100% <ø> (ø) :white_check_mark: |
| deep_qa/data/tokenizers/word_tokenizer.py | 100% <ø> (ø) :white_check_mark: |
| deep_qa/data/tokenizers/word_filter.py | 94.73% <ø> (ø) |
| deep_qa/data/tokenizers/word_processor.py | 95% <ø> (ø) |
| deep_qa/data/tokenizers/word_stemmer.py | 94.73% <ø> (ø) |
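
(As a sanity check, the headline numbers follow directly from the totals in the diff above: overall coverage is just hits divided by lines. A quick verification using the reported counts:)

```python
# Sanity check of the Codecov totals above: coverage = hits / lines.
base_hits, base_lines = 4311, 5334   # master
head_hits, head_lines = 4510, 5573   # this PR

print(f"master: {100 * base_hits / base_lines:.2f}%")   # 80.82%
print(f"PR:     {100 * head_hits / head_lines:.2f}%")   # 80.93% (Codecov reports 80.92%)
print(f"delta:  +{100 * (head_hits / head_lines - base_hits / base_lines):.2f}%")  # +0.10%
```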

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.
Powered by Codecov. Last update 241e5d1...2a72b6f.

BeckySharp commented 7 years ago

OK all -- millions of comments back and forth. I've changed most of these, but left a few unaddressed for now until I hear back. I'll go ahead and re-push to make viewing these comments more manageable (at least for me).

BeckySharp commented 7 years ago

OK -- I have addressed many of these and just re-pushed, so hopefully GitHub will collapse the old/unaddressed ones.

I know I still need to write the test that Matt wants; I'm working on that now...

BeckySharp commented 7 years ago

Oh, and re: the num_answers column in the input -- I'll add that to the Tushar request, and when it's changed on his end I'll change it here.

On Tue, Feb 21, 2017 at 3:31 PM, Matt Gardner notifications@github.com wrote:

@matt-gardner commented on this pull request.

In tests/data/instances/mc_tuples_with_background_instance_test.py https://github.com/allenai/deep_qa/pull/213#discussion_r102347273:

        for slot in background_tuple:
            assert len(slot) == background_slot_length

    def test_as_training_data_produces_correct_numpy_arrays(self):
        max_lengths = {'num_question_tuples': 2,
                       'num_background_tuples': 3,
                       'num_slots': 2,
                       'question_slot_length': 2,
                       'background_slot_length': 3,
                       'num_options': 3}
        self.indexed_instance.pad(max_lengths)
        inputs, label = self.indexed_instance.as_training_data()
        assert np.all(label == np.asarray([0, 0, 1]))
        desired_options = np.asarray([[[[1, 2], [0, 3]],

Yes, you're right about that. The type annotations are not actually enforced by anything; they are there to improve readability. So I think it's fine to leave the annotations as they are, since they document the typical case, even though the type might actually change if you use a different tokenizer.
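(A minimal illustration of that point, using a hypothetical function that is not from this PR: Python annotations are metadata only, so nothing complains at runtime if the actual type differs.)

```python
from typing import List

def tokenize(text: str) -> List[str]:
    # The List[str] annotation documents the typical case, but Python
    # does not enforce it: returning a tuple here raises no error.
    return tuple(text.split())

tokens = tokenize("who am i")
print(type(tokens), tokens)  # <class 'tuple'> ('who', 'am', 'i')
```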

And I think the only things you should need to change are things I've already suggested: if you make sure you're calling the superclass methods to actually do the indexing of words (you are) and to handle padding word sequences (you're not yet, but I gave comments on that), it should just work.
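(For anyone following along, the "delegate padding to the superclass" pattern being described looks roughly like the sketch below. The class and method names here are stand-ins, not the exact deep_qa API.)

```python
# Illustrative sketch only: stand-in classes, not the actual deep_qa API.
class IndexedInstance:
    @staticmethod
    def pad_word_sequence(sequence, desired_length):
        # Shared padding logic lives in the superclass: left-pad with
        # zeros, or truncate from the left, to a fixed length.
        padded = [0] * max(0, desired_length - len(sequence)) + list(sequence)
        return padded[-desired_length:]


class IndexedTupleInstance(IndexedInstance):
    def __init__(self, slots):
        self.slots = slots  # each slot is a list of word indices

    def pad(self, max_lengths):
        # Reuse the superclass helper per slot instead of re-implementing
        # the padding rules in the subclass.
        slot_length = max_lengths['question_slot_length']
        self.slots = [self.pad_word_sequence(slot, slot_length)
                      for slot in self.slots]


instance = IndexedTupleInstance([[1, 2, 3], [4]])
instance.pad({'question_slot_length': 2})
print(instance.slots)  # [[2, 3], [0, 4]]
```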
