Closed narekgharibyan closed 7 years ago
I like the added Empty() interface.
The fix looks good, it's probably a good idea to prevent adding it early on. Still, that hides a bit the real problem which is the unchecked push here:
https://github.com/cliqz-oss/keyvi/blob/master/keyvi/src/cpp/dictionary/dictionary_merger.h#L153
As in https://github.com/cliqz-oss/keyvi/blob/master/keyvi/src/cpp/dictionary/dictionary_merger.h#L199
we should only push SegmentIterator if it isn't exhausted. Although it is not necessary with the fix, I suggest to either document it there, or fix it to be on the safe side.
Hmmm,
In the second case I think we don't have a problem at all: https://github.com/cliqz-oss/keyvi/blob/master/keyvi/src/cpp/dictionary/dictionary_merger.h#L199
The if
check makes sure that we push only valid iterator.
For the first case, I assume we are guaranteed that our iterator is valid (non-empty automata can not return empty iterator).
But we can add an assert
here as an internal integrity check.
Maybe, I described it wrong:
I am asking for doing at line 153 the same as at line 199:
SegmentIterator segment_it(e_it, i++);
if (segment_it) {
pqueue.push(segment_it);
}
What I was suggesting is to have an assert check:
SegmentIterator segment_it(e_it, i++);
assert(segment_it);
pqueue.push(segment_it);
According to our semantics we can not have a case when it is false, yes ? (after empty check)
[Update]
Also I'm not sure what we have to do with other fields (for instance inputFiles_
) in case we skip some segment.
BTW, any news about your Skype ? :)
I think an empty iterator is still a valid one, as a dictionary without a key is valid as well. Therefore I think assert isn't the right thing here. You are right that your empty check already prevents that case, so we are just talking about correctness/code sugar.
@hendrikmuhs please let me know what you think about the last commit.
Also I wanted to add a check, which prevents adding the same file multiple times:
void Add(const std::string& filename) {
if (std::count(inputFiles_.begin(), inputFiles_.end(), filename)) {
throw std::invalid_argument("File is added already");
}
...
}
But realized that some of unit tests already do that, like here: https://github.com/cliqz-oss/keyvi/blob/master/keyvi/tests/cpp/dictionary/dictionary_merger_test.cpp#L171 Is it intentional ?
LGTM!, That moves error handling where it should be, meaning if you give it a non-existing or corrupt file it fails in Add() not in Merge(). Good idea!
As for the test: Not intentional, I think the idea was just to overwrite a key with a new value and then overwrite it back to the original state. So to fix it you can simply create another dictionary based on test_data.
LGTM
@hendrikmuhs Fixes #211