basho / eleveldb

Erlang LevelDB API
262 stars 176 forks source link

Add Riak TS "IS NULL" and "IS NOT NULL" handling #219

Closed paegun closed 7 years ago

paegun commented 7 years ago

NOTE: this is the part 1 of a 3 PR set to implement Riak TS "IS NULL" and "IS NOT NULL".

hmmr commented 7 years ago

My review is limited by the diff provided. That is, I can see why the added/modified code is there and how the change is congruent with the purpose (and I am fine with how my comments were addressed). However, I cannot positively verify that the diff provided is all there needs to be done. I therefore call on @erikleitch or @matthewvon to participate.

matthewvon commented 7 years ago

Apologies, but this is all @erikleitch. I cannot help.

erikleitch commented 7 years ago

These are all changes to allow use of [], or "" as a stand-in for NULL when performing normal equality comparisons in TS filters.

The eleveldb changes are predicated on the assumptions that:

  1. the query pipeline agrees to specify field type of varchar for any field being compared to NULL, regardless of the field's actual type, and
  2. the query pipeline will use [] or "" as the NULL value to be compared against

Assuming those assumptions are met (and I haven't gone through the rest of the PRs to make sure), I think that this PR does everything it needs to for the NULL comparisons to work and still allow comparisons of actual varchar field values against an empty value to succeed (i.e., <<"">>, which is a valid empty varchar value, as opposed to an unspecified value).

My only request is that rather than modifying ErlUtil::getBinary() to implicitly accept an empty list, we instead add a new method -- ErlUtil::getBinaryOrEmptyList() and use this in filter_parser.cc. The reason is that ErlUtil is intended to be a generalized utility class for decoding erlang terms, that may be used elsewhere in the codebase, and I'd like the default behavior of things like getBinary to be generally 'correct' -- in this case, throw an error if the decoded term is not actually inspectable as a binary -- and have TS-specific behaviors wrapped up in separate methods.

I see also that you've added a throw in the case that non-TS data are encountered during the range scan (the change to workitems.cc). Originally, I wasn't sure if KV and TS records could be interleaved, so this was deliberately not a throw, so that non-TS records would just be quietly skipped. I think we've since decided that this situation cannot happen, so I'm fine with adding the throw, but suggest that you update the comment above it accordingly.

One final note, and feel free to ignore it, because it isn't specifically related to the NULL changes in this PR, but: one complication with encoding schemes that try to optimize size is that the fields of a fixed type in a table schema are actually encoded to a variety of different types, depending on the value (ie, table schema may call something a sint64, but under the hood, msgpack/ttb will actually encode it as a 16-bit, 32-bit, 64-bit (or even 40-bit!) value). It's the reason for the apparently Byzantine structure of the CmpUtil::objectToTypeX(&obj) methods. This also means that the current suite of tests in sut.erl exercising integer operations don't actually cover all integer use-cases, because our tests only use small ints. What I'd like to do is expand the test cases to include an additional field that forces large-int encoding, for example:

    {<<"f7">>, case I of
                      10 -> [];
                      12 -> [];
                      _ -> 1467563367600 + I * 1000
              end} %%<< sint64 stored as a big num    

in putKeyNormalOps, then include:

isNullLargeInteger_test() ->
    isNullFieldOfTypeTestFactory(true, <<"f7">>).

isNotNullLargeInteger_test() ->
    isNullFieldOfTypeTestFactory(false, <<"f7">>).

in the test suite. But there are even more relevant tests that should be expanded to cover large-int ops than the NULL tests... I intend to do this when I submit the msgpack-replacement PR if you don't want to deal with it here.

paegun commented 7 years ago

Thank you @hmmr and @erikleitch .

@erikleitch your assumptions about Riak TS NULL are correct and IMHO prove that this feature is committed in a way that doesn't require jumping through many modules, since the review didn't require reading all the related PRs. Clarifying the comment above the throw hopefully adds to such ease in reading eleveldb.

As @erikleitch suggested, I have removed the change to ErlUtil::getBinary(), adding ::getBinaryOrEmptyList() which only is utilized w/i the NULL comparisons.

Also as @erikleitch suggested, I have added the unrelated to NULL, but related in terms of area of code change, coverage of sint64 stored as a large number.

hmmr commented 7 years ago

ca6f339 +1

erikleitch commented 7 years ago

ca6f339 +1

hazen commented 7 years ago

@erikleitch @hmmr Pretty sure we need the +1s in the other order

hazen commented 7 years ago

:+1: ca6f339

hmmr commented 7 years ago

+1 ca6f339