forcedotcom / phoenix

BSD 3-Clause "New" or "Revised" License

Push projection of a single ARRAY element to the server #669

Open jtaylor-sfdc opened 10 years ago

jtaylor-sfdc commented 10 years ago

If only a single array element is selected, we still return the entire array to the client. Instead, we should push the projection to the server and return only that single element. The same goes for a reference to an ARRAY in the WHERE clause. There's a general HBase fix for this (i.e. the ability to define a set of key values to be returned that is separate from the key values available to filters) that has a patch here, but @lhofhansl deems it not possible to pull into the 0.94 branch.

My thought is that we can add a Filter at the end of our filter chain that filters out any KeyValues that aren't in the SELECT expressions (i.e. it drops a column that is referenced in the WHERE clause but not in the SELECT expressions). This same Filter could handle returning only the elements of the array that are referenced in the SELECT expression rather than the entire array.
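The idea of a last-in-chain projection filter can be sketched without any HBase dependency. This is only an illustration under stated assumptions: `FakeCell` and `ProjectionFilterSketch` are hypothetical stand-ins, not Phoenix or HBase classes, and a real implementation would extend the HBase filter base class instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for an HBase KeyValue: just a qualifier and a value. */
final class FakeCell {
    final String qualifier;
    final byte[] value;

    FakeCell(String qualifier, byte[] value) {
        this.qualifier = qualifier;
        this.value = value;
    }
}

/** Sketch of a filter that runs last and keeps only the SELECT-ed columns. */
final class ProjectionFilterSketch {
    private final Set<String> selected;

    ProjectionFilterSketch(String... selectedQualifiers) {
        this.selected = new HashSet<>(Arrays.asList(selectedQualifiers));
    }

    /** Returns the cells of one row whose qualifier appears in the SELECT list. */
    List<FakeCell> project(List<FakeCell> row) {
        List<FakeCell> kept = new ArrayList<>();
        for (FakeCell cell : row) {
            if (selected.contains(cell.qualifier)) {
                kept.add(cell);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Models SELECT a FROM t WHERE b = ...: both a and b are scanned,
        // but only a should travel back to the client.
        List<FakeCell> row = Arrays.asList(
                new FakeCell("a", new byte[] {1}),
                new FakeCell("b", new byte[] {2}));
        List<FakeCell> out = new ProjectionFilterSketch("a").project(row);
        System.out.println(out.size() + " cell(s) returned");
    }
}
```

Because the earlier filters in the chain still see column `b`, the WHERE clause can be evaluated as before; only the final result is narrowed.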

ramkrish86 commented 10 years ago

I tried out a prototype of this, just for a query that selects an indexed array type column. I added a filter while creating the scan and implemented filterKeyValue(kv) in that filter.

@Override
public ReturnCode filterKeyValue(KeyValue kv) {
    inputTuple.setResult(new Result(new KeyValue[]{kv}));
    if(evaluate(inputTuple)) {
        byte[] val = new byte[tempPtr.getLength()];
        System.arraycopy(tempPtr.get(), tempPtr.getOffset(), val, 0, tempPtr.getLength());
        //KeyValue newKv = new KeyValue(kv.getRow(), kv.getFamily(), kv.getQualifier(), val);
        kv = new KeyValue(kv.getRow(), kv.getFamily(), kv.getQualifier(), val);
        return ReturnCode.INCLUDE;
    } else {
        return super.filterKeyValue(kv);
    }
}

protected boolean evaluate(Tuple input) {
    try {
        if (!expression.evaluate(input, tempPtr)) {
            return false;
        }
    } catch (IllegalDataException e) {
        return Boolean.FALSE;
    }
    return true;
}

Here expression is the ArrayIndexFunction. But how can we rewrite the KeyValue that is passed to filterKeyValue? I am not able to rewrite the kv. Is there something I am missing here?
One thing to note: if the query is SELECT a_double_array[2] FROM table_with_array, the scan selects only the KV that has the qualifier a_double_array, so we have to extract the required indexed value from that KeyValue alone.
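One likely reason the reassignment in the prototype has no effect is that Java passes object references by value: `kv = new KeyValue(...)` rebinds only the method's local parameter, so the caller keeps its original object. The sketch below shows this with a hypothetical `FakeKeyValue` class (not an HBase type), and contrasts it with a method that returns the replacement.

```java
/** Hypothetical stand-in for a KeyValue; holds only a value payload. */
final class FakeKeyValue {
    final byte[] value;

    FakeKeyValue(byte[] value) {
        this.value = value;
    }
}

final class ParameterRebindSketch {
    /** Mimics the prototype: the new object is assigned to the parameter and then lost. */
    static void rebind(FakeKeyValue kv) {
        kv = new FakeKeyValue(new byte[] {kv.value[0]}); // local rebind only
    }

    /** Returns the replacement so the caller can actually use it. */
    static FakeKeyValue transform(FakeKeyValue kv) {
        return new FakeKeyValue(new byte[] {kv.value[0]});
    }

    public static void main(String[] args) {
        FakeKeyValue original = new FakeKeyValue(new byte[] {1, 2, 3});

        rebind(original);
        System.out.println(original.value.length); // still 3: caller is unaffected

        FakeKeyValue replaced = transform(original);
        System.out.println(replaced.value.length); // 1: caller receives the new object
    }
}
```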

ramkrish86 commented 10 years ago

I think transform() in Filter could be used here, but that would be costly IMO. We also need to see whether rewriting the KV would help.
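A rough sketch of what a transform step might do for `a_double_array[2]`: copy out only the bytes of the requested element and return them in place of the full array. The encoding here is an assumption made for illustration (a run of fixed-width 8-byte doubles with a 1-based index); it is not Phoenix's actual array serialization, and `ArrayElementTransformSketch` is a hypothetical name.

```java
import java.nio.ByteBuffer;

final class ArrayElementTransformSketch {
    /** Assumed encoding: consecutive big-endian 8-byte doubles, no header. */
    static byte[] encode(double... values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES);
        for (double v : values) {
            buf.putDouble(v);
        }
        return buf.array();
    }

    /** Returns the bytes of the element at the 1-based index, or null if out of range. */
    static byte[] transform(byte[] arrayBytes, int oneBasedIndex) {
        int count = arrayBytes.length / Double.BYTES;
        if (oneBasedIndex < 1 || oneBasedIndex > count) {
            return null;
        }
        byte[] element = new byte[Double.BYTES];
        System.arraycopy(arrayBytes, (oneBasedIndex - 1) * Double.BYTES, element, 0, Double.BYTES);
        return element;
    }

    public static void main(String[] args) {
        byte[] full = encode(1.5, 2.5, 3.5);
        byte[] one = transform(full, 2);
        // The copy costs one small allocation per row, but the payload shrinks.
        System.out.println(full.length + " bytes before, " + one.length + " bytes after");
        System.out.println(ByteBuffer.wrap(one).getDouble());
    }
}
```

The extra copy is the cost being weighed in this thread; for large arrays it is small next to the bytes that would otherwise be sent to the client.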

jtaylor-sfdc commented 10 years ago

I think you'd want to rewrite the KeyValue. It would require some client/server coordination:

ramkrish86 commented 10 years ago

Yes James, I am doing all the above-mentioned steps. Maybe I am facing an issue due to #4, but it seems to work.

ramkrish86 commented 10 years ago

Rewriting the KV may be a costly operation, but I think it is still better than sending big arrays as bytes over the wire.

ramkrish86 commented 10 years ago

Maybe I am facing an issue due to #4

I meant the multiple references you mentioned.

jtaylor-sfdc commented 10 years ago

@ramkrish86: Opened https://issues.apache.org/jira/browse/PHOENIX-10 with a comment on how this could be approached (leveraging a bunch of code that's already there). Let's discuss further over there.