baopham / laravel-dynamodb

Eloquent syntax for DynamoDB
https://packagist.org/packages/baopham/dynamodb
MIT License
490 stars 128 forks source link

Scan is used, not Query on index #185

Open gregmsanderson opened 5 years ago

gregmsanderson commented 5 years ago

Describe the bug

I have a table with a GSI that has a hash and a range.

The GSI is specified in the model, as described.

If I do a 'where' on the hash key used in the GSI, the index is not used. The raw output shows 'Scan' is picked, not 'Query'.

Schema

In the model I defined the table name etc and added the index:

* Indexes.
     * [
     *     '<simple_index_name>' => [
     *          'hash' => '<index_key>'
     *     ],
     *     '<composite_index_name>' => [
     *          'hash' => '<index_hash_key>',
     *          'range' => '<index_range_key>'
     *     ],
     * ]
     *
     * @var array
     */
    protected $dynamoDbIndexKeys = [
        'account-created_at-index' => [
            'hash' => 'account',
            'range' => 'created_at',
        ],
    ];

... the query works and it does indeed return the correct item, but when I look into how it got there with the raw query ...

Item::where('account', 'abcde')->toDynamoDbQuery()

... it shows a Scan was done. But 'account' is the hash of an index, so Query should have been used.

To try and debug myself, looking into how it works, I see the decision to use 'Query' appears to be taken in here:

/**
     * Return the raw DynamoDb query
     *
     * @param array $columns
     * @param int $limit
     * @return RawDynamoDbQuery
     */
    public function toDynamoDbQuery(
        $columns = [],
        $limit = DynamoDbQueryBuilder::MAX_LIMIT
    ) {
        $this->applyScopes();

        $this->resetExpressions();

        $op = 'Scan';
        $queryBuilder = DynamoDb::table($this->model->getTable());

        if (!empty($this->wheres)) {
            $analyzer = $this->getConditionAnalyzer();

            if ($keyConditions = $analyzer->keyConditions()) {
                $op = 'Query';
                $queryBuilder->setKeyConditionExpression($this->keyConditionExpression->parse($keyConditions));
            } ...

... and following back from there, in the Analyzer.php, the issue seems to be this line:

if (count(array_intersect($conditionKeys, $keys)) === count($keys)) {

... since - from what I see - that results in 1 === 2, which is of course not true and so the index is not used. And that explains why the query builder defaults to Scan. I'm not quite sure why it's checking the same number is in both since as I understand it, you can query a GSI using just one of the two (ie using just the hash). In the AWS example, they show doing that - using just the hash: (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SQLtoNoSQL.Indexes.QueryAndScan.html)

Version info

gregmsanderson commented 5 years ago

To continue this a bit further, I hacked in some lines to see what was going on in that key bit in the Analyzer.php that decides whether to use the index:

foreach ($this->model->getDynamoDbIndexKeys() as $name => $keysInfo) {
            $conditionKeys = array_pluck($this->conditions, 'column');
            $keys = array_values($keysInfo);

            ///// hacked in /////
            print_r($conditionKeys);
            print_r($keys);
            print_r(array_intersect($conditionKeys, $keys));
            echo count(array_intersect($conditionKeys, $keys));
            echo count($keys);
            ///// hacked in /////

            if (count(array_intersect($conditionKeys, $keys)) === count($keys)) {

... and thus showing what decision was being made internally. Since from what I can see, if that line above does not return true, the index is not used. The raw output of those various print_r and echo ...

Array
(
    [0] => account
)
Array
(
    [0] => account
    [1] => created_at
)
Array
(
    [0] => account
)
12

... so again, showing with the final "12" that the value on the left of the === is the 1, and the value on the right of the === is the 2.

So that indicates to me that wherever you have an index that has a hash and a range (so 2), but the where only uses the hash (so 1), that index won't be used. As 1 !== 2

Is that correct?

phileon commented 5 years ago

Hi, I often have a similar problem, in my case helped using the function withIndex().

$item = new Item();
$item->where('account', 'abcde')->withIndex('account-created_at-index')->get();

Basically the information should be unnecessary, but it helped me to solve the problem sufficiently.

gregmsanderson commented 5 years ago

@phileon Ok, thanks for the idea. I can try that.

baopham commented 5 years ago

@gregmsanderson love the detailed analysis! looks like the decision logic could be tweaked a bit. This would require a bit of digging in the docs again to make sure we don't cause regressions. Feel free to contribute!

nelson6e65 commented 1 year ago

This is maybe related with #254 and #253

excitexcite commented 9 months ago

@phileon Greetings, I have the same problem, but withIndex() function call doesn't perform query, but scan. Could you (or anybody else) provide some information how to query table, when you have compositeKey and use only PK in query (as it shown in your example above).

nelson6e65 commented 9 months ago

@excitexcite Are you using the latest version (v6.4.0)? Version 6.2.0 introduced bugfix related with this issue.

excitexcite commented 9 months ago

@nelson6e65 Yes, I double checked it, it's 6.4.0 image