duoshuo / php-cassandra

php Cassandra driver which support Protocol v3 (Cassandra 2.1) and asynchronous requests
MIT License
153 stars 53 forks source link

Querying performance with synced prepared query #44

Open Divi opened 9 years ago

Divi commented 9 years ago

Hello,

I'm running a benchmark with your lib, and it is pretty good :+1: But, I have some performance issues with synced prepared queries :

// running this x10000, with random key value
$connection->querySync("SELECT * FROM keyspace.columnFamily where key = 'foo'");
// got response ~ 6.255 sec

and, from your example :

// running this x10000, with random key value, I known it's stupid to put the parameter directly in the query
$preparedData = $connection->prepare("SELECT * FROM keyspace.columnFamily where key = :keyValue");
$strictValues = [...];
$response = $con->executeSync($preparedData['id'], $strictValues, Cassandra\Request\Request::CONSISTENCY_ONE, [
    'page_size' => 100,
    'names_for_values' => true,
    'skip_metadata' => true,
]);
// got response ~ 23.331 sec

Why this difference ?

TL;DR: got 6.255 sec with querySync() and 23.331 sec with executeSync() on a x10000 loop for a single SELECT.

Thanks :)

shen2 commented 9 years ago

Show your full.benchmark code.

Divi commented 9 years ago

Sure, here we go :

$connection = new \Cassandra\Connection([
    '127.0.0.1'
]);

// $connection->querySync('DROP TABLE benchmark.bar');
$connection->querySync('CREATE TABLE benchmark.bar (
  userid text PRIMARY KEY,
  column1 text, column2 text, column3 text, column4 text, column5 text,
  column6 text, column7 text, column8 text, column9 text, column10 text,
  column11 text, column12 text, column13 text, column14 text, column15 text,
  column16 text, column17 text, column18 text, column19 text, column20 text
)');

// insert 10 000 rows
for ($i=0; $i<10000; $i++) {
    $connection->queryAsync("INSERT INTO benchmark.bar (
        userid,
        column1, column2, column3, column4, column5,
        column6, column7, column8, column9,column10,
        column11, column12, column13, column14, column15,
        column16, column17, column18, column19, column20) VALUES (
        'row" . $i . "',
        'foo', 'bar', 'foo', 'bar', 'foo',
        'foo', 'bar', 'foo', 'bar', 'foo',
        'foo', 'bar', 'foo', 'bar', 'foo',
        'foo', 'bar', 'foo', 'bar', 'foo'
    )");
}

// select 10 000 rows with executeSync()
for ($i=0; $i<10000; $i++) {
    $preparedData = $connection->prepare("SELECT * FROM benchmark.bar where userid = :userid");
    $strictValues =  \Cassandra\Request\Request::strictTypeValues([
            'userid' => 'row' . $i,
    ], $preparedData['metadata']['columns']);
    $response = $con->executeSync($preparedData['id'], $strictValues, Cassandra\Request\Request::CONSISTENCY_ONE, [
        'page_size' => 100,
        'names_for_values' => true,
        'skip_metadata' => true
    ]);
}

// select 10 000 rows with querySync()
for ($i=0; $i<10000; $i++) {
    $connection->querySync("SELECT * FROM benchmark.bar where userid = 'row" . $id . "'");
}
shen2 commented 9 years ago

Obviously, you prepared the query for 10000 times. Actually, you can prepare the query once, and execute for 10000 times.

Divi commented 9 years ago

Indeed, but it was for this benchmark. In real case, I can have ~500 queries per page, and I will have to prepare a query each time because it will not the same.
I'm surprised by the difference between a prepared & not prepared query, so I'm wondering if I'm doing well (and obviously, for this benchmark : no, but see it as 10 000 different queries).

Divi commented 9 years ago

I made another test, with one unique prepared statement, here the result :