danielmewes / php-rql

A PHP client driver for the RethinkDB query language (ReQL).
http://php-rql.dnsalias.net/

Check if we need to create iterate cursor. #145

Closed RoySegall closed 8 years ago

RoySegall commented 8 years ago

144

danielmewes commented 8 years ago

This patch doesn't work. If the server returns a cursor (ResponseResponseType::PB_SUCCESS_PARTIAL), we must create a cursor, or at least send additional requests to the server to retrieve the remaining data. Otherwise (and as this patch stands), you will only get the first batch of results of the query, not the full results. The first batch can be arbitrarily small, and its size is not deterministic.

E.g. try running this query:

    r\table("test")->run($conn, array("max_batch_rows" => 1))

Regardless of how many documents are in the table "test", this will only return one row in the first batch. This patch will simply discard all the remaining batches. (Additionally, it leaks resources on the server, because it doesn't close the cursor, and the server will keep it open for as long as the connection is open.)
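The batching behavior described above can be sketched in plain PHP, without the driver. All names below are hypothetical stand-ins, not the php-rql API; the point is only that a client must keep requesting batches until the server stops answering with a partial response:

```php
<?php
// Minimal sketch (hypothetical names, NOT the php-rql API) of why a client
// must keep fetching after a PB_SUCCESS_PARTIAL response.

const SUCCESS_PARTIAL  = 3; // server has more batches pending
const SUCCESS_SEQUENCE = 2; // this batch completes the result

// Fake server: 5 documents, delivered in batches of 2.
function fakeServerFetch(int $offset): array {
    $docs  = [1, 2, 3, 4, 5];
    $batch = array_slice($docs, $offset, 2);
    $type  = ($offset + 2 >= count($docs)) ? SUCCESS_SEQUENCE : SUCCESS_PARTIAL;
    return ['type' => $type, 'batch' => $batch];
}

// Wrong: keep only the first batch (what the patch effectively does).
function firstBatchOnly(): array {
    return fakeServerFetch(0)['batch'];
}

// Right: drain batches until the server signals completion
// (this is the loop the cursor object performs for you).
function drainAllBatches(): array {
    $result = [];
    $offset = 0;
    do {
        $resp   = fakeServerFetch($offset);
        $result = array_merge($result, $resp['batch']);
        $offset += count($resp['batch']);
    } while ($resp['type'] === SUCCESS_PARTIAL);
    return $result;
}
```

In php-rql the cursor object performs this continue-until-complete loop for you, and also closes the cursor on the server once it is exhausted.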

RoySegall commented 8 years ago

This patch doesn't work.

Does the patch not work conceptually, or does it not return results? If it's the former, I disagree, and if the code is broken, QA tests should catch it.

From a performance perspective, we don't need to create an iterable cursor. It happens by default and it's an excellent idea, but when dealing with 1000 records it's not on our side.

I didn't mention it here, but the performance for 1000 records went from 0.07 ms to 0.045 ms.

I'll take a look at your suggestion and see whether it works.

danielmewes commented 8 years ago

It doesn't work for the following reason:

Whether a result is a cursor or not is not something the client can choose. It's something that is determined by the query on the server side. If the server sends us batches, we have to deal with them and request the subsequent batches as well (which is what the cursor object does). If you don't want a cursor interface, the ->toArray() call converts it into an array. Otherwise you can change the query by adding the ->corceTo("array") call to force the server to send us an array.

I don't know where the performance difference comes from, but the first thing to check would be whether you actually get all 1000 results if you only take the first batch. One reason it might be faster is that it might drop part of the result set.
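The client-side conversion mentioned above can be illustrated with a toy class (hypothetical, not php-rql's real Cursor, which fetches later batches lazily from the server as you iterate): toArray() simply exhausts the cursor into a plain PHP array.

```php
<?php
// Toy cursor (hypothetical; NOT php-rql's Cursor class). Conceptually,
// a cursor is an iterator over batches, and ->toArray() drains it.
class ToyCursor implements IteratorAggregate {
    private array $batches;

    public function __construct(array $batches) {
        $this->batches = $batches;
    }

    public function getIterator(): Generator {
        foreach ($this->batches as $batch) { // each batch would be one server round-trip
            foreach ($batch as $doc) {
                yield $doc;
            }
        }
    }

    // Exhaust the iterator and return a plain array of all documents.
    public function toArray(): array {
        return iterator_to_array($this->getIterator());
    }
}
```

For example, `(new ToyCursor([[1, 2], [3]]))->toArray()` yields all three documents as one flat array, regardless of how they were batched.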

RoySegall commented 8 years ago

All 1000 results were returned; I gave it another check. As I mentioned in the issue, when we use the function createCursorFromResponse it eventually goes to setBatch and runs over the 1000 items a first time. After that, I'm iterating over the 1000 records again.

What are the cases in which the server doesn't return all the results and we need another batch?

RoySegall commented 8 years ago

Otherwise you can change the query by adding the ->corceTo("array")

Are you sure about the name of the function? I can't find anything like it in the source code.

danielmewes commented 8 years ago

Sorry, it should be coerceTo("array")

RoySegall commented 8 years ago

I did some checking, and these are the averages for 10 requests with 1000 records:

    $documents = $this->table
      ->getMultiple($cids)
      ->coerceTo("array")
      ->run($this->connection, array("max_batch_rows" => 1));

Nailed it: 0.05 ms.

    $documents = $this->table
      ->getMultiple($cids)
      ->run($this->connection, ['cursor' => TRUE]);

Nailed it with 0.021 ms.

And I tried:

    $documents = $this->table
      ->getMultiple($cids)
      ->run($this->connection, ['cursor' => TRUE, "max_batch_rows" => 1]);

The request nailed it in 0.013, but I got 8 items when I needed 1000, which the previous requests got right. The difference is 300 ms, which is a lot, particularly in terms of caching in Drupal.

RoySegall commented 8 years ago

And MySQL got it done in 0.065 (sometimes 0.07), a 33% improvement.