bchavez / RethinkDb.Driver

:headphones: A NoSQL C#/.NET RethinkDB database driver with 100% ReQL API coverage.
http://rethinkdb.com/api/java

RunResult SUCCESS_PARTIAL error #91

Closed stefanprodan closed 8 years ago

stefanprodan commented 8 years ago

I have a table with over 100k rows. When I search for a term that is present in the first 100k records, RunResult works just fine. If I search for a term that only appears in older records, beyond the first 100k, RunResult crashes while RunCursor works.

RunResult function:

public List<LogEntry> Search(string query, int limit)
{
    var conn = _rethinkDbFactory.CreateConnection();
    var logs = R.Db(_rethinkDbFactory.GetOptions().Database)
        .Table("Logs")
        .OrderBy()[new { index = R.Desc(nameof(LogEntry.Timestamp)) }]
        .Filter(t => t.CoerceTo("string").Match($"(?i){query}"))
        .Limit(limit)
        .RunResult<List<LogEntry>>(conn);

    return logs;
}
var list = Search("value_at_rec_101k", 50);

Error:

The query response cannot be converted to an object of T or List<T>. This run helper works with SUCCESS_ATOM or SUCCESS_SEQUENCE results. The server response was SUCCESS_PARTIAL. If the server response can be handled by this run method try converting to T or List<T>. Otherwise, if the server response cannot be handled by this run helper use another run helper like `.RunCursor`.

If I replace RunResult<List<LogEntry>> with RunCursor<LogEntry> then it works.

I've modified all my queries and replaced RunResult<List<T>> with RunCursor<T> to avoid this issue. I find it strange that RunResult crashes only in this particular use case.

bchavez commented 8 years ago

Hey @stefanprodan

The main purpose of RunResult<T> is to handle both SUCCESS_ATOM and SUCCESS_SEQUENCE, which are both fully completed query responses by the time we receive them over the wire. Both response types can be built directly into T.

SUCCESS_SEQUENCE is actually a _completed_ Cursor response. Instead of going through all the drama of creating a Cursor (when one actually isn't needed), RunResult<T> simply builds T off the completed response. This works, as it seems, up to about 100K rows. (I should probably document this.)

When the server decides the initial response is "too large" to send down the wire, it switches the response type from SUCCESS_SEQUENCE to SUCCESS_PARTIAL, in which case you need full Cursor<T> support to consume the whole sequence. RunResult<T> won't work anymore, hence the error message:

The query response cannot be converted to an object of T or List<T>.
This run helper works with SUCCESS_ATOM or SUCCESS_SEQUENCE results.
The server response was SUCCESS_PARTIAL. If the server response can be 
handled by this run method try converting to T or List<T>. Otherwise, if the 
server response cannot be handled by this run helper use another run
helper like `.RunCursor`.

So, technically, it's not a bug, but more of a documentation issue? I hope that clarifies the behavior you're seeing. Would you be satisfied if I were to document this more clearly?
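To make that concrete, here's a rough sketch of when each run helper applies (reusing your LogEntry type and an open connection conn; the "MyDb" database name and the "error" search term are just placeholders, not code from the docs):

// Small, bounded result: the server replies with SUCCESS_ATOM or
// SUCCESS_SEQUENCE, so RunResult<T> can materialize everything at once.
List<LogEntry> recent = R.Db("MyDb").Table("Logs")
    .OrderBy()[new { index = R.Desc(nameof(LogEntry.Timestamp)) }]
    .Limit(50)
    .RunResult<List<LogEntry>>(conn);

// Potentially large result: the server may reply with SUCCESS_PARTIAL,
// so use RunCursor<T> and stream the sequence instead.
Cursor<LogEntry> cursor = R.Db("MyDb").Table("Logs")
    .Filter(t => t.CoerceTo("string").Match("(?i)error"))
    .RunCursor<LogEntry>(conn);

foreach (var entry in cursor)
{
    // process each LogEntry as it streams in
}
cursor.Close(); // more on closing cursors below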

:crescent_moon: :stars: [_"Nothing good happens past 2am..."_](https://www.youtube.com/watch?v=VOlcDBXKhSU)

stefanprodan commented 8 years ago

A reference to this in docs would be great.

So RunResult should be used only on tables that are guaranteed to stay under 100k. Is the optimization worth it? Is RunCursor slower than RunResult on tables under 100k records?

stefanprodan commented 8 years ago

Another thing that's bothering me is that the server is sending all 150k records down the wire when I only want 50. I've specified Limit(50), so why is the server sending more?

bchavez commented 8 years ago

Hi @stefanprodan ,

I've hopefully improved the exception message we display to the developer when the situation arises.

I've also added a warning in the documentation here: https://github.com/bchavez/RethinkDb.Driver/wiki/Run-Helpers#warning

As for the Limit(50) issue, I'm not sure what's going on with that... I'd have to investigate this a bit more.

Do you have a simplified unit test that exhibits the behavior that I could try to use?

Thanks, Brian

:zap: :runner: _"I was struck by lightning. Walkin' down the street..."_

bchavez commented 8 years ago

So RunResult should be used only on tables that are guaranteed to stay under 100k. Is the optimization worth it?

Yeah, that would be the intended use. I would say it's more about convenience than optimization. There are several places in code where I expect results to be less than 100K.

Is RunCursor slower than RunResult on tables under 100k records?

I don't think you'd see a severe performance issue with RunCursor (unlike with dynamic), other than a few more allocations on the CLR heap carrying the weight of what Cursor contains here.

However, you do have a little more responsibility to Close the Cursor<T> when you are done, to remove it from the Connection's cursor cache; otherwise, you could run into a memory leak if you don't Close or Dispose of a cursor properly.
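In practice that can be as simple as a `using` block, since disposing the cursor closes it. A rough sketch, with the same placeholder database name and search term as above:

// Dispose (via using) closes the cursor and removes it from the
// connection's cursor cache, even if you stop iterating early.
using (var cursor = R.Db("MyDb").Table("Logs")
    .Filter(t => t.CoerceTo("string").Match("(?i)error"))
    .RunCursor<LogEntry>(conn))
{
    foreach (var entry in cursor)
    {
        // handle each entry
    }
}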

:crown: :gem: _"I know everything that shine ain't always gold..."_

stefanprodan commented 8 years ago

The documentation looks great :+1:

So, will this usage cause a memory leak?

public dynamic Search(string query, int limit)
{
    var conn = _rethinkDbFactory.CreateConnection();
    var logs = R.Db(_rethinkDbFactory.GetOptions().Database)
        .Table("Logs")
        .OrderBy()[new { index = R.Asc(nameof(LogEntry.Timestamp)) }]
        .Filter(t => t.CoerceTo("string").Match($"(?i){query}"))
        .Limit(limit)
        .RunCursor<LogEntry>(conn);

    return logs;
}

I suppose I have to do:

public dynamic Search(string query, int limit)
{
    var conn = _rethinkDbFactory.CreateConnection();
    var logs = R.Db(_rethinkDbFactory.GetOptions().Database)
        .Table("Logs")
        .OrderBy()[new { index = R.Asc(nameof(LogEntry.Timestamp)) }]
        .Filter(t => t.CoerceTo("string").Match($"(?i){query}"))
        .Limit(limit)
        .RunCursor<LogEntry>(conn);

    var result = logs.ToList();
    logs.Close();

    return result;
}

Thank you. I will investigate the Limit problem some more and open a new issue.

bchavez commented 8 years ago

I don't know if the first will cause a memory leak, because I can't see how the `logs` cursor is being consumed.

The second won't, for sure, because I think ToList() is a LINQ method and LINQ will automatically call Dispose (also, you'd be consuming the cursor fully)... but it's really inefficient to use .ToList() on more than 100K items.

IIRC, as long as you either consume the cursor fully or explicitly Close/Dispose it when you stop early, then you should be fine.

If you don't consume a partially responded cursor fully (breaking out midway) and you don't Close it, you'll run into a memory leak.
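So the pattern to aim for when you might bail out early is something like this (a sketch with the same placeholder names as before; ShouldStop stands in for whatever your stop condition is):

var cursor = R.Db("MyDb").Table("Logs")
    .Filter(t => t.CoerceTo("string").Match("(?i)error"))
    .RunCursor<LogEntry>(conn);
try
{
    foreach (var entry in cursor)
    {
        if (ShouldStop(entry))
            break; // stopping midway is fine...
    }
}
finally
{
    cursor.Close(); // ...as long as the cursor gets closed afterwards
}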