Azure / hdinsight-phoenix-sharp

.NET driver for Apache Phoenix and Phoenix Query Server
Apache License 2.0
22 stars 17 forks source link

PrepareRequest "Parameter Unbound" Error (Join Only Takes Params from One Table) #7

Open brycehodson opened 8 years ago

brycehodson commented 8 years ago

When we try to use parameters in a more-complicated query (we think specifically when we use joins), our code fails at PrepareRequestAsync(), returning an exception that a parameter is unbound- which to us seems very strange, since we don't want to bind any parameters until we're going to call ExecuteRequestAsync(). We assume that we are doing something wrong in our code, but if that is not the case, we suspect this is a problem in Phoenix or even Calcite. We found this (identical?) problem on the Hortonworks Community forum, unresolved: https://community.hortonworks.com/questions/45896/problem-executing-joins-on-phoenix-query-server.html. It links to [CALCITE-1052]- I can't tell if this is really related- at the very least, the connectionId in our error message matches the connectionId of our PrepareRequest, which it sounds like wasn't true for them. Regardless, their issue was apparently solved by updating to Phoenix 4.7, and I believe we are running 4.4, so if someone with a newer version can confirm that this isn't reproducible in 4.7, I'll make the requests to get our install updated to that.

Table Setup:

DROP TABLE IF EXISTS ShippingCodeFact; CREATE TABLE IF NOT EXISTS ShippingCodeFact ( CodeId DECIMAL (18, 0), StringData VARCHAR, CodeKey BIGINT NOT NULL, CustomerId VARCHAR, StartDate TIMESTAMP, EndDate TIMESTAMP, CONSTRAINT my_pk PRIMARY KEY(CodeId, StringData, CodeKey) ) SALT_BUCKETS=32, COMPRESSION='GZ'; DROP TABLE IF EXISTS ShippingGlobalCodes; CREATE TABLE IF NOT EXISTS ShippingGlobalCodes ( GlobalCode VARCHAR, InternalCode VARCHAR, CodeName VARCHAR, CodeKey BIGINT NOT NULL CONSTRAINT my_pk PRIMARY KEY(GlobalCode, InternalCode, CodeKey) ) SALT_BUCKETS=32;

Code:

var client = new PhoenixClient(null);
var random = new Random();
string connId = "0123456789abcdef";
connId = new string(Enumerable.Repeat(connId, 8).Select(s => s[random.Next(s.Length)]).ToArray());

RequestOptions options = RequestOptions.GetVNetDefaultOptions();
options.AlternativeHost = HOST_SERVER;

OpenConnectionResponse openConnResponse = null;

try
{
    Google.Protobuf.Collections.MapField<string, string> info = new Google.Protobuf.Collections.MapField<string, string>();
    openConnResponse = client.OpenConnectionRequestAsync(connId, info, options).Result;

    ConnectionProperties connProperties = new ConnectionProperties
    {
        HasAutoCommit = true,
        AutoCommit = false,
        HasReadOnly = true,
        ReadOnly = false,
        TransactionIsolation = 0,
        Catalog = "",
        Schema = "",
        IsDirty = true
    };
    client.ConnectionSyncRequestAsync(connId, connProperties, options).Wait();

    string cmdText = "SELECT fact0.CustomerId FROM ShippingCodeFact fact0 INNER JOIN ShippingGlobalCodes gcD1 ON fact0.StringData = gcD1.InternalCode WHERE fact0.StartDate < ? AND fact0.EndDate >= ? AND gcD1.GlobalCode = ?";

    var paramList = new Google.Protobuf.Collections.RepeatedField<TypedValue>();
    TypedValue v1 = new TypedValue
    {
        NumberValue = ((new DateTime(2016, 7, 26)).Ticks - (new DateTime(1970, 1, 1)).Ticks)/10000,
        Type = Rep.JAVA_SQL_TIMESTAMP
    };
    TypedValue v2 = new TypedValue
    {
        NumberValue = ((new DateTime(2016, 4, 26)).Ticks - (new DateTime(1970, 1, 1)).Ticks)/10000,
        Type = Rep.JAVA_SQL_TIMESTAMP
    };
    TypedValue v3 = new TypedValue
    {
        StringValue = "87603000",
        Type = Rep.STRING
    };
    paramList.Add(v1);
    paramList.Add(v2);
    paramList.Add(v3);

    PrepareResponse prepareResponse = client.PrepareRequestAsync(connId, cmdText, ulong.MaxValue, options).Result;
    StatementHandle statementHandle = prepareResponse.Statement;
    ExecuteResponse execResponse = client.ExecuteRequestAsync(statementHandle, paramList, ulong.MaxValue, true, options).Result;
    FetchResponse fetchResponse = null;
    if(!execResponse.Results[0].FirstFrame.Done)
        fetchResponse = client.FetchRequestAsync(connId, prepareResponse.Statement.Id, 0, uint.MaxValue, options).Result;
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}
finally
{
    if (openConnResponse != null)
    {
        client.CloseConnectionRequestAsync(connId, options).Wait();
        openConnResponse = null;
    }
}

Exception "errorMessage":

RuntimeException: java.sql.SQLException: ERROR 2004 (INT05): Parameter value unbound Parameter at index 3 is unbound -\u003e SQLException: ERROR 2004 (INT05): Parameter value unbound Parameter at index 3 is unbound

(If anyone wants I can paste the whole "exceptions" value- most of it is just the trace)

@duoxu since you've seemed interested in the other issues

duoxu commented 8 years ago

@brycehodson

I also found that post in Hortonworks community, Based on that post, here is my findings,

SELECT fact0.CustomerId FROM ShippingGlobalCodes gcD1 INNER JOIN ShippingCodeFact fact0 ON fact0.StringData = gcD1.InternalCode WHERE gcD1.GlobalCode = ?

SELECT fact0.CustomerId FROM ShippingCodeFact fact0 INNER JOIN ShippingGlobalCodes gcD1 ON fact0.StringData = gcD1.InternalCode WHERE fact0.StartDate < ? AND fact0.EndDate >= ?

Both of above works, the pattern looks like if A INNER JOIN B, in where statement, you can use A's fields, but not B's fields, I am not sure if this is the Phoenix syntax, I will confirm with Hortonworks.

brycehodson commented 8 years ago

@duoxu

So to clarify, this is just a limitation of Phoenix's SQL implementation? If I try to isolate the join to a subquery and then select I still get an error (like SELECT id FROM (SELECT * FROM .... JOIN ... ON ...) WHERE ...)- do you have any ideas as to if there's some other best way to work around this issue?

duoxu commented 8 years ago

@brycehodson

I ran the INNER JOIN query in PrepareAndExecuteRequest as well as sqlline.py on the cluster, both were executed successfully. So Phoenix should have supported this syntax. The error is from Calcite/Avatica when we executed PrepareRequest, I think the bug is from there.

The Hortonworks dev owner of PQS is out of office and I will contact him again next Monday.

BTW, when you executed your subquery, did that show the same stack trace or different? Could you provide me a sample query?

brycehodson commented 8 years ago

@duoxu The part of the traces captured in the error message were identical, but the subquery version looks like this:

"SELECT CustomerId FROM (SELECT * FROM ShippingCodeFact fact0 INNER JOIN ShippingGlobalCodes gcD1 ON fact0.StringData = gcD1.InternalCode) WHERE StartDate < ? AND EndDate >= ? AND GlobalCode = ?"

I also looked at doing a query like this: "SELECT CustomerId FROM (SELECT CustomerId, StringData FROM ShippingCodeFact fact0 WHERE fact0.StartDate < ? AND fact0.EndDate >= ?) as tbl1 INNER JOIN (SELECT InternalCode FROM ShippingGlobalCodes gcD1 WHERE GlobalCode = ?) as tbl2 ON tbl1.StringData = tbl2.InternalCode" but I get again the same exception (with the same trace)

I was hoping that separating the filtering from the join would avoid the problem, but based on the results I'm assuming that somewhere before it gets to the bug, Phoenix is translating/optimizing these queries into something equivalent to the original version we tried.