findmypast-oss / mssql_ecto

Ecto adapter for Mssqlex
Apache License 2.0
49 stars 20 forks source link

Text type restricted to maximum length of 4000 #2

Open toddharding opened 7 years ago

toddharding commented 7 years ago

Expected Behavior

Text types defined in a migration should be able to store strings of a length greater than 4000.

Current Behavior

When data of greater than length 4000 is inserted the statement is terminated.

Possible Solution

Perhaps a look at the erlang ODBC driver?

Steps to Reproduce (for bugs)

test "insert lots" do
    string = 1..1200 |> Enum.reduce("", fn x, acc -> acc <> to_string(x) end)
    # :text is of unrestricted size type
    post = TestRepo.insert!(%Post{text: string})
    assert post.text == string
end

Your Environment

jbachhardie commented 7 years ago

This is definitely a limitation of / bug in the erlang ODBC driver. We tried to support unsized character fields but the erlang ODBC driver mangles the return from them, so we were forced to restrict size to the maximum sized type, wvarchar(4000).

alexandercarls commented 7 years ago

This should be the relevant issue for erlang-odbc. https://bugs.erlang.org/browse/ERL-132

Is there a workaround for retrieving an nvarchar(max) RTF string or other technical solution? This is currently a showstopper for me. Thanks!

jbachhardie commented 7 years ago

If the data is reliably under 4000 bytes a CAST(column AS nvarchar(4000)) on the select should work. If it's not I'm afraid you're out of luck until that issue in Erlang is resolved. I don't think the driver has any way of extracting large data that's working since the bug affects all variable length fields ☹️

We don't have any immediate plans to patch Erlang but if you'd like to give it a try I believe @toddharding from our team had given it a shot and identified at least the broad strokes of what the problem is.

alexandercarls commented 7 years ago

Thank you for the quick reply. Unfortunately that doesn't work for me. Those RTF strings are huge!

arcusfelis commented 6 years ago

The odbc application bug was fixed in a fork https://github.com/arcusfelis/eodbc odbc module became eodbc, and application name was also changed.

So, you can try to run with it. From minuses: eodbc requires unixodbc to compile C code part. While odbc is shipped with Erlang/OTP already compiled.

jbachhardie commented 6 years ago

Nice work! We'll upgrade and hopefully the unixodbc dependency isn't too much trouble for people.

cpursley commented 6 years ago

Has this been considered as a possible solution?

http://erlang-odbc-tips.colefichter.ca/

See "Support for NVARCHAR(MAX)" section.

arcusfelis commented 6 years ago

it uses standard odbc application, which has that length bug.

On Tue, 18 Sep 2018 at 22:14, Cháse Pursłey notifications@github.com wrote:

Has this been considered as a possible solution?

http://erlang-odbc-tips.colefichter.ca/

See "Support for NVARCHAR(MAX)" section.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/findmypast-oss/mssql_ecto/issues/2#issuecomment-422557552, or mute the thread https://github.com/notifications/unsubscribe-auth/AAnDNOI7AuRvmN4gI5O2Q6vH1t6OpKMTks5ucWIegaJpZM4Mmltp .

-- С уважением, Уваров Михаил. Best regards, Uvarov Michael

cpursley commented 6 years ago

Thanks @arcusfelis

How would I go about installing and using your eodbc fork (debian system)?

And I assume I'd have to fork mssql_ecto to use eodbc instead of erlang-odbc?

arcusfelis commented 6 years ago
cpursley commented 6 years ago

Thanks @arcusfelis

I'm still not clear how I could get eodbc up and running with mssql_ecto / mssqlex.

I've forked mssqlex and replaced :odbc with :eodbc where (I beleive) appropriate.

Additionally, I'm installing eodbc in my Pheonix app as follows:

defp deps do
    [
      {:phoenix, "~> 1.3.4"},
      {:phoenix_pubsub, "~> 1.1.0"},
      {:phoenix_ecto, "~> 3.4.0"},
      {:mssql_ecto, git: "git://github.com/cpursley/mssql_ecto.git", branch: "c/eodbc", override: true},
      ...

Any other suggestions?

cpursley commented 6 years ago

Ok, I was able to get eodbc loaded by installing it via mix and swapping "odbc" for "eodbc" where appropriate in the mssqlex library. That approach works and it's pretty handy that you can install erlang rebar libraries easily via hex.

However, there are some esoteric errors coming up when running queries where the column type is either varchar(max) or nvarchar(max).

In iex:

{:ok, conn} = Mssqlex.ODBC.init(conn_str)

This works for nvarchar(max). However, the columns are returned as binary.

> :eodbc.sql_query(conn, 'SELECT * FROM dbo.tblSomeTable AS st WHERE st.ID = 1')
=> {:selected,
     ['ID', 'SomeColumn'],
     [[1, <<72, 0, 52, 0, 115, 0, 73, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 50, 0, 49, 0, 89, 0, 87, ...>>]
      ]}

This does not work when varchar(max):

> :eodbc.sql_query(conn, 'SELECT * FROM dbo.tblSomeTable AS st WHERE st.ID = 1')
=> [error] GenServer #PID<0.472.0> terminating
** (stop) {:port_exit, :killed}
Last message: {#Port<0.102>, {:exit_status, 139}}
State: {:state, #Port<0.102>, {#PID<0.389.0>, #Reference<0.425679474.514588673.206201>}, #PID<0.389.0>, :undefined, :off, true, true, :on, :connected, :undefined, 0, [#Port<0.100>, #Port<0.101>], #Port<0.103>, #Port<0.104>}

And via Ecto/Mssqlex, neither varchar(max) nor nvarchar(max) work.

Error for nvarchar(max):

> SomeTable |> Repo.get(1)
=> [error] [79, 68, 66, 67, 58, 32, 114, 101, 99, 101, 105, 118, 101, 100, 32, 117, 110, 101, 120, 112, 101, 99, 116, 101, 100, 32, 105, 110, 102, 111, 58, 32, [123, ['tcp_closed', 44, '#Port<0.62>'], 125], 10]
[error] GenServer #PID<0.378.0> terminating
** (stop) {:port_exit, :could_not_bind_data_buffers}
Last message: {#Port<0.58>, {:exit_status, 22}}
State: {:state, #Port<0.58>, {#PID<0.359.0>, #Reference<0.1246316755.1851523073.245020>}, #PID<0.359.0>, :undefined, :off, true, true, :on, :connected, :undefined, 0, [#Port<0.54>, #Port<0.55>], #Port<0.61>, #Port<0.62>}

Error for varchar(max):

SomeTable |> Repo.get(1)

[error] [79, 68, 66, 67, 58, 32, 114, 101, 99, 101, 105, 118, 101, 100, 32, 117, 110, 101, 120, 112, 101, 99, 116, 101, 100, 32, 105, 110, 102, 111, 58, 32, [123, ['tcp_closed', 44, '#Port<0.57>'], 125], 10]
[error] GenServer #PID<0.374.0> terminating
** (stop) {:port_exit, :killed}
Last message: {#Port<0.53>, {:exit_status, 139}}
State: {:state, #Port<0.53>, {#PID<0.358.0>, #Reference<0.2075734545.4001366017.31751>}, #PID<0.358.0>, :undefined, :off, true, true, :on, :connected, :undefined, 0, [#Port<0.49>, #Port<0.50>], #Port<0.56>, #Port<0.57>}

It looks like there's two things that need to happen to get at least nvarchar(max) working:

Does this sound right @arcusfelis & @jbachhardie?

arcusfelis commented 6 years ago

Hmm, I haven't tested the library with binary_strings option off. Generally, you really want to use binaries everywhere for Garbage Collector efficiency.

In MongooseIM we use these connection options:

eodbc:connect(Settings, [{scrollable_cursors, off},{binary_strings, on},{return_types, on}]) https://github.com/esl/MongooseIM/blob/295181f7e9fc5c42b58fdb212ee12be8237cf447/src/rdbms/mongoose_rdbms_odbc.erl

binary_strings=on - return char, nvarchar, binary, nbinary as binaries. Actually, the otp version of odbc library has the option too. But than we can't seamlessly do some processing of unicode fields because both binaries and unicode are now encoded using the same erlang type, binary. We use utf8 binaries as strings in MongooseIM, but MSSQL uses utf16, so some reencoding is required.

So, basically we have added the return_types option and now each query returns not only field names, but also field types. And our code can guess which transformations we should do based on ODBC field types. https://github.com/esl/MongooseIM/blob/295181f7e9fc5c42b58fdb212ee12be8237cf447/src/rdbms/mongoose_rdbms_odbc.erl#L105

On Thu, 27 Sep 2018 at 20:15, Cháse Pursłey notifications@github.com wrote:

Ok, I was able to get eodbc loaded by installing it via mix and swapping "odbc" for "eodbc" where appropriate in the mssqlex library. That approach works and it's pretty handy that you can install erlang rebar libraries easily via hex.

However, there are some esoteric errors coming up when running queries where the column type is either varchar(max) or nvarchar(max).

In iex:

{:ok, conn} = Mssqlex.ODBC.init(conn_str)

This works for nvarchar(max). However, the columns are returned as binary.

:eodbc.sql_query(conn, 'SELECT * FROM dbo.tblSomeTable AS st WHERE st.ID = 1') => {:selected, ['ID', 'SomeColumn'], [[1, <<72, 0, 52, 0, 115, 0, 73, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 65, 0, 50, 0, 49, 0, 89, 0, 87, ...>>] ]}

This does not work when varchar(max):

:eodbc.sql_query(conn, 'SELECT * FROM dbo.tblSomeTable AS st WHERE st.ID = 1') => [error] GenServer #PID<0.472.0> terminating ** (stop) {:port_exit, :killed} Last message: {#Port<0.102>, {:exit_status, 139}} State: {:state, #Port<0.102>, {#PID<0.389.0>, #Reference<0.425679474.514588673.206201>}, #PID<0.389.0>, :undefined, :off, true, true, :on, :connected, :undefined, 0, [#Port<0.100>, #Port<0.101>], #Port<0.103>, #Port<0.104>}

And via Ecto/Mssqlex, neither varchar(max) nor nvarchar(max) work.

Error for nvarchar(max):

SomeTable |> Repo.get(1) => [error] [79, 68, 66, 67, 58, 32, 114, 101, 99, 101, 105, 118, 101, 100, 32, 117, 110, 101, 120, 112, 101, 99, 116, 101, 100, 32, 105, 110, 102, 111, 58, 32, [123, ['tcp_closed', 44, '#Port<0.62>'], 125], 10] [error] GenServer #PID<0.378.0> terminating ** (stop) {:port_exit, :could_not_bind_data_buffers} Last message: {#Port<0.58>, {:exit_status, 22}} State: {:state, #Port<0.58>, {#PID<0.359.0>, #Reference<0.1246316755.1851523073.245020>}, #PID<0.359.0>, :undefined, :off, true, true, :on, :connected, :undefined, 0, [#Port<0.54>, #Port<0.55>], #Port<0.61>, #Port<0.62>}

However, using Repo.query directly works for _nvarchar(max)) (but not varchar):

{:ok, result} = Repo.query("SELECT * FROM dbo.tblSomeTable AS st WHERE st.ID = 1")

Error for varchar(max):

SomeTable |> Repo.get(1)

[error] [79, 68, 66, 67, 58, 32, 114, 101, 99, 101, 105, 118, 101, 100, 32, 117, 110, 101, 120, 112, 101, 99, 116, 101, 100, 32, 105, 110, 102, 111, 58, 32, [123, ['tcp_closed', 44, '#Port<0.57>'], 125], 10] [error] GenServer #PID<0.374.0> terminating ** (stop) {:port_exit, :killed} Last message: {#Port<0.53>, {:exit_status, 139}} State: {:state, #Port<0.53>, {#PID<0.358.0>, #Reference<0.2075734545.4001366017.31751>}, #PID<0.358.0>, :undefined, :off, true, true, :on, :connected, :undefined, 0, [#Port<0.49>, #Port<0.50>], #Port<0.56>, #Port<0.57>}

It looks like there's two things that need to happen to get at least nvarchar(max) working:

  • Get odbc:param_query to submitt params correclty
  • Decoding binary type

Does this sound right @arcusfelis https://github.com/arcusfelis & @jbachhardie https://github.com/jbachhardie?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/findmypast-oss/mssql_ecto/issues/2#issuecomment-425192423, or mute the thread https://github.com/notifications/unsubscribe-auth/AAnDNOiSoOUDnUeH455RgFTq7a3WksOvks5ufRW_gaJpZM4Mmltp .

-- С уважением, Уваров Михаил. Best regards, Uvarov Michael