denisenkom / pytds

Python DBAPI driver for MSSQL using pure Python TDS (Tabular Data Stream) protocol implementation
MIT License

Allow specifying column types and providing data directly in bulk copies #97

Closed LHCGreg closed 6 years ago

LHCGreg commented 6 years ago

This allows inserting values of types not compatible with nvarchar(4000), such as nvarchar(max).

This reuses the existing columns parameter and checks whether its values are Column objects. However, I chose to add a new parameter for passing data as Python values instead of csv/tsv, for a couple of reasons. One is that "file" is a poor name for a sequence of Python values. The other, arguably more important, reason is that there may be users depending on being able to provide Iterable[Iterable[str]] for file in order to stream data from memory on the fly (writing one csv/tsv row at a time using yield) without having to write it out to disk first. That's what I was going to do until I ran into not being able to bulk insert into nvarchar(max) columns, due to everything being typed as nvarchar(4000) by default. Assuming that file holds Python values rather than csv/tsv whenever it is not a file-like object would break consumers of pytds using it that way.
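A minimal sketch of the streaming pattern described above: a generator yields one TSV line at a time, so a large data set never has to be materialized on disk or fully in memory. Passing such an iterable of strings as `file` to a bulk-copy call is the usage this PR aims to keep working; the row data below is purely illustrative.

```python
def rows_as_tsv(rows):
    """Lazily convert an iterable of row tuples into TSV lines.

    Each line is produced on demand via yield, so the full payload is
    never built up front -- the consumer pulls rows one at a time.
    """
    for row in rows:
        yield "\t".join(str(value) for value in row) + "\n"


# Illustrative rows; in practice this could itself be a generator
# reading from another source.
sample_rows = [(1, "alpha"), (2, "beta")]

for line in rows_as_tsv(sample_rows):
    print(repr(line))
```

Because a generator like this is not a file-like object, a driver that treated any non-file `file` argument as a sequence of Python values would silently break this style of consumer, which is why a separate parameter for Python values is the safer design.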

Fixes #84

gizmo93 commented 6 years ago

I tested your changes locally (SQL Server 2017), because at the moment I really need support for inserting iterables using bulk insert. Works great, except for varchar(max) and nvarchar(max) columns. Sometimes I get an error message like the one in the failed AppVeyor build above, and sometimes I get "String or binary data would be truncated." even though it's a MAX column (the strings are less than 3000 characters in my case) and the strings don't contain any characters my latin1-collated database could not handle.

LHCGreg commented 6 years ago

@gizmo93 How about applying this change against the latest master? Critically, one of my other PRs (https://github.com/denisenkom/pytds/pull/95), which has been merged, fixes a bug with bulk inserting values as varchar(max) and nvarchar(max). That bug is what was causing "premature end-of-message was encountered". Is there a way to rerun the AppVeyor tests against latest master plus this PR? I am not sure whether that bug could also cause "String or binary data would be truncated". You should definitely no longer get "premature end-of-message was encountered" when inserting varchar(max)/nvarchar(max) if you're running pytds with https://github.com/denisenkom/pytds/pull/95 included. Let me know if you still get "String or binary data would be truncated." and include a test case if you can.

@denisenkom https://github.com/denisenkom/pytds/pull/96 should have its tests run again, because they depend on https://github.com/denisenkom/pytds/pull/95 in order to pass. After that, this PR should be retested, because its tests depend on https://github.com/denisenkom/pytds/pull/96.

denisenkom commented 6 years ago

Looks like it is still failing on latest master; I will try to revert for now.

LHCGreg commented 6 years ago

@denisenkom Tests still fail because they rely on https://github.com/denisenkom/pytds/pull/96 being merged, which fixes handling of multiple nvarchar(max) values in result sets; the new test for this change queries multiple rows that have an nvarchar(max) column. Please run tests for and merge #96 first.