denisenkom / pytds

Python DBAPI driver for MSSQL using pure Python TDS (Tabular Data Stream) protocol implementation
MIT License
192 stars 53 forks source link

Bulk insert - copy_to #61

Closed cristianocoelho closed 6 years ago

cristianocoelho commented 7 years ago

I can't seem to make this to work, looking at the code there are two issues.

1- The table name is formatted between "[ ]" which means you can only send dbo tables and not able to specify any schema

2- For some reason all the column types inserted are nvarchar(4000), and this is probably the reason I'm failing at inserting some int and date columns, I keep getting: OperationalError: Invalid column type from bcp client for colid 1.

Are there any undocumented limitations I am missing?

cristianocoelho commented 7 years ago

Figured out the issue, the code always sends nvarchar nullable, if the target table expects a NOT NULL data type, it will crash.

Proposed changes: Find out if column is nullable or not and send the appropriate flag. Also, do not use brackets for the table/view name so schema names can be used.

conn = self._conn()
import csv
reader = csv.reader(file, delimiter=sep)
if not columns:
    self.execute('select top 1 * **from {}** where 1<>1'.format(table_or_view))  # CRISTIANO: Remove [ ] so we can use schemas
    columns = [(col[0], col[6]) for col in self.description]

**# CRISTIANO: Set nullable flag based on col info since it is needed for bulk insert to work correctly
metadata = []
for col in columns:
    if col[1]:
        metadata.append(Column(name=col[0], type=conn._conn.NVarChar(4000), flags=Column.fNullable))
    else:
        metadata.append(Column(name=col[0], type=conn._conn.NVarChar(4000)))**

A few lines below...

operation = 'INSERT BULK {0}({1}) {2}'.format(table_or_view, col_defs, with_part) # CRISTIANO: Remove [ ] so we can use schemas

If someone can review these changes and make it part of the library it would be great.

denisenkom commented 7 years ago

Should be fixed now https://github.com/denisenkom/pytds/commit/4c31b9d480ac7004a7f5d39a47d8cc8d65c019c8 https://github.com/denisenkom/pytds/commit/782c935d593c2f314a5ac1f0703fb9a0f68307f4

Thanks for contribution

cristianocoelho commented 7 years ago

Sorry, I have found another issue. Declaring all values as nvarchar is a bit complicated, since values such as empty strings will be casted to the default float, if the target is a float, or default date, if the target is a date, ignoring the keep_nulls flag.

Edit: I'm not sure if it would be the best approach but it's working for me, basically define a new NVarChar type (subclasses NVarChar70 or 72) that adds an additional check for empty strings to also send nulls on both cases, nulls and empty strings, this way the behaviour is the same as the original BCP command which basically always treats empty strings as nulls.