denisenkom / pytds

Python DBAPI driver for MSSQL using pure Python TDS (Tabular Data Stream) protocol implementation
MIT License
192 stars · 53 forks

Added new method bulk_insert with more control over the column types … #85

Closed · thihara closed this 6 years ago

thihara commented 6 years ago

…and direct support for Python objects as data instead of a .csv file

Supporting both column types and a CSV file would require additional modifications to parse the values coming from the CSV reader into their proper types so they can be serialized.
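
For context, a minimal sketch of the conversion step being described, assuming hypothetical per-column converter callables (none of this is in the PR itself):

```python
import csv

def typed_rows(csv_file, converters):
    """Yield CSV rows with each field parsed by its column's converter
    (e.g. int, float, str) so the values match the declared column
    types before they are serialized."""
    for row in csv.reader(csv_file):
        yield tuple(conv(field) for conv, field in zip(converters, row))

# Hypothetical usage for a two-column (int, str) table:
# with open('data.csv', newline='') as f:
#     for row in typed_rows(f, [int, str]):
#         ...
```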

codecov[bot] commented 6 years ago

Codecov Report

Merging #85 into master will decrease coverage by 0.05%. The diff coverage is 54.54%.


@@            Coverage Diff             @@
##           master      #85      +/-   ##
==========================================
- Coverage   93.49%   93.43%   -0.06%     
==========================================
  Files          25       25              
  Lines        7712     7721       +9     
==========================================
+ Hits         7210     7214       +4     
- Misses        502      507       +5
Impacted Files          Coverage         Δ
src/pytds/tds_base.py   95.12% <100%>    (+0.01%) ↑
src/pytds/__init__.py   92.69% <50%>     (-0.63%) ↓

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 432eef7...0ea0c87.

denisenkom commented 6 years ago

Can you add tests? Do you think it might be better to modify the copy_to method instead, extending the semantics of the columns parameter to accept either a list of column names or a list of column descriptors?

thihara commented 6 years ago

@denisenkom I can't work on tests right now, but I've tested it out manually and it's pushing data happily so far.

I did think about modifying the existing method, but didn't for two reasons.

  1. The method already has a large number of parameters and I didn't want to add another complicated one.

  2. If column types were to be accepted, the data would need to be parsed from the CSV file into the respective types, otherwise the serializers would fail. That is a lot more work; I'm willing to do it, but I don't have the time right now, and I needed bulk insert working ASAP.

If you can create a ticket for that and assign it to me, I'll work on it in a few weeks.

thihara commented 6 years ago

@denisenkom What's preventing this from being accepted? I'd like to start building from the original repo instead of my fork.

If you have specific requirements that haven't been met by this pull request, I can see if they can be worked on within a few weeks.

denisenkom commented 6 years ago

Here is what I would like you to do:

  1. Merge the bulk_insert method into the copy_to method. In copy_to you can detect whether the file parameter is a file-like object: if so, use the old behavior; if not, treat it as an iterable, as in the bulk_insert method. For the columns parameter you can detect whether the elements of the list are already instances of the Column type, in which case follow the new behavior; otherwise use the old behavior (see the sketch after this list).
  2. Add a simple test to exercise the new behavior.
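
A minimal, self-contained sketch of the dispatch described in point 1; the helper names are illustrative, not pytds's actual implementation:

```python
class Column:
    """Stand-in for pytds.Column: a name plus an optional type descriptor."""
    def __init__(self, name, type=None):
        self.name = name
        self.type = type

def resolve_rows_and_columns(file, columns, sep='\t'):
    if hasattr(file, 'read'):
        # Old behavior: a file-like object is parsed as delimited text.
        rows = (line.rstrip('\r\n').split(sep) for line in file)
    else:
        # New behavior: treat the argument as an iterable of row tuples,
        # as in the proposed bulk_insert method.
        rows = iter(file)

    if columns and all(isinstance(c, Column) for c in columns):
        # New behavior: the caller supplied full column descriptors.
        names = [c.name for c in columns]
    else:
        # Old behavior: columns is a plain list of column names.
        names = list(columns or [])

    return rows, names
```
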
LHCGreg commented 6 years ago

Since I need this functionality as well, I took my own stab at it that meets @denisenkom's requests. It reuses the existing columns parameter and checks whether its values are Column objects. However, I chose to add a new parameter for passing data as Python values instead of csv/tsv, for a couple of reasons.

One is that calling a sequence of Python values "file" is a poor name. The other, arguably more important, reason is that there may be users who depend on being able to provide Iterable[Iterable[str]] for file in order to stream data from memory on the fly without writing it out to disk first. That is what I was going to do, until I ran into not being able to bulk insert into nvarchar(max) columns because everything is typed as nvarchar(4000) by default. Assuming file contains Python values rather than csv/tsv whenever it is not a file-like object would break consumers of pytds who use it that way.
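
For anyone reading later, a hedged usage sketch along these lines, assuming a copy_to that grew a separate data parameter and accepts pytds.Column descriptors (check the current pytds docs for the authoritative signature and type classes; the connection parameters and table are placeholders):

```python
import pytds
import pytds.tds_types as tds_types

with pytds.connect(server='localhost', database='test',
                   user='sa', password='secret') as conn:
    with conn.cursor() as cur:
        cur.execute('CREATE TABLE bulk_demo (id INT, body NVARCHAR(MAX))')
        # Explicit Column descriptors let body be typed as nvarchar(max),
        # sidestepping the nvarchar(4000) default described above.
        cols = [
            pytds.Column('id', type=tds_types.IntType()),
            pytds.Column('body', type=tds_types.NVarCharMaxType()),
        ]
        rows = [(1, 'x' * 10000), (2, 'hello')]
        # data takes an iterable of row tuples directly; no CSV round trip.
        cur.copy_to(table_or_view='bulk_demo', columns=cols, data=rows)
```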

denisenkom commented 6 years ago

I believe recent changes in the current master branch have addressed all requests from this PR. Closing it.