denisenkom / pytds

Python DBAPI driver for MSSQL using pure Python TDS (Tabular Data Stream) protocol implementation
MIT License
190 stars 52 forks source link

copy_to gets IndexError: list index out of range #142

Closed j1yuan closed 1 year ago

j1yuan commented 1 year ago

I am using the latest version (1.11.0).

The target table is created by:

CREATE TABLE [dbo].[PrescreenRecords_Stage](
    [BrandId] [int] NOT NULL,
    [LenderId] [int] NOT NULL,
    [NameKey] [nvarchar](20) NOT NULL,
    [NameKey2] [nvarchar](20) NOT NULL,
    [StartDate] [datetime2](7) NOT NULL,
    [EndDate] [datetime2](7) NOT NULL,
    [EnrollmentCode] [nvarchar](10) NULL,
    [RecordJson] [nvarchar](max) NOT NULL,
 CONSTRAINT [PK_dbo_PrescreenRecords_Stage] PRIMARY KEY CLUSTERED 
(
    [BrandId] ASC,
    [LenderId] ASC,
    [StartDate] ASC,
    [NameKey] ASC,
    [NameKey2] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

In my code, I create the CSV file on the flight, then, use the copy_to() function to load.

data_dict = {'BrandId':[4, 4, 4, 4, 4], 
            'LenderId':[5, 5, 5, 5, 5], 
            'NameKey':['A1', 'B1', 'C1', 'D1', 'E1'],
            'NameKey2':['A12', 'B12', 'C12', 'D12', 'E12'],
            'StartDate':['2022-09-21 08:15:40.1700000', '2022-09-22 08:15:40.1700000', '2022-09-23 08:15:40.1700000', '2022-09-24 08:15:40.1700000', '2022-09-25 08:15:40.1700000'],
            'EndDate':['2022-09-21 09:15:40.1700000', '2022-09-22 09:15:40.1700000', '2022-09-23 09:15:40.1700000', '2022-09-24 09:15:40.1700000', '2022-09-25 09:15:40.1700000'],
            'EnrollmentCode':['AE', 'BE', 'CE', 'DE', 'EE'],
            'RecordJson':["{'CAMPAIGNDESCR': 'EOSDirectMail1', 'ENROLLMENTCODE': 'DMST101', 'STATEID': 'AL', 'ZIPCODE': '36104'}",
            "{'CAMPAIGNDESCR': 'EOSDirectMail2', 'ENROLLMENTCODE': 'DMST101', 'STATEID': 'AL', 'ZIPCODE': '36104'}",
            "{'CAMPAIGNDESCR': 'EOSDirectMail3', 'ENROLLMENTCODE': 'DMST101', 'STATEID': 'AL', 'ZIPCODE': '36104'}", 
            "{'CAMPAIGNDESCR': 'EOSDirectMail4', 'ENROLLMENTCODE': 'DMST101', 'STATEID': 'AL', 'ZIPCODE': '36104'}", 
            "{'CAMPAIGNDESCR': 'EOSDirectMail5', 'ENROLLMENTCODE': 'DMST101', 'STATEID': 'AL', 'ZIPCODE': '36104'}"]
            }
df = pd.DataFrame(data_dict)
print("df:", df)

print("df columns:", df.columns)

file_path = "/tmp/bulk_test.csv"
print("file_path:", file_path)

df.to_csv(file_path, sep='|', index=False, header=False, quoting=None)
print("created file:", file_path)

cur.copy_to(file_path, 'PrescreenRecords_Stage', sep='|', columns=('BrandId','LenderId','NameKey','NameKey2','StartDate','EndDate','EnrollmentCode','RecordJson'))

It is run on a linux server, and the error message is IndexError: list index out of range

I wonder if someone can help me?

j1yuan commented 1 year ago

Here is everything that I can see on the terminal screen:

[s_dev_infsvc@devbisedge01 tmp]$ ./test_df_bulk.py
Connected.
Created cursor.
df:    BrandId  LenderId NameKey  ...                      EndDate EnrollmentCode                                         RecordJson
0        4         5      A1  ...  2022-09-21 09:15:40.1700000             AE  {'CAMPAIGNDESCR': 'EOSDirectMail1', 'ENROLLMEN...
1        4         5      B1  ...  2022-09-22 09:15:40.1700000             BE  {'CAMPAIGNDESCR': 'EOSDirectMail2', 'ENROLLMEN...
2        4         5      C1  ...  2022-09-23 09:15:40.1700000             CE  {'CAMPAIGNDESCR': 'EOSDirectMail3', 'ENROLLMEN...
3        4         5      D1  ...  2022-09-24 09:15:40.1700000             DE  {'CAMPAIGNDESCR': 'EOSDirectMail4', 'ENROLLMEN...
4        4         5      E1  ...  2022-09-25 09:15:40.1700000             EE  {'CAMPAIGNDESCR': 'EOSDirectMail5', 'ENROLLMEN...

[5 rows x 8 columns]
df columns: Index(['BrandId', 'LenderId', 'NameKey', 'NameKey2', 'StartDate', 'EndDate',
       'EnrollmentCode', 'RecordJson'],
      dtype='object')
file_path: /tmp/bulk_test.csv
created file: /tmp/bulk_test.csv
Traceback (most recent call last):
  File "./test_df_bulk.py", line 45, in <module>
    cur.copy_to(file_path, 'PrescreenRecords_Stage', sep='|', columns=('BrandId','LenderId','NameKey','NameKey2','StartDate','EndDate','EnrollmentCode','RecordJson'))
  File "/home/s_dev_infsvc/.local/lib/python3.6/site-packages/pytds/__init__.py", line 1003, in copy_to
    self._session.submit_bulk(metadata, rows)
  File "/home/s_dev_infsvc/.local/lib/python3.6/site-packages/pytds/tds.py", line 1090, in submit_bulk
    serializers[i].write(w, row[i])
IndexError: list index out of range
j1yuan commented 1 year ago

I realized that I need to create Column objects and use them in the columns parameter. I'll close this issue.