Teradata / PyTd

A Python Module to make it easy to script powerful interactions with Teradata Database in a DevOps friendly way.
MIT License

Memory Allocation Error #76

Open jamesmf opened 7 years ago

jamesmf commented 7 years ago

Is this error familiar to anyone?

[HY001] [Teradata][ODBC Teradata Driver] Memory allocation error

It occurs sometimes when I try to bulk insert using the python TD module, but I haven't been able to reliably reproduce it.

I've seen it referenced on Stack Overflow and on the TD Forums.

escheie commented 7 years ago

Here is the response from the Teradata ODBC support team.

[HY001] [Teradata][ODBC Teradata Driver] Memory allocation error

This exception is a type of std::bad_alloc exception and is thrown when operator new and operator new[] fail to allocate the requested storage space.

Unfortunately, there is nothing conclusive that can be determined from the errors, ODBC trace could shed more light on this as to why or where specifically these issues occur. A test case to re-create the issue would immensely help as well.

From looking into the std::bad_alloc exception, there are several reasons why this could occur:

1) When memory resources are low. This could result if system memory is limited or when using 32-bit versions of software.
2) If the memory becomes too fragmented to allocate new blocks.
3) If there is a software defect that corrupts the size of the memory block that is requested to be allocated.

escheie commented 7 years ago

ODBC team determined root cause for this issue:

Analysis:

Both errors, "Memory allocation error" and "SQL request with parameters exceeds maximum allowed length of 1 MB", occur for the same reason: the length of the request message exceeds the maximum allowed message size. The maximum allowed message size is 1 MB for versions 15.10.01.xx and 7 MB for versions 16.xx.

The error "SQL request with parameters exceeds maximum allowed length of 1 MB" occurs when the bound parameters exceed the 1MB limit.

"Memory allocation error" occurs when the bound parameters are very close to the 1MB limit, however the 1 MB limit is not exceeded. This error actually occurs when ODBC driver tries to add KeepResp parcel to the outgoing message. Addition of this parcel is what pushes the outgoing message size over the 1MB limit, which is when the driver throws the confusing OS_AllocFailure exception.

Resolution:

The ODBC driver should not throw an OS_AllocFailure() as it is confusing in this case, instead it should throw TDODBCException with error code 22001, such that it is consistent with the "SQL request with parameters exceeds maximum allowed length of 1 MB" error.

Workaround:

Reduce the size of bound parameters such that there is enough space for the KeepResp parcel to be added.
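The workaround above can be sketched in Python: estimate the serialized size of each parameter row and split the rows into batches that stay comfortably below the 1 MB ceiling. Both the byte-size heuristic and the headroom constant below are assumptions for illustration, not the driver's actual wire format or parcel size.

```python
# Hypothetical helper illustrating the workaround: keep each batch of bound
# parameters well under the 1 MB request limit so the KeepResp parcel still
# fits. The per-row byte estimate is a rough heuristic (assumption), not the
# ODBC wire format.
HEADROOM = 64 * 1024  # spare room for parcels and overhead (assumption)
LIMIT = 1024 * 1024   # 1 MB maximum request size

def estimate_row_bytes(row):
    """Rough estimate of a parameter row's serialized size in bytes."""
    return sum(len(str(v).encode("utf-8")) for v in row)

def split_batches(rows, limit=LIMIT - HEADROOM):
    """Yield lists of rows whose estimated total size stays under `limit`."""
    batch, batch_bytes = [], 0
    for row in rows:
        row_bytes = estimate_row_bytes(row)
        if batch and batch_bytes + row_bytes > limit:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += row_bytes
    if batch:
        yield batch
```

Each yielded batch can then be passed to session.executemany(sql, batch, batch=True) in its own request.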

ODBC-17861 has been created to track this.

jamesmf commented 7 years ago

That is extremely helpful and matches what I've experienced so far (the ~1MB bound param size).

ausiddiqui commented 6 years ago

The server instance we have runs 15.x, but the driver I use is TD ODBC for Mac OS X 16.20, and I am still limited to 1 MB for parameter arrays. The official driver documentation also suggests that parameter arrays have a 1 MB limit, with the formula R <= 64 KB and T + N * R <= 1 MB, where R is the row size in bytes, N is the number of parameter rows, and T is the request text size in bytes.
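The documented formula translates directly into a quick batch-size calculator. The row size and request text size used in the example are made-up figures for illustration:

```python
# Sanity-check the documented parameter-array limits:
#   R <= 64 KB  and  T + N * R <= 1 MB
MAX_ROW_BYTES = 64 * 1024        # R limit
MAX_REQUEST_BYTES = 1024 * 1024  # 1 MB message limit

def max_batch_rows(row_bytes, request_text_bytes):
    """Largest N satisfying T + N * R <= 1 MB, or 0 if R exceeds 64 KB."""
    if row_bytes > MAX_ROW_BYTES:
        return 0
    return max((MAX_REQUEST_BYTES - request_text_bytes) // row_bytes, 0)

# Example (assumed sizes): ~250 bytes per row and a ~200-byte INSERT
# statement allow 4193 rows per batch.
print(max_batch_rows(250, 200))  # → 4193
```

So a fixed chunk size of 1000 rows, as in the snippet below, only stays safe while rows remain small; wider rows shrink the allowable N.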

Is there a configuration in iODBC Manager to raise the message size to 7 MB, since you say that is the upper bound for 16.xx? The command I am running is:

import math

import teradata

list_df = df.values.tolist()  # df is a pandas DataFrame
dataTuple = tuple(tuple(i) for i in list_df)
size = 1000

# The trailing semicolon reportedly speeds the process up. Note that a '#'
# comment placed inside the triple-quoted string would be sent to the
# database as part of the request text.
executeStmt = """
INSERT INTO myDBname.myTable (
            col1,
            col2,
            col3,
            col4,
            col5)
        VALUES (?, ?, ?, ?, ?);
"""

udaexec = teradata.UdaExec("TD Insert", "1.0", logConsole=False)

# math.ceil includes the final partial chunk, so no extra executemany call is
# needed for the last rows. system, username, and password are defined
# elsewhere.
for i in range(1, math.ceil(df.shape[0] / size) + 1):
    lower = (i - 1) * size
    upper = i * size
    pArray = dataTuple[lower:upper]

    with udaexec.connect(method="odbc",
                         system=system,
                         username=username,
                         password=password,
                         autoCommit=True,
                         driver="Teradata Database ODBC Driver 16.20") as session:
        session.executemany(executeStmt, pArray, batch=True)
blackhat06 commented 5 years ago

@escheie I am using Teradata 16.20 for Mac, and it still has the 1 MB limit. It's a really big pain point when working with millions of rows. It would be great if this could at least be extended to 10 MB, which would help speed up loading.

In worst case, we might need to switch to Postgres.

ausiddiqui commented 5 years ago

@blackhat06 we switched over to the Giraffez library (https://github.com/capitalone/giraffez) for Python scripting against Teradata, and it has been almost 6-8 months of painless performance -- the only bottleneck was the initial time spent getting the ODBC libraries installed and configured for macOS / Unix environments.

It also supports bulk uploads and arbitrary statement execution (creating tables or volatile tables, dropping tables, etc.).