Teradata / PyTd

A Python Module to make it easy to script powerful interactions with Teradata Database in a DevOps friendly way.
MIT License
108 stars 43 forks source link

Issue consuming CLOB in python returned by Teradata SP #32

Open AsimNaveed-TD opened 8 years ago

AsimNaveed-TD commented 8 years ago

Hi Eric, Before putting my query I would like to thank you for your support and time you had given to us. I would like to share some more information and abnormility we faced regarding Teradata Python Module because this will make it more mature. From the start we got issue’s in consuming Clob as output parameter of Teradata Stored Procedure, while most of them addressed with your help and guidance such as “ truncation issue is resolved with usage of ‘size’ parameter” in “Teradata.OutParam” clause. But there are still some concerns and behavioral inconsistencies we found and wish to bring them into your knowledge, so they can be checked and rectified.

  1. We are calling Teradata stored procedure using python udaExec module and SP returns four output parameters and Clob is one of it. SP size of Clob is 2000,000,000 (2GB) and can return up to 2GB return text/script. Now in python we tried to consume as much as we could using python ‘size’ parameter of "teradata.OutParam", which advocates python about the buffer size for clob and we tried different size values and with that we got some abnormal behaviour. Behaviour: More important thing we experienced is while retrieving clob through “Teradata.OutParam” using size parameter that most of time we get the complete result including clob of any size (10MB,100MB,500MB etc) but it sometime returns NULL in output, however do not give any error/exception and the execution went succesful . Python SP CALL and the culprit clob is highlighted below,

results = session.callproc("GDEV1P_FF.GCFR_FF_TPTLoadCLOB_Generate",("LD_668_66_Customer",6,17,0,"localhost","dbc","dbc",teradata.OutParam("oCode"),teradata.OutParam("oMessage"),teradata.OutParam("oScript",dataType="CLOB", size=4999999999),teradata.OutParam("oParams"),teradata.OutParam("oLogon")))

And when we retrieve like => results.oScript (The oScript is NULL). The python log showed successfull execution of SP and we can retrieve the other output parameter like => results.oMessage , results.oCode

We checked our SP and it is working fine, this NULL behaviour often happen with clob with higher sizes like 180 MB, 300MB or >500MB clob has to be returned. But this NULL behaviour is not retricted to higher sizes it also happened for smaller sizes as well.

Second we observed the buffer size restriction for python module,

For Python 32-bit installed at 64-bit windows, we could only consume 178 MB in size parameter of Teradata.OutParam. For Clob contents greater than 178 MB will be truncated. If we try to set “size” greater than 178 MB we got “MemoryError”. "MemoryError" issue disappeared with Python 64 bit and we were able to set the “size” up to 500MB and but still not able to get the 2GB as output and for 2GB we get NULL behaviour in output. TD Python module log showed successful execution of SP but the CLOB return parameter was NULL. This mostly happen when ‘size’ parameter is set to a higher value like 180MB or greater but this also happened for small sizes as This behavior puts a question mark on python module regarding CLOB consumption and we were about to incorporate this module in our TD project. Do you experienced this behaviour as well? Is there any bug reported for such behaviour. Please check and help us resolve the issue because we were unable to find the root cause. Thanks in advance, we are really grateful for your help, support and time.

Regards Asim Naveed

AsimNaveed-TD commented 8 years ago

The Environment's details for CLOB consumption issue as follows.

Configuration 1: With Python 3.5.1 (32 & 64 bit) Python Version: 3.5.1 Python Compiler: MSC v.1900 32 bit (Intel) & MSC v.1900 64 bit (AMD64) Python Build: ('v3.5.1:37a07cee5969', 'Dec 6 2015 01:54:25') UdaExec Version: 15.10.0.11 Platform's: Windows-7-6.1.7601-SP1 RAM: 8 GB

Configuration 2: With Python 3.5.1 (64 bit) Python Version: 3.5.1 Python Compiler: GCC 4.3.4 [gcc-4_3-branch revision 152973 Python Build: ('default', 'Feb 9 2016 04:34:46') UdaExec Version: 15.10.0.11 Platform's: Linux-2.6.32.54-0.23.TDC.1.R.2-default-x86_64-with-SuSE-11-x86_64 (VM) RAM: VM allocated 3 GB (total 8 GB RAM)

escheie commented 8 years ago

Hi Asim,

I started investigating the issue and will try to get to the bottom it. I will be out traveling though so I won't be able to complete my investigation until I get back middle of next week.

Thanks, -Eric

escheie commented 8 years ago

Asim - Have you tried the ODBC OutputAsResultSet='Y' option as a workaround? Below is a working example. The benefit of this option is that you don't need to specify the size of the buffer up front. You will still need to load the entire CLOB into memory though. The Python Database API specification does not provide an interface for streaming columns values, so we would need to add an enhancement above and beyond what the standard API currently provides.

    def testProcedureWithLargeClob(self):
            with udaExec.connect(self.dsn, username=self.username,
                                 password=self.password, OutputAsResultSet="Y") as conn:
                val = "ABCDEFGHIJ" * 1100001 # Create a 10+ MB string
                conn.execute("""CREATE TABLE testTableWithLargeClob
                    (id INTEGER, c CLOB(20000000) CHARACTER SET UNICODE)""")
                conn.execute("INSERT INTO testTableWithLargeClob VALUES (?, ?)",
                             (1, val))
                for r in conn.execute("SELECT c, CHARACTER_LENGTH(c) AS length FROM testTableWithLargeClob"):
                    self.assertEquals(r.c, val)
                    print(r.length)
                for r in conn.execute(
                    """REPLACE PROCEDURE testProcedureWithLargeClob
                        (OUT p1 CLOB (20000000) CHARACTER SET UNICODE)
                        BEGIN
                            SELECT c INTO p1 FROM testTableWithLargeClob WHERE id = 1;
                        END;"""):
                    logger.info(r)
                cursor = conn.cursor()
                cursor.callproc("testProcedureWithLargeClob", (teradata.OutParam("p1", size=0), ))
                output = cursor.fetchone()
                self.assertEquals(output.p1, val)
AsimNaveed-TD commented 8 years ago

Hi Eric, Thanks for your reply.Yes our project require that in ODBC DSN we must set OutputAsResultSet='Y' and always keep this option checked. And the above issue is produced with this option checked in DSN over windows and linux both. Do by checking this option in DSN is the same thing or it must be given in udaExec.connect as workaround?

But however i have not tried this option through udaExec.connect, i will check this as well and let you know.

Regards Asim Naveed

AsimNaveed-TD commented 8 years ago

Hi Eric, I have tried OutputAsResultSet='Y' with udaExec.connect but this does not solved our problem and went into some more issues like,

  1. At windows using this option in connect and through con.cursor approach the python stop's working and hangs.
  2. At suse linux 11, i got "Segment fault" error for this parameter and con.cursors approach. But if i call SP through con.callproc with " OutputAsResultSet='Y' ",i get nothing in my SP output params. The result is off zero bytes.
  3. when i set " OutputAsResultSet='N' " with con.callproc approach the SP starts returning output but the size above stated issue and abnormal behavior observed.

We still looking towards you for the remedy and exploration for our problem.

Thanks and Regards Asim Naveed

escheie commented 8 years ago

I am unable to reproduce your problem using SQL-based SPs. The Python module uses standard ODBC calls to execute the queries you submit and there is not a Python module code path that would behave differently based on the size of an SP output parameter. Its hard to pin point or fix a problem that I can't reproduce or that isn't clearly specific to the Python module. Does your SP work consistently when using ODBC directly? If so, can you provide an example so I can compare? Are you using the latest Teradata ODBC driver version? Can you provide a test case that reproduces the problem?

Thanks, -Eric