google-code-export / ibm-db

Automatically exported from code.google.com/p/ibm-db

Memory leak in ibm_db.execute(prepared_statement, bind_parameters) #169

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

We have code that reads a table from one database and inserts the same tuples 
into another database (over a second connection) using a prepared statement.  After 
copying a few tens of thousands of rows, memory usage has already grown by several MB.

The code looks approximately as follows (stmt is the open cursor on the source 
table and row_count the number of rows to copy; both come from the omitted 
query step):

import sys

import ibm_db

def copy_table(src_conn, src_tbname, dst_conn, dst_tbname, commit_interval=1000):
    # ... query source table into stmt/row_count ... omitted ...

    # compute an sql statement of the form insert into table(cols,...) values(?,...)
    n = ibm_db.num_fields(stmt)
    sql_names = ','.join([ibm_db.field_name(stmt, i) for i in range(0, n)])
    sql_vals = '?,' * (n - 1) + '?'
    sql = "insert into %s(%s) values(%s)" % (dst_tbname, sql_names, sql_vals)

    # create a prepared statement from the above string
    sql_prep = ibm_db.prepare(dst_conn, sql)

    row_err = 0
    row_i = 0
    while True:
        # fetch tuple from source table, break if we are at the end
        row = ibm_db.fetch_tuple(stmt)
        if type(row) is not tuple:
            break

        # execute the prepared statement, binding the params from the fetched tuple;
        # this has a SERIOUS memory leak, process grew to 60M after 20k rows
        res = ibm_db.execute(sql_prep, row)
        if res == False:
            row_err += 1
        row_i += 1
        if row_i % commit_interval == 0:
            res = ibm_db.commit(dst_conn)
        sys.stdout.write("%d/%d, e=%d\r" % (row_i, row_count, row_err))
    ibm_db.commit(dst_conn)
    print
    return row_count
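
For reference, we call it roughly like this (connection strings and table 
names here are hypothetical placeholders):

src_conn = ibm_db.connect('DATABASE=srcdb;HOSTNAME=src.example.com;PORT=50000;'
                          'PROTOCOL=TCPIP;UID=user;PWD=secret', '', '')
dst_conn = ibm_db.connect('DATABASE=dstdb;HOSTNAME=dst.example.com;PORT=50000;'
                          'PROTOCOL=TCPIP;UID=user;PWD=secret', '', '')
copy_table(src_conn, 'myschema.orders', dst_conn, 'myschema.orders')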

What is the expected output? What do you see instead?

The process grows (35M per 20k rows in our case), and on a big table (after about 
1.6 million rows) the process is killed by the OS because it runs out of memory.
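
A minimal self-contained sketch along these lines should be enough to observe 
the growth (connection string and table are hypothetical; RSS is sampled with 
resource.getrusage):

import resource
import sys

import ibm_db

conn = ibm_db.connect('DATABASE=testdb;HOSTNAME=localhost;PORT=50000;'
                      'PROTOCOL=TCPIP;UID=db2inst1;PWD=secret', '', '')
ibm_db.exec_immediate(conn, 'create table leak_test(a integer, b varchar(64))')

sql_prep = ibm_db.prepare(conn, 'insert into leak_test(a, b) values(?, ?)')
for i in range(200000):
    # the suspect path: execute() with a bind-parameter tuple
    ibm_db.execute(sql_prep, (i, 'row %d' % i))
    if i % 20000 == 0:
        # ru_maxrss is in KB on Linux; we would expect it to stay flat,
        # but it keeps growing
        sys.stdout.write('%d rows, maxrss=%d KB\n' %
                         (i, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))
ibm_db.commit(conn)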

There are some workarounds:

WORKAROUND1
Manually binding the parameters of the prepared statement from within Python 
improves the situation but does not solve it (a leak of 12M per approx. 200k rows; 
25MB is the baseline memory usage of our process):
        # small memory leak, process grew to 37M after 200k rows
        for i in range(0, n):
            ibm_db.bind_param(sql_prep, i + 1, row[i])
        res = ibm_db.execute(sql_prep)

WORKAROUND2
Replacing the prepared statement with a generated SQL insert statement string 
that is executed directly with
ibm_db.exec_immediate(dst_conn, 'insert into table(col1, col2, ...) values(val1, val2, ...)')
does not seem to exhibit any memory leak; memory usage was constant for roughly 
the first 100-200k rows.
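
A sketch of this workaround, assuming a hypothetical quoting helper 
sql_literal (building literal SQL is only safe if values are escaped properly):

def sql_literal(v):
    # hypothetical minimal quoting helper: NULL for None, numbers as-is,
    # everything else single-quoted with embedded quotes doubled
    if v is None:
        return 'NULL'
    if isinstance(v, (int, long, float)):
        return str(v)
    return "'%s'" % str(v).replace("'", "''")

while True:
    row = ibm_db.fetch_tuple(stmt)
    if type(row) is not tuple:
        break
    sql = 'insert into %s(%s) values(%s)' % (
        dst_tbname, sql_names, ','.join(sql_literal(v) for v in row))
    # no parameter binding involved, so the leaky bind path is bypassed
    ibm_db.exec_immediate(dst_conn, sql)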

What version of the product are you using? On what operating system?
ibm_db 2.0.5.1 on Ubuntu 14.04.1 LTS (64-bit Linux)

Please provide any additional information below.
We peeked quickly into the ibm_db.c code.  It seems there is code that 
caches previously bound parameters (_python_ibm_db_execute_helper1) in the

    if ( !NIL_P(parameters_tuple) ) {

block, which subsequently also calls ..._helper2 with 1 as the bind_params 
argument.

As mentioned above, in the case where the parameters are bound from Python (the 
else block in ..._helper1, with 0 passed as the bind_params argument to 
..._helper2), there is still a memory leak, but it is only "mild" in comparison.
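
In Python terms, the two internal paths correspond to the two usage patterns 
already shown above:

# path 1: parameter tuple passed to execute() -> ..._helper2 is called with
# bind_params=1; this is the path with the serious leak
res = ibm_db.execute(sql_prep, row)

# path 2: parameters bound explicitly via bind_param() -> ..._helper2 is
# called with bind_params=0; this still leaks, but much more slowly
for i in range(0, n):
    ibm_db.bind_param(sql_prep, i + 1, row[i])
res = ibm_db.execute(sql_prep)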

Hope this helps.

Original issue reported on code.google.com by Th.Gschw...@gmail.com on 12 Feb 2015 at 3:01

GoogleCodeExporter commented 9 years ago
Thanks for reporting this issue. I will check it and let you know once I find 
something, or I will ask you for more detail if I am not able to reproduce 
the issue.

Original comment by rahul.pr...@in.ibm.com on 19 Feb 2015 at 6:30

GoogleCodeExporter commented 9 years ago
I didn't see any memory leak. Could you please give me the table DDL? There 
might be a memory leak for some specific datatype.

Original comment by rahul.pr...@in.ibm.com on 19 Feb 2015 at 9:48