epermana / tungsten-replicator

Automatically exported from code.google.com/p/tungsten-replicator

Tungsten consumes excess memory due to improper access to byte-encoded queries #810

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Set up master/slave replication between MySQL instances using statement-based 
or mixed-mode binary logging, with the replicator's default heap allocation (1024 MB). 
2. Put a load consisting of very large queries on the master.  This can be done 
by loading a mysqldump with large inserts.  For best results, use statements of 
50 MB or more. 
3. Note the memory usage of the replicator.
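If no suitable dump is at hand, a file containing one very large INSERT can be generated for step 2. The Java sketch below is one way to do it; the table `test.big_insert`, its columns, the 1000-character row payload and the output file name `big_insert.sql` are all hypothetical choices, not taken from the report. The MySQL client's and server's `max_allowed_packet` settings must be at least as large as the statement, or MySQL will reject it.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical test-data generator: one multi-row INSERT of at least targetBytes. */
public final class LargeInsertGenerator {
    private LargeInsertGenerator() {}

    /** Writes rows until the VALUES list reaches targetBytes (all output is ASCII). */
    public static void writeInsert(Writer out, String table, long targetBytes) throws IOException {
        char[] filler = new char[1000];
        Arrays.fill(filler, 'x');
        String payload = new String(filler);

        out.write("INSERT INTO " + table + " (id, payload) VALUES ");
        long written = 0;
        for (long id = 1; ; id++) {
            String row = "(" + id + ",'" + payload + "')";
            out.write(row);
            written += row.length();
            if (written >= targetBytes) {
                break;
            }
            out.write(',');
            written++;
        }
        out.write(";\n");
    }

    public static void main(String[] args) throws IOException {
        long megabytes = args.length > 0 ? Long.parseLong(args[0]) : 50;
        try (BufferedWriter w = Files.newBufferedWriter(
                Paths.get("big_insert.sql"), StandardCharsets.UTF_8)) {
            // Assumes a database named `test` already exists on the master.
            w.write("CREATE TABLE IF NOT EXISTS test.big_insert "
                    + "(id BIGINT PRIMARY KEY, payload TEXT);\n");
            writeInsert(w, "test.big_insert", megabytes * 1024 * 1024);
        }
    }
}
```

Loading the resulting file on the master with the `mysql` client then writes the whole INSERT to the binary log as a single statement when statement-based logging is in effect.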

What is the expected output?

The replicator should process such statements without problems. 

What do you see instead?

Replicators run out of memory or stall during garbage collection. 

What is the possible cause?

In homogeneous statement replication on MySQL, the replicator stores statements 
as binary arrays rather than converting them to Unicode strings.  This is 
necessary to handle embedded binary data and alternative character sets 
correctly.  However, the replicator inefficiently converts such byte-encoded 
statements to their full string representation when parsing them and when 
performing various filter operations on them.  Here is an example from class JdbcApplier: 

                        // Check for table metadata cache invalidation.
                        String query = sdata.getQuery();
                        if (query == null)
                            query = new String(sdata.getQueryAsBytes());
                        SqlOperation sqlOperation = sqlMatcher.match(query);

The call to StatementData.getQuery() converts the entire binary array 
containing the statement to a Unicode string, which then persists until the 
StatementData instance is garbage collected.  This effectively doubles the 
memory needed to hold such statements. 
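One way to avoid the full conversion in the excerpt above is to decode only a bounded prefix of the byte array, on the assumption that deciding whether a statement may invalidate cached table metadata depends only on its leading keywords and object names. The class `QueryPrefix`, its method and the 20-byte demo limit below are hypothetical, not part of Tungsten. Statements that begin with long comments would need a larger prefix or a byte-level scan.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: decodes only the leading part of a byte-encoded statement. */
public final class QueryPrefix {
    private QueryPrefix() {}

    /**
     * Decodes at most maxBytes from the start of query. A multibyte character cut
     * in half by the limit is dropped instead of becoming a replacement character.
     */
    public static String decodePrefix(byte[] query, int maxBytes, Charset charset) {
        int len = Math.min(query.length, maxBytes);
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer in = ByteBuffer.wrap(query, 0, len);
        // Size the output for the charset's worst case so decoding cannot overflow.
        CharBuffer out = CharBuffer.allocate(
                (int) Math.ceil(len * (double) decoder.maxCharsPerByte()));
        // endOfInput = false: an incomplete trailing byte sequence is left unread.
        decoder.decode(in, out, false);
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] stmt = "INSERT INTO test.t1 VALUES (1, 'tr\u00e8s long')"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(decodePrefix(stmt, 20, StandardCharsets.UTF_8)); // prints "INSERT INTO test.t1 "
    }
}
```

In the JdbcApplier excerpt, the matcher would then receive something like `decodePrefix(sdata.getQueryAsBytes(), limit, charset)` instead of the full string, assuming SqlMatcher only inspects the start of the statement and the statement's character set is known at that point.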

What is the proposed solution?

There are currently 56 places in the code that call StatementData.getQuery().  
In most cases we only need the query for logging or parsing.  These calls 
should be encapsulated so that they fetch only the portion of the query needed 
for the operation at hand.  The remaining cases should be vetted carefully to 
ensure they use the query properly and cannot consume excessive memory.  
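The encapsulation could take the form of a small wrapper that owns the encoded statement, exposes bounded views for logging, and makes full decoding an explicit, separately reviewable call. The sketch below assumes a class named `EncodedQuery` with methods `byteLength`, `forLog`, `fullText` and `bytes`; all of these names are hypothetical and do not exist in Tungsten.

```java
import java.nio.charset.Charset;

/**
 * Hypothetical wrapper around a byte-encoded statement. It never caches a decoded
 * copy, so callers that only log the statement pay for a bounded prefix.
 */
public final class EncodedQuery {
    private final byte[] bytes;
    private final Charset charset;

    public EncodedQuery(byte[] bytes, Charset charset) {
        this.bytes = bytes;
        this.charset = charset;
    }

    /** Size of the statement in bytes, available without decoding. */
    public int byteLength() {
        return bytes.length;
    }

    /**
     * Short form for log messages: at most maxBytes are decoded. A multibyte
     * character split at the cut may appear as U+FFFD, acceptable for logging.
     */
    public String forLog(int maxBytes) {
        if (bytes.length <= maxBytes) {
            return new String(bytes, charset);
        }
        return new String(bytes, 0, maxBytes, charset)
                + "... [" + bytes.length + " bytes total]";
    }

    /**
     * Full decoded text. Every call allocates a String as large as the statement,
     * so it should be reserved for call sites vetted as needing the whole query.
     */
    public String fullText() {
        return new String(bytes, charset);
    }

    /** Raw bytes; deliberately not copied, since a copy would double memory use. */
    public byte[] bytes() {
        return bytes;
    }
}
```

With this shape, a call such as `forLog(6)` on the 24-byte statement `INSERT INTO t VALUES (1)` yields `INSERT... [24 bytes total]`, and any remaining use of `fullText()` marks a call site that needs the careful review described above.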

Additional information

...


Original issue reported on code.google.com by robert.h...@continuent.com on 29 Jan 2014 at 3:11

GoogleCodeExporter commented 9 years ago

Original comment by linas.vi...@continuent.com on 29 Jan 2014 at 9:49

GoogleCodeExporter commented 9 years ago
Robert, I thought we decided to have this wait until 3.3.0?

Original comment by linas.vi...@continuent.com on 29 Jan 2014 at 9:50

GoogleCodeExporter commented 9 years ago
Unscheduled until we pick a release to address this problem. 

Original comment by robert.h...@continuent.com on 5 May 2014 at 11:09