epermana / tungsten-replicator

Automatically exported from code.google.com/p/tungsten-replicator

Prefetch does not parallelize efficiently when there are very slow queries #316

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Set up master/slave replication on a very large data set with low-cardinality indexes. In this case the data set was about 3-4 TB in size.
2. Configure prefetch with 20 parallel channels and normal settings for the prefetch slow-query cache.
3. Start prefetch.

What is the expected output?

Prefetch should execute >90% of queries on slave databases and stay well ahead 
of the slave replicator. 

What do you see instead?

Prefetch ends up getting stuck on a small number of slow queries, which block the parallel queues. As a result, prefetch keeps falling behind and has to skip queries. In this case about 80% of queries were skipped.

What is the possible cause?

The Partitioner implementation currently uses round-robin assignment to the parallel queues. When there is one slow query, it blocks its entire parallel sub-queue, because that sub-queue keeps filling up with round-robin-assigned queries behind it while the other queues drain and go idle. As a result, we see very low parallelization much of the time.
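
As a rough illustration of why round-robin assignment stalls, here is a minimal sketch assuming a single feeder thread handing queries to bounded per-channel queues. The class and method names are hypothetical and simplified; this is not the actual Tungsten Partitioner interface.

```java
// Sketch only: hypothetical class names, not the real Tungsten Partitioner API.
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

class RoundRobinAssigner {
    private final List<ArrayBlockingQueue<Runnable>> channels; // bounded sub-queues
    private int next = 0;

    RoundRobinAssigner(List<ArrayBlockingQueue<Runnable>> channels) {
        this.channels = channels;
    }

    // Each query goes to the next channel in turn, regardless of how full
    // that channel already is. If the chosen sub-queue is full because its
    // worker is stuck on one slow query, put() blocks the feeder thread here
    // even though the other channels are idle.
    void assign(Runnable query) throws InterruptedException {
        ArrayBlockingQueue<Runnable> queue = channels.get(next);
        next = (next + 1) % channels.size();
        queue.put(query);
    }
}
```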

What is the proposed solution?

Enhance the Partitioner interface to provide information about queue sizes, so that Partitioner implementations can load balance based on the current size of each queue. This will keep the queues evenly loaded and make parallelization as efficient as possible.
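
For illustration, a minimal sketch of what size-aware assignment could look like once the partitioner can see queue sizes. Again, the class and method names are hypothetical stand-ins rather than the real Tungsten interface.

```java
// Sketch only: hypothetical class names, not the real Tungsten Partitioner API.
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

class LeastLoadedAssigner {
    private final List<ArrayBlockingQueue<Runnable>> channels; // bounded sub-queues

    LeastLoadedAssigner(List<ArrayBlockingQueue<Runnable>> channels) {
        this.channels = channels;
    }

    // Route each query to the channel whose sub-queue currently holds the
    // fewest entries. A channel stuck behind a slow query accumulates a
    // backlog and is avoided, so the remaining channels stay busy and the
    // feeder thread rarely blocks. (Assumes a single feeder thread, so the
    // size() check does not race with other producers.)
    void assign(Runnable query) throws InterruptedException {
        ArrayBlockingQueue<Runnable> target = channels.get(0);
        for (ArrayBlockingQueue<Runnable> q : channels) {
            if (q.size() < target.size()) {
                target = q;
            }
        }
        target.put(query);
    }
}
```

With 20 channels, routing around the one backed-up sub-queue keeps the other 19 workers busy instead of stalling the feeder behind a single slow query.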

Additional information

...


Original issue reported on code.google.com by robert.h...@continuent.com on 16 Mar 2012 at 1:44

GoogleCodeExporter commented 9 years ago

Original comment by robert.h...@continuent.com on 19 Sep 2012 at 8:22

GoogleCodeExporter commented 9 years ago

Original comment by linas.vi...@continuent.com on 15 Jan 2013 at 4:41

GoogleCodeExporter commented 9 years ago
This is pushed out until we get a production use case. 

Original comment by robert.h...@continuent.com on 18 Mar 2013 at 6:20

GoogleCodeExporter commented 9 years ago
We'll use 2.1.0 instead of 2.0.8, hence moving the issues.

Original comment by linas.vi...@continuent.com on 27 Mar 2013 at 3:11

GoogleCodeExporter commented 9 years ago

Original comment by linas.vi...@continuent.com on 26 Aug 2013 at 1:54

GoogleCodeExporter commented 9 years ago
There won't be a 2.1.3.

Original comment by linas.vi...@continuent.com on 17 Sep 2013 at 10:13

GoogleCodeExporter commented 9 years ago
Not used currently.

Original comment by linas.vi...@continuent.com on 20 Nov 2013 at 3:40