Percona-Lab / tpcc-mysql

478 stars 186 forks source link

Response time measurement is broken #16

Open keithmork opened 6 years ago

keithmork commented 6 years ago

The response time seems to be milliseconds instead of seconds.

We can see the reported time is WAY bigger than the report interval:

  [warehouse]: 1
 [connection]: 1
     [rampup]: 10 (sec.)
    [measure]: 60 (sec.)

RAMP-UP TIME.(10 sec.)

MEASURING START.

   1, trx: 38, 95%: 19.054, 99%: 20.400, max_rt: 23.462, 37|49.750, 4|2.012, 4|47.448, 5|48.571
   2, trx: 40, 95%: 19.180, 99%: 21.549, max_rt: 21.550, 42|9.995, 3|1.507, 4|41.380, 3|43.022
   3, trx: 38, 95%: 18.850, 99%: 20.073, max_rt: 20.075, 38|6.083, 4|1.796, 3|44.210, 5|46.763
   4, trx: 39, 95%: 19.575, 99%: 20.031, max_rt: 20.029, 38|5.523, 4|1.620, 4|40.641, 4|42.997
   5, trx: 41, 95%: 18.066, 99%: 20.480, max_rt: 20.479, 42|5.633, 5|1.637, 4|40.173, 3|44.047
   6, trx: 36, 95%: 19.336, 99%: 20.073, max_rt: 20.074, 37|7.212, 3|1.645, 4|43.032, 4|45.687
   7, trx: 40, 95%: 18.158, 99%: 18.941, max_rt: 18.942, 40|5.775, 5|1.682, 4|40.240, 4|47.486
   8, trx: 40, 95%: 18.760, 99%: 19.267, max_rt: 19.264, 38|4.418, 3|2.970, 4|41.314, 3|42.710
   9, trx: 38, 95%: 19.400, 99%: 20.720, max_rt: 20.723, 39|6.707, 4|1.742, 4|42.198, 5|44.275
  10, trx: 39, 95%: 19.353, 99%: 20.031, max_rt: 20.032, 36|5.718, 4|1.590, 3|40.970, 4|44.543
  11, trx: 38, 95%: 18.104, 99%: 19.020, max_rt: 19.020, 44|5.699, 4|1.529, 4|41.297, 4|46.835
  12, trx: 40, 95%: 19.077, 99%: 19.799, max_rt: 19.796, 33|11.647, 3|2.102, 5|49.200, 3|46.795
  13, trx: 39, 95%: 18.022, 99%: 19.295, max_rt: 19.298, 43|6.350, 4|1.416, 3|40.375, 5|45.496
  14, trx: 40, 95%: 19.516, 99%: 20.683, max_rt: 20.681, 43|5.311, 4|1.558, 4|39.124, 3|43.984
  15, trx: 39, 95%: 19.180, 99%: 20.596, max_rt: 20.595, 34|6.360, 4|1.617, 5|43.031, 4|44.889
  16, trx: 41, 95%: 18.833, 99%: 19.330, max_rt: 19.329, 42|5.984, 5|1.550, 3|41.493, 4|45.155
  17, trx: 37, 95%: 19.180, 99%: 21.178, max_rt: 21.181, 37|5.062, 3|1.474, 4|43.459, 4|43.914
  18, trx: 41, 95%: 18.833, 99%: 19.094, max_rt: 19.096, 44|7.158, 4|1.604, 4|37.431, 4|45.661
  19, trx: 39, 95%: 18.997, 99%: 20.067, max_rt: 20.069, 36|7.481, 4|1.569, 4|40.849, 4|43.871
  20, trx: 41, 95%: 18.941, 99%: 19.900, max_rt: 19.898, 39|4.501, 4|1.582, 4|45.777, 4|43.360
...

And even bigger than the total run time:

  [warehouse]: 1
 [connection]: 10
     [rampup]: 10 (sec.)
    [measure]: 60 (sec.)

RAMP-UP TIME.(10 sec.)

MEASURING START.

   1, trx: 85, 95%: 53.087, 99%: 69.646, max_rt: 248.682, 84|281.659, 8|9.360, 8|150.075, 8|166.969
   2, trx: 107, 95%: 59.732, 99%: 71.377, max_rt: 90.132, 105|96.892, 10|6.114, 11|111.111, 10|178.489
   3, trx: 103, 95%: 54.455, 99%: 70.930, max_rt: 84.639, 100|102.048, 11|4.509, 11|112.294, 10|133.507
   4, trx: 101, 95%: 44.042, 99%: 49.599, max_rt: 51.692, 103|109.179, 10|8.071, 10|94.886, 11|304.122
   5, trx: 107, 95%: 52.707, 99%: 57.041, max_rt: 57.193, 105|82.368, 11|12.590, 10|103.049, 11|123.099
   6, trx: 111, 95%: 55.725, 99%: 71.356, max_rt: 80.031, 111|81.273, 11|14.588, 11|129.892, 11|153.645
   7, trx: 107, 95%: 56.803, 99%: 65.993, max_rt: 77.392, 106|80.115, 10|2.139, 11|136.094, 10|135.719
   8, trx: 101, 95%: 49.422, 99%: 58.162, max_rt: 65.192, 102|89.044, 11|4.108, 10|138.442, 11|133.251
   9, trx: 106, 95%: 57.728, 99%: 62.850, max_rt: 75.334, 106|80.936, 10|4.417, 10|114.998, 10|136.289
  10, trx: 107, 95%: 53.598, 99%: 63.304, max_rt: 70.591, 112|85.523, 11|6.529, 11|97.303, 11|144.880
  11, trx: 110, 95%: 63.684, 99%: 77.293, max_rt: 77.998, 103|95.891, 11|3.538, 11|153.217, 11|148.815
  12, trx: 108, 95%: 53.326, 99%: 77.177, max_rt: 78.426, 107|108.577, 10|3.825, 11|120.383, 10|150.871
  13, trx: 103, 95%: 44.921, 99%: 52.738, max_rt: 65.801, 103|88.636, 11|4.871, 9|107.686, 11|88.867
  14, trx: 102, 95%: 52.345, 99%: 61.455, max_rt: 67.663, 103|82.450, 10|10.045, 11|115.128, 11|156.847
  15, trx: 98, 95%: 50.589, 99%: 60.163, max_rt: 62.145, 96|129.893, 10|3.197, 10|101.315, 9|116.530
  16, trx: 97, 95%: 49.466, 99%: 62.869, max_rt: 63.618, 103|88.760, 10|5.662, 10|93.754, 9|117.029
  17, trx: 98, 95%: 44.306, 99%: 49.972, max_rt: 52.833, 100|129.102, 10|3.362, 10|95.633, 10|147.868
  18, trx: 106, 95%: 50.726, 99%: 59.947, max_rt: 60.412, 101|88.115, 10|6.946, 10|99.145, 11|125.987
  19, trx: 108, 95%: 54.227, 99%: 57.195, max_rt: 64.283, 108|103.297, 10|6.999, 11|88.397, 11|166.613
  20, trx: 105, 95%: 53.662, 99%: 85.063, max_rt: 85.103, 105|88.827, 11|5.467, 10|109.460, 10|111.302
...

And my db has long_query_time = 0.1 yet there's nothing in the slow log.


If it's indeed milliseconds, then the final reports and checks are incorrect.

<Raw Results>
  [0] sc:0 lt:6271  rt:0  fl:0 avg_rt: 34.0 (5)
  [1] sc:1 lt:6267  rt:0  fl:0 avg_rt: 55.6 (5)
  [2] sc:597 lt:29  rt:0  fl:0 avg_rt: 2.8 (5)
  [3] sc:257 lt:370  rt:0  fl:0 avg_rt: 101.6 (80)
  [4] sc:0 lt:626  rt:0  fl:0 avg_rt: 116.4 (20)
 in 60 sec.

...
 [response time (at least 90% passed)]
      New-Order: 0.00%  [NG] *
        Payment: 0.02%  [NG] *
   Order-Status: 95.37%  [OK]
       Delivery: 40.99%  [NG] *
    Stock-Level: 0.00%  [NG] *

Here seems to be the problem?

rt = (double)(tbuf2.tv_sec * 1000.0 + tbuf2.tv_nsec/1000000.0-tbuf1.tv_sec * 1000.0 - tbuf1.tv_nsec/1000000.0);

https://github.com/Percona-Lab/tpcc-mysql/blob/master/src/driver.c#L141

total_rt[0] += rt;

https://github.com/Percona-Lab/tpcc-mysql/blob/master/src/driver.c#L149

    printf("  [%d] sc:%d lt:%d  rt:%d  fl:%d avg_rt: %.1f (%d)\n",
           i, success[i], late[i], retry[i], failure[i],
           total_rt[i] / (success[i] + late[i]), rt_limit[i]);

https://github.com/Percona-Lab/tpcc-mysql/blob/master/src/main.c#L534


Similar issues:

https://github.com/Percona-Lab/tpcc-mysql/issues/6, 2016-10
tpcc-mysql 基准压测问题, 2017-09

vanturman commented 5 years ago

I agree with you. it seems the response time shown by mysql-tpcc is uncorrect. There is no problems with previous versions.

jinqiangwang commented 5 years ago

It has nothing to do with respone time calculation, value used here are indeed in milliseconds. At least in this version, the code has hardcoded slow transaction threshold values listed below, in milliseconds -

// main.c
#define RTIME_NEWORD   5    // new order
#define RTIME_PAYMENT  5    // payment
#define RTIME_ORDSTAT  5    // order status
#define RTIME_DELIVERY 80   // delivery
#define RTIME_SLEV     20   // stock level

You'll need to tune your DB config / storage type / connection count to see if you can reach a all "good" status.

The easier way might be using the tpcc command line options to set these threshold values to whever you want, as an example -

./tpcc_start -h127.0.0.1 -P3306 -dtpcc1000 -w10 -c1 -r10 -l120 -0 10 -1 10 -2 10 -3 70 -4 30

The numbers of last 5 options are corresponding to the order number of above macro defines, starting from 0.

oleotiger commented 4 years ago

How to fix this bug?

oleotiger commented 4 years ago

What about add divided by 1000?

rt = (double)()(tbuf2.tv_sec * 1000.0 + tbuf2.tv_nsec/1000000.0-tbuf1.tv_sec * 1000.0 - tbuf1.tv_nsec/1000000.0)/1000.0);