avikivity opened this issue 2 years ago
Here's a more extreme example, where read_iops=10:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 517080608 37404 1092184 0 0 1813 279424 10488 104071 2 14 83 0 0
0 0 0 517080608 37404 1092184 0 0 5 448640 7428 14379 0 0 100 0 0
0 0 0 517080608 37404 1092184 0 0 4 448640 7411 14354 0 0 100 0 0
1 0 0 517080608 37404 1092184 0 0 6 448768 7472 14463 0 0 100 0 0
0 0 0 517081600 37404 1092184 0 0 7 448640 7447 14421 0 0 100 0 0
0 0 0 517081600 37404 1092184 0 0 784 448896 7588 14546 0 0 100 0 0
0 0 0 517081600 37404 1092184 0 0 6 448768 7414 14344 0 0 100 0 0
0 0 0 517081600 37404 1092184 0 0 4 448640 7456 14430 0 0 100 0 0
0 0 0 517081600 37404 1092184 0 0 6 448640 7430 14399 0 0 100 0 0
0 0 0 517081600 37404 1092184 0 0 6 448768 7393 14349 0 0 100 0 0
0 0 0 517081600 37404 1092184 0 0 5 448768 7467 14382 0 0 100 0 0
0 0 0 517081600 37412 1092176 0 0 5 449024 7492 14441 0 0 100 0 0
0 0 0 517081600 37412 1092176 0 0 5 448640 7425 14364 0 0 100 0 0
0 0 0 517081600 37412 1092176 0 0 5 448768 7394 14325 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 3 448768 7401 14320 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 3 448768 7475 14431 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 6 448640 7427 14364 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 8 448768 7433 14373 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 5 448768 7416 14336 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 6 448640 7479 14445 0 0 100 0 0
1 0 0 517081600 37412 1092184 0 0 4 448768 7407 14360 0 0 100 0 0
1 0 0 517081600 37412 1092184 0 0 4 448768 7399 14328 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 4 448640 7395 14328 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 4 448896 7468 14424 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 9 448640 7430 14388 0 0 100 0 0
1 0 0 517081600 37412 1092184 0 0 5 448640 7396 14327 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 6 448768 7401 14340 0 0 100 0 0
1 0 0 517081600 37412 1092184 0 0 3 448768 7460 14419 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 4 448640 7419 14358 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 7 448640 7404 14343 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 7 448768 7427 14352 0 0 100 0 0
1 0 0 517081600 37412 1092184 0 0 7 448768 7489 14455 0 0 100 0 0
1 0 0 517081600 37412 1092184 0 0 6 448772 7438 14375 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 3 448640 7397 14317 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 2 448768 7459 14408 0 0 100 0 0
0 0 0 517980544 37412 1108832 0 0 2147 164372 4894 7121 3 0 97 0 0
63 0 0 487100480 37412 1108120 0 0 0 375552 22368 34689 3 90 6 0 0
63 0 0 460675520 37412 1108120 0 0 568 448896 23612 14902 22 76 2 0 0
Reads start crazy and settle down, then go crazy again after 5 seconds.
[global]
runtime=30s
time_based=1
ramp_time=5
filename=/dev/nvme1n1
direct=1
group_reporting
ioengine=io_uring
size=0
random_generator=tausworthe64
randrepeat=0
thread
[job(r_idx=0,w_idx=4,write_bw=459345240,r_iops=0)]
stonewall
new_group
readwrite=write
blocksize=131072
iodepth=64
rate=459345240
[job(r_idx=0,w_idx=4,write_bw=459345240,r_iops=0).1]
readwrite=randread
blocksize=512
iodepth=1000
rate_iops=10
rate_process=poisson
I would start looking at time.c:ramp_time_over()
bool ramp_time_over(struct thread_data *td)
{
        if (!td->o.ramp_time || td->ramp_time_over)
                return true;

        if (utime_since_now(&td->epoch) >= td->o.ramp_time) {
                td->ramp_time_over = true;
                reset_all_stats(td);
                reset_io_stats(td);
                td_set_runstate(td, TD_RAMP);

                /*
                 * If we have a parent, the parent isn't doing IO. Hence
                 * the parent never enters do_io(), which will switch us
                 * from RAMP -> RUNNING. Do this manually here.
                 */
                if (parent_update_ramp(td))
                        td_set_runstate(td, TD_RUNNING);

                return true;
        }

        return false;
}
In there are calls to reset_all_stats() and reset_io_stats(). Perhaps counters related to rate limiting are being reset, and preserving them would resolve this problem.
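To make that suggestion concrete, here is a rough, untested sketch of what preserving the rate-limiter bookkeeping across the reset could look like. The field names rate_io_issue_bytes[] and rate_next_io_time[] are assumptions based on how thread_data looks in recent fio sources and may differ in other versions; this only shows the direction, it is not a verified fix.

/*
 * Untested sketch for time.c: snapshot the rate-limiter bookkeeping
 * before the ramp-time stats reset and restore it afterwards, so the
 * limiter doesn't think it is starting from zero when TD_RAMP ends.
 * Field names are assumed from recent fio sources and may differ.
 */
#include "fio.h"

bool ramp_time_over(struct thread_data *td)
{
        uint64_t issue_bytes[DDIR_RWDIR_CNT];
        uint64_t next_io_time[DDIR_RWDIR_CNT];
        int i;

        if (!td->o.ramp_time || td->ramp_time_over)
                return true;

        if (utime_since_now(&td->epoch) >= td->o.ramp_time) {
                td->ramp_time_over = true;

                /* Save the per-direction rate state before it is cleared. */
                for (i = 0; i < DDIR_RWDIR_CNT; i++) {
                        issue_bytes[i] = td->rate_io_issue_bytes[i];
                        next_io_time[i] = td->rate_next_io_time[i];
                }

                reset_all_stats(td);
                reset_io_stats(td);

                /*
                 * Restore it so the limiter continues where it left off.
                 * The limiter compares this bookkeeping against time
                 * elapsed since the job epoch, which reset_all_stats()
                 * may also rebase, so the values might need adjusting
                 * rather than being copied back verbatim.
                 */
                for (i = 0; i < DDIR_RWDIR_CNT; i++) {
                        td->rate_io_issue_bytes[i] = issue_bytes[i];
                        td->rate_next_io_time[i] = next_io_time[i];
                }

                td_set_runstate(td, TD_RAMP);

                /*
                 * If we have a parent, the parent isn't doing IO. Hence
                 * the parent never enters do_io(), which will switch us
                 * from RAMP -> RUNNING. Do this manually here.
                 */
                if (parent_update_ramp(td))
                        td_set_runstate(td, TD_RUNNING);

                return true;
        }

        return false;
}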
Description of the bug:
I am using fio to measure read latency for various combinations of read and write workloads (see diskplorer). To do that, I generate job files with rate-limited read and write jobs. Since I don't know the latency, I don't know the read concurrency in advance, so I set it to 1000 per thread (and I run one thread per vcpu).
The problem is that this generates a large load when the job starts, as it pushes tons of requests to the disk. That's fine; fio is learning the disk latency and will adjust concurrency once it does. I set ramp_time=5 to exclude this learning period from the measurements.
The problem is that I see the same spike after ramp_time has elapsed; see the vmstat output below. It starts with a spike in bi, settles for 5 seconds (ramp_time), then spikes and settles again. I'd really like to avoid the spike when ramp_time expires! Note also the spike in procs/r, probably workqueues from the too-high concurrency.
Sample job file: 0037.fio.
I didn't see any evidence in the code that the job is restarted on the transition from TD_RAMP to TD_RUNNING, but something is clearly happening.
Environment: Ubuntu 21.04
fio version: fio-3.25
Reproduction steps
Run the example fio job above, adjusting for your CPU count and disk characteristics.