Open chamarthy opened 6 years ago
@chamarthy at first glance this sounds like a bug. Does fio immediately fail or is it something that it builds up to?
@sitsofe It depends on the ulimit setting for open files. On configurations where the open files limit is set low, FIO fails with the error "too many open files". Otherwise, on runs where there are a lot of files to operate on, FIO ends up holding too many files open (sometimes >20000).
OK the problem is reproducible with the following:
$ mkdir /tmp/lotsoffiles
$ cat <<EOF | fio -
[global]
thread
ioengine=libaio
size=20M
create_serialize=1
verify=md5
readwrite=rw
bs_unaligned=1
time_based=1
create_on_open=1
filesize=8k-512k
openfiles=2
bsrange=1k-8k
runtime=60
nrfiles=30
verify_state_save=0
[job1]
directory=/tmp/lotsoffiles
EOF
job1: (g=0): rw=rw, bs=(R) 1024B-8192B, (W) 1024B-8192B, (T) 1024B-8192B, ioengine=libaio, iodepth=1
fio-3.5-13-g8128
Starting 1 thread
fio: try reducing/setting openfiles (failed at 17 of 30)
fio: pid=7420, err=24/file:filesetup.c:703, func=open(/tmp/lotsoffiles/job1.0.28), error=Too many open files
The problem seems linked to the fact that get_next_verify() can open additional files but isn't checking whether td->nr_open_files >= td->o.open_files before doing so (and returning that we're too busy so we try again later). It's worth noting that td_io_open_file(), which is called from get_next_verify(), will increment td->nr_open_files...
Additionally it looks like if things go wrong in td_io_open_file() we don't decrement nr_open_files if we goto err.
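To make the two suspected problems concrete, here is a minimal toy model in C. It is not fio's code and not a proposed patch: struct toy_thread and the toy_* functions are hypothetical stand-ins for struct thread_data, td_io_open_file() and get_next_verify(). It only sketches the bookkeeping described above, assuming the cap should be checked before opening and the counter restored on the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for fio's per-thread state (not the real struct thread_data). */
struct toy_thread {
    unsigned int nr_open_files; /* files currently counted as open */
    unsigned int open_files;    /* cap, as set by the openfiles= option */
};

/* Toy open: counts the file, then undoes the count if a later step fails. */
static int toy_open_file(struct toy_thread *td, bool simulate_failure)
{
    td->nr_open_files++;
    if (simulate_failure)
        goto err;
    return 0;
err:
    /* Without this decrement, every failed open would leak one slot. */
    td->nr_open_files--;
    return -EIO;
}

/* Toy close: releases one slot. */
static void toy_close_file(struct toy_thread *td)
{
    assert(td->nr_open_files > 0);
    td->nr_open_files--;
}

/* Toy verify path: refuse to open while at the cap so the caller can retry later. */
static int toy_get_next_verify(struct toy_thread *td, bool simulate_failure)
{
    if (td->nr_open_files >= td->open_files)
        return -EBUSY;
    return toy_open_file(td, simulate_failure);
}
```

With open_files set to 2, a third call to toy_get_next_verify() returns -EBUSY instead of opening another file, and a failed open leaves nr_open_files unchanged.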
I have observed that for workloads where a large number of files are being exercised (e.g. nrfiles=10000) with a limited number of open files at any point in time (openfiles=16), FIO fails with the error "Too many open files". With the following job file, I would expect FIO to keep only 16 files open at any point in time. When I watched fio with lsof, I saw that after some time the open file count increases drastically and the run fails.
With an increased ulimit for open files, the count reaches >5000 at times (and sometimes even >20000).
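For anyone wanting to track the descriptor count without lsof, the following is a rough sketch in C that counts the entries under /proc/<pid>/fd. It assumes a Linux system with /proc mounted; count_open_fds is a hypothetical helper name, not part of fio.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Count the entries in /proc/<pid>/fd (Linux only).
 * Returns -1 if the directory cannot be opened, for example when the
 * process does not exist or permission is denied.
 */
static int count_open_fds(const char *pid)
{
    char path[64];
    int n, count = 0;
    DIR *dir;
    struct dirent *entry;

    assert(pid != NULL);
    n = snprintf(path, sizeof(path), "/proc/%s/fd", pid);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    dir = opendir(path);
    if (!dir)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        /* Skip the "." and ".." directory entries. */
        if (strcmp(entry->d_name, ".") != 0 && strcmp(entry->d_name, "..") != 0)
            count++;
    }
    closedir(dir);
    return count;
}
```

Calling count_open_fds() with fio's pid as a string at intervals during a run should show whether the count stays near the openfiles value. When passed "self", the result includes the descriptor that opendir() itself holds.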
Job File:
Just wanted to check whether there is an issue with the job profile or whether this is an issue with how FIO opens and closes files.
FIO Version: fio-3.3-31-gca65 on CentOS 7.4.