lic34 opened this issue 5 years ago
@lic34 we're going to need to see a backtrace that includes all the threads (thread apply all bt) from the point of the crash before we can even start looking at this one...
@lic34, are you by chance running this as root or with sudo, or as a normal user? To make this a bit clearer: in my tests, at least, this call does not succeed when I don't run as root:
https://github.com/axboe/fio/blob/de5ed0e4d398bc9d4576f9b2b82d7686989c27e1/os/os-solaris.h#L151
A simple test case:
# No root here
$ ./pset-create-test
result: -1
pset_create: Not owner
# With sudo
$ sudo ./pset-create-test
result: 0
pset_create: Error 0
My very simple test driver for this:
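#include <stdio.h>
#include <sys/pset.h>
#include <errno.h>
#include <string.h>
int main(void) {
    /* Try to create a new processor set; this needs elevated privileges (e.g. root). */
    psetid_t newpset = 0;
    int res = pset_create(&newpset);
    printf("result: %d\n", res);
    /* perror() reports errno unconditionally, hence "Error 0" on success. */
    perror("pset_create");
    return res == 0 ? 0 : 1;
}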
I am wondering if we are failing to do some things because we do not have a particular level of access.
I am not necessarily suggesting that it is all a permissions issue, but I wanted to know whether you have been running this with elevated privileges, and if not, whether you could try that as a test.
@lic34, are you experiencing this failure with solarisaio consistently, or is it intermittent? I think I am reproducing this problem, but not consistently: I have to re-run the test several times before I trigger it, though I do suspect it is the same problem you are seeing. I am fairly sure much of this AIO code is nearly identical on illumos and Solaris and has likely been stable for a long time, so my guess is that something like a higher CPU count could explain why it is more consistent for you, if it is indeed consistent.
@szaydel, thanks for your support! If there is anything I can help with, please feel free to let me know.
@lic34 just to clarify, @szaydel was asking you the following (I've reworded things based on my interpretation):
solarisaio
intermittent or constant?
@sitsofe, thanks, that's exactly what I meant. :)
Sorry for my late reply. The core dump does not happen every time, but the problem where "the job count drops to 1 shortly after fio starts" happens consistently.
Thanks @lic34. I did not observe this decrease, but at least I am reproducing the periodic crashes. I am going to see if I can do something about it when I find some free time.
@lic34 is this one still happening with the latest fio releases? If so, could you post a backtrace of the crashes? Thanks!
I ran parallel I/O against 5 LUNs using the following fio job file:
[global]
ioengine=solarisaio
thread
iodepth=16
direct=0
bs_unaligned=0
time_based=1
rwmixwrite=50
rwmixread=50
do_verify=1
bsrange=4k-4k
refill_buffers=0
runtime=2808
fill_device=1
numjobs=1
readwrite=randrw

[public_lg_src_remote_20]
filename=/dev/dsk/emcpower1c
size=86%

[public_lg_src_remote_21]
filename=/dev/dsk/emcpower2c
size=86%

[public_lg_src_remote_22]
filename=/dev/dsk/emcpower3c
size=86%

[public_lg_src_remote_23]
filename=/dev/dsk/emcpower4c
size=86%

[public_lg_src_remote24]
filename=/dev/dsk/emcpower0c
size=86%
It starts with 5 jobs, but one minute later only 4 are left, and eventually the count drops to 1. Even worse, fio finally crashes with a core dump:
root@ncvm9084105:/opt/csw/bin/fio-log# fio --output ./test.log ./test.fio
clock setaffinity failed: Invalid argument
Jobs: 4 (f=4): [m(2),X(1),m(2)][1.5%][r=868KiB/s,w=820KiB/s][r=217,w=205 IOPS][eta 46m:36s]

root@ncvm9084105:/opt/csw/bin/fio-log# fio --output ./test.log ./test.fio
clock setaffinity failed: Invalid argument
Segmentation Fault (core dumped)(1)][1.5%][r=1670KiB/s,w=1734KiB/s][r=417,w=433 IOPS][eta 47m:17s]
However, when I remove the "thread" option from the job file, it works normally:
root@ncvm9084105:/opt/csw/bin/fio-log# cat test.fio
[global]
ioengine=solarisaio
iodepth=16
direct=0
bs_unaligned=0
time_based=1
rwmixwrite=50
rwmixread=50
do_verify=1
bsrange=4k-4k
refill_buffers=0
runtime=2808
fill_device=1
numjobs=1
readwrite=randrw

[public_lg_src_remote_20]
filename=/dev/dsk/emcpower1c
size=86%

[public_lg_src_remote_21]
filename=/dev/dsk/emcpower2c
size=86%

[public_lg_src_remote_22]
filename=/dev/dsk/emcpower3c
size=86%

[public_lg_src_remote_23]
filename=/dev/dsk/emcpower4c
size=86%

[public_lg_src_remote24]
filename=/dev/dsk/emcpower0c
size=86%
root@ncvm9084105:/opt/csw/bin/fio-log# fio --output ./test.log ./test.fio
clock setaffinity failed: Invalid argument
Jobs: 5 (f=5): [m(5)][14.8%][r=2361KiB/s,w=2361KiB/s][r=590,w=590 IOPS][eta 39m:56s]