SimpleSSD / SimpleSSD-FullSystem

Open-Source Licensed Educational SSD Simulator for High-Performance Storage and Full-System Evaluations
BSD 3-Clause "New" or "Revised" License

segfault when using fio benchmarks (seq. writes) #12

Closed galmusaddar closed 4 years ago

galmusaddar commented 4 years ago

Describe the bug

When running fio benchmarks with sequential writes (rw=write) and filename=/dev/nvme0n1, I get a segfault in the simulated environment after the writes finish. Because of this segfault I cannot see the fio results. It only happens with sequential writes; it does not happen with other types of reads and writes. I also tested with filename=/dev/sda, and the segfault does not show up.

To Reproduce

I used the same kernel and disk image for the X86 environment that are provided on the SimpleSSD website. I am using the intel750 config file, in which I changed LBASize=512 and UseCopyOnWriteDisk = 1.

This is the command I used to run SimpleSSD:

./build/X86/gem5.opt ./configs/example/fs.py --cpu-type=TimingSimpleCPU --num-cpus=1 --cpu-clock=2GHz --caches --l2cache --mem-type=DDR4_2400_8x8 --mem-size=8GB --kernel=x86_64-vmlinux-4.9.92 --disk-image=x86root.img --script=fio_write10s.sh --ssd-interface=nvme --ssd-config=./src/dev/storage/simplessd/config/intel750_400gb.cfg --root-device=/dev/nvme0n1p1

This is the command in the script that is executed inside the simulated environment:

./fio --direct=1 --ioengine=libaio --iodepth=1 --bs=4096 --rw=write --filename=/dev/nvme0n1 --numjobs=1 --name=test --time_based --runtime=10s --randseed=13425

Here is the segfault I'm receiving:

[Screenshot of the segfault output: Screen Shot 2020-07-15 at 8 19 44 AM]

I'm not sure why this problem shows up, or why it appears only with sequential writes. I would also appreciate advice on how to debug this issue effectively.

kukdh1 commented 4 years ago

Hi,

I assume you booted x86root.img with DiskImageFile1 = <your path to x86root.img>.

Because you specified the --root-device=/dev/nvme0n1p1 parameter, the Linux kernel uses SimpleSSD's NVMe disk as the OS disk. But in the fio script you are overwriting the contents of that OS disk! This leads to unexpected behavior: the file system is now corrupted.

Thanks.

galmusaddar commented 4 years ago

Thank you for your reply. The main purpose of the experiment is to monitor the performance of writes to the NVMe disk, so I have to set the filename to /dev/nvme0n1. How can I solve the issue? Also, why does this happen with sequential writes and not with random writes?

I also wanted to mention that I ran an experiment with root-device=/dev/sda1 (SimpleSSD command) and filename=/dev/sda (fio) with the same type of drive, and the issue did not occur. Why does this happen with /dev/nvme0n1 and not with /dev/sda?

Thanks...

kukdh1 commented 4 years ago

Hi,

I don't understand why you are trying to wipe out the OS disk. What you are doing is installing Linux on /dev/nvme0n1p1 and then writing random binary data to /dev/nvme0n1 from the beginning.

If you want to measure the performance of the NVMe disk, boot gem5 from /dev/sda1 (the default IDE disk) and run the fio test on /dev/nvme0n1. If you want to measure the performance of the filesystem plus the NVMe disk, boot gem5 from /dev/sda1, mount /dev/nvme0n1p1, and run the fio test with --filename=<temporary file to I/O> pointing at a temporary file stored on that mount. Separating the OS disk from the target disk removes interference between them.
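As a rough illustration of keeping the OS disk and the target disk separate, the sketch below refuses to build the fio command when the target is the disk that holds the root filesystem. The helper names targets_root_disk and safe_fio are hypothetical and not part of SimpleSSD or fio. The check is a plain name-prefix comparison, so it is deliberately conservative. The fio command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the fio target is the root device
# itself or the whole disk that the root partition lives on.
# $1 = fio target (e.g. /dev/nvme0n1), $2 = root device (e.g. /dev/nvme0n1p1)
targets_root_disk() {
    case "$2" in
        "$1"*) return 0 ;;   # root device name starts with the target name
        *)     return 1 ;;
    esac
}

# Print the fio command from the report for a given target, or refuse.
# The command is only echoed; remove the leading "echo" to really run it.
safe_fio() {
    if targets_root_disk "$1" "$2"; then
        echo "refusing: $1 holds the root filesystem ($2)" >&2
        return 1
    fi
    echo ./fio --direct=1 --ioengine=libaio --iodepth=1 --bs=4096 --rw=write \
        --filename="$1" --numjobs=1 --name=test --time_based --runtime=10s
}
```

With the setup from the report, safe_fio /dev/nvme0n1 /dev/nvme0n1p1 would refuse, while safe_fio /dev/nvme0n1 /dev/sda1 prints the command. Inside the guest, the second argument could be taken from the root= entry in /proc/cmdline, assuming the kernel command line carries one.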

Thanks.

P.S. I don't know why fio dies with a segfault, but there are some candidates:

P.S. 2 With random writes, the probability of overwriting critical filesystem/partition metadata is lower, because that metadata lies at the very beginning of the disk.

P.S. 3 If you still want to overwrite the OS disk, please start overwriting from the middle of the disk using the --offset option. If you really want to know why the segfault occurs, use addr2line or objdump to find the source line (or assembly) that generates the SIGSEGV.
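To put a rough number on this, the sketch below estimates the chance that at least one of n random writes lands in a metadata region at the start of the disk. It treats every write as an independent, uniformly placed block, which is an assumption (fio's default random map avoids repeating blocks, so this is only an approximation). The function name hit_probability and all sizes in the usage note are illustrative, not measured values. A sequential write starting at offset 0 needs no estimate: its very first block lands in that region.

```shell
#!/bin/sh
# Hypothetical estimate: P(at least one of n uniform random writes hits the
# first `region` bytes of a `disk`-byte device) = 1 - (1 - region/disk)^n.
# $1 = metadata region size in bytes, $2 = disk size in bytes, $3 = number of writes
hit_probability() {
    awk -v region="$1" -v disk="$2" -v n="$3" 'BEGIN {
        p = region / disk                          # chance that a single write hits the region
        printf "%.6f\n", 1 - exp(n * log(1 - p))   # complement of "all n writes miss"
    }'
}
```

For example, hit_probability 1048576 400000000000 10000 evaluates the formula for an assumed 1 MiB metadata region, an assumed 400 GB device, and 10000 writes. The real capacity and write count depend on the simulated configuration.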
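A minimal sketch of the --offset suggestion follows. It computes a block-aligned offset at the middle of the device and prints the resulting fio command. The helper names mid_offset and offset_fio are hypothetical. The device size is passed in as an argument; inside the guest it could come from blockdev --getsize64 /dev/nvme0n1, assuming that tool is present in the image.

```shell
#!/bin/sh
# Hypothetical helper: byte offset at the middle of a device, rounded down to
# a multiple of the block size. $1 = device size in bytes, $2 = block size.
mid_offset() {
    echo $(( $1 / 2 / $2 * $2 ))
}

# Print (not run) the fio command from the report with --offset added.
# $1 = device size in bytes, $2 = block size in bytes
offset_fio() {
    echo ./fio --direct=1 --ioengine=libaio --iodepth=1 --bs="$2" --rw=write \
        --filename=/dev/nvme0n1 --offset="$(mid_offset "$1" "$2")" \
        --numjobs=1 --name=test --time_based --runtime=10s
}

# To locate the faulting source line, something like the following may help.
# ADDR is a placeholder for the instruction pointer reported with the segfault;
# for a position-independent binary the load base must be subtracted first.
#   addr2line -f -C -e ./fio ADDR
```

Starting in the middle keeps the writes away from the metadata at the start of the disk, but any file data stored beyond the offset can still be overwritten, so booting from a separate disk remains the safer setup.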