IO500 / io500

IO500 Storage Benchmark source code
MIT License

Aborted (core dumped) io500: aiori-POSIX.c:769: POSIX_Xfer: Assertion `rc >= 0' failed. #63

Closed: xin3liang closed this issue 8 months ago

xin3liang commented 8 months ago

The main branch aborts (core dumped) on aarch64 (arm64), OS: openEuler 22.03 LTS SP3. I built and ran io500 with config-minimal.ini (no changes):

[jenkins@lustre-tzifycl3-01 io500]$ ./prepare.sh && make
[jenkins@lustre-tzifycl3-01 io500]$ ./io500 config-minimal.ini                                                                                                                                 
IO500 version io500-sc23_v1 (standard)                                                                                                                                                         
[RESULT]       ior-easy-write        0.113594 GiB/s : time 310.403 seconds                                                                                                                     
ERROR INVALID (src/main.c:437) Runtime of phase (101.183177) is below stonewall time. This shouldn't happen!                                                              
ERROR INVALID (src/main.c:443) Runtime is smaller than expected minimum runtime                                                                                                                
[RESULT]    mdtest-easy-write       10.052514 kIOPS : time 101.183 seconds [INVALID]                                                                                                           
[      ]            timestamp        0.000000 kIOPS : time 0.000 seconds                                                                                                                       
io500: aiori-POSIX.c:769: POSIX_Xfer: Assertion `rc >= 0' failed.                                                                                                                              
[lustre-tzifycl3-01:21666] *** Process received signal ***                                                                                                                                     
[lustre-tzifycl3-01:21666] Signal: Aborted (6)                                                                                                                                                 
[lustre-tzifycl3-01:21666] Signal code:  (-6)                                                                                                                                                  
[lustre-tzifycl3-01:21666] [ 0] linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffb936693c]                                                                                                     
[lustre-tzifycl3-01:21666] [ 1] /usr/lib64/libc.so.6(+0x83dc0)[0xffffb902bdc0]                                                                                                                 
[lustre-tzifycl3-01:21666] [ 2] /usr/lib64/libc.so.6(raise+0x1c)[0xffffb8fe4f7c]                                                                                                               
[lustre-tzifycl3-01:21666] [ 3] /usr/lib64/libc.so.6(abort+0xe4)[0xffffb8fd2d30]                                                                                                               
[lustre-tzifycl3-01:21666] [ 4] /usr/lib64/libc.so.6(+0x368a8)[0xffffb8fde8a8]
[lustre-tzifycl3-01:21666] [ 5] /usr/lib64/libc.so.6(+0x3690c)[0xffffb8fde90c]                                                                                                                 
[lustre-tzifycl3-01:21666] [ 6] ./io500[0x441eac]                                                                                                                                              
[lustre-tzifycl3-01:21666] [ 7] ./io500[0x427f44]                                                                                                                                              
[lustre-tzifycl3-01:21666] [ 8] ./io500[0x429fe4]                                                                                                                                              
[lustre-tzifycl3-01:21666] [ 9] ./io500[0x42b1a8]                                                                                                                                              
[lustre-tzifycl3-01:21666] [10] ./io500[0x42bd74]                                                                                                                                              
[lustre-tzifycl3-01:21666] [11] ./io500[0x40ad04]                                                                                                                                              
[lustre-tzifycl3-01:21666] [12] ./io500[0x40b9e0]                                                                                                                                              
[lustre-tzifycl3-01:21666] [13] ./io500[0x405f28]                                                                                                                                              
[lustre-tzifycl3-01:21666] [14] /usr/lib64/libc.so.6(+0x2afc0)[0xffffb8fd2fc0]
[lustre-tzifycl3-01:21666] [15] /usr/lib64/libc.so.6(__libc_start_main+0x94)[0xffffb8fd3098]                                                                                                   
[lustre-tzifycl3-01:21666] [16] ./io500[0x403d30]                                                                                                                                              
[lustre-tzifycl3-01:21666] *** End of error message ***                                                                                                                                        
Aborted (core dumped)                                 
gflofst commented 8 months ago

That source file is not in the repo. Is this from a current branch?

xin3liang commented 8 months ago

> That source file is not in the repo. Is this from a current branch?

It is the main branch. I also tried tag io500-sc23_v1 and hit the same issue.

JulianKunkel commented 8 months ago

Hi, that error happened in ior-hard-write. Can you check the file ior-hard-write.txt in the results directory? An error message should be printed there.
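A minimal sketch of that check, assuming each phase writes a `<phase>.txt` file into the timestamped results directory (as `ior-hard-write.txt` and `mdtest-hard-write.txt` do in this thread). The function name `find_messages` and the marker strings are illustrative, not part of io500:

```python
from pathlib import Path


def find_messages(results_dir, markers=("WARNING", "ERROR")):
    """Return (file name, line number, text) for every line containing a marker.

    results_dir is one timestamped run directory, e.g. ./results/<timestamp>.
    The marker strings are an assumption based on the log lines in this thread.
    """
    hits = []
    for path in sorted(Path(results_dir).glob("*.txt")):
        # errors="replace" keeps the scan going if a log has odd bytes in it
        with path.open(errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if any(marker in line for marker in markers):
                    hits.append((path.name, lineno, line.rstrip("\n")))
    return hits
```

Calling `find_messages("./results/<timestamp>")` and printing the tuples gives a quick overview of which phase logged a failure before opening the individual files.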

xin3liang commented 8 months ago

> Hi, that error happened in ior-hard-write, can you check the file ior-hard-write.txt in the results directory? There will be an error message printed.

Hi @JulianKunkel, thanks for your advice, it really helped me a lot :-). I reproduced another core dump. From result.txt I figured out that it stopped at mdtest-hard-write, and I found the error "No space left on device" in mdtest-hard-write.txt:

[jenkins@lustre-tzifycl3-01 2024.01.31-01.59.52]$ tail -n 4 result.txt 

[mdtest-hard-write]
t_start         = 2024-01-31 02:12:12
exe             = ./mdtest --dataPacketType=timestamp -n 1000000 -t -w 3901 -e 3901 -P -G=508879435 -N 1 -F -d ./datafiles/2024.01.31-01.59.52/mdtest-hard -x ./results/2024.01.31-01.59.52/mdtest-hard.stonewall -C -Y -W 300 --saveRankPerformanceDetails=./results/2024.01.31-01.59.52/mdtest-hard-write.csv -a POSIX

[jenkins@lustre-tzifycl3-01 2024.01.31-01.59.52]$ tail mdtest-hard-write.txt -n 5
V-0: Rank   0 Line  2537 Shifting ranks by 1 for each phase.
1 tasks, 1000000 files
WARNING: write(20, 0x25385000, 3901) failed No space left on device
WARNING: task 0, partial write(), -1 of 3901 bytes at offset 0

After switching to a 100G disk and re-running io500, it now runs to completion. Thanks again.

[jenkins@lustre-tzifycl3-01 io500]$ ./io500 config-minimal.ini
IO500 version io500-sc23_v1 (standard)
[RESULT]       ior-easy-write        0.109713 GiB/s : time 310.753 seconds
ERROR INVALID (src/main.c:437) Runtime of phase (38.869370) is below stonewall time. This shouldn't happen!
ERROR INVALID (src/main.c:443) Runtime is smaller than expected minimum runtime
[RESULT]    mdtest-easy-write       26.472721 kIOPS : time 38.869 seconds [INVALID]
[      ]            timestamp        0.000000 kIOPS : time 0.000 seconds
[RESULT]       ior-hard-write        0.109988 GiB/s : time 310.931 seconds
ERROR INVALID (src/main.c:437) Runtime of phase (85.696574) is below stonewall time. This shouldn't happen!
ERROR INVALID (src/main.c:443) Runtime is smaller than expected minimum runtime
[RESULT]    mdtest-hard-write       11.814771 kIOPS : time 85.697 seconds [INVALID]
[RESULT]                 find      657.568252 kIOPS : time 3.046 seconds
[RESULT]        ior-easy-read        0.059481 GiB/s : time 573.174 seconds
[RESULT]     mdtest-easy-stat       75.835681 kIOPS : time 14.255 seconds
[RESULT]        ior-hard-read        0.056239 GiB/s : time 608.066 seconds
[RESULT]     mdtest-hard-stat       69.734729 kIOPS : time 15.415 seconds
[RESULT]   mdtest-easy-delete       36.384563 kIOPS : time 28.516 seconds
[RESULT]     mdtest-hard-read        0.795102 kIOPS : time 1258.745 seconds
[RESULT]   mdtest-hard-delete       26.937286 kIOPS : time 38.406 seconds
[SCORE ] Bandwidth 0.079709 GiB/s : IOPS 30.975811 kiops : TOTAL 1.571319 [INVALID]

The result files are stored in the directory: ./results/2024.01.30-05.00.52
[jenkins@lustre-tzifycl3-01 io500]$ git diff -- config-minimal.ini
diff --git a/config-minimal.ini b/config-minimal.ini
index 8ce4e57..506a098 100644
--- a/config-minimal.ini
+++ b/config-minimal.ini
@@ -1,2 +1,2 @@
 [global]
-datadir = ./datafiles
+datadir = /home/jenkins/tmp-mount/io500/datafiles
[jenkins@lustre-tzifycl3-01 io500]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        4.0M     0  4.0M   0% /dev
tmpfs           3.8G     0  3.8G   0% /dev/shm
tmpfs           1.5G  8.8M  1.5G   1% /run
tmpfs           4.0M     0  4.0M   0% /sys/fs/cgroup
/dev/vda3        75G  3.3G   69G   5% /
tmpfs           3.8G     0  3.8G   0% /tmp
/dev/vda1       549M  6.5M  543M   2% /boot/efi
/dev/vdb         98G   32K   93G   1% /home/jenkins/tmp-mount
[jenkins@lustre-tzifycl3-01 io500]$
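
For anyone hitting the same assertion, here is a rough sketch of a pre-flight check, assuming the data written by an ior phase can be approximated as reported bandwidth times reported runtime. That is only an estimate: it ignores the mdtest files and any filesystem overhead. The helper names `estimate_written_gib` and `has_enough_space` are hypothetical, and the required size is something you have to pick for your own system:

```python
import shutil

GIB = 1024 ** 3


def estimate_written_gib(bandwidth_gib_s, runtime_s):
    """Approximate data volume of one ior write phase from its [RESULT] line."""
    return bandwidth_gib_s * runtime_s


def has_enough_space(datadir, required_gib):
    """True if the filesystem holding datadir has at least required_gib free.

    datadir must already exist; shutil.disk_usage raises otherwise.
    """
    free_gib = shutil.disk_usage(datadir).free / GIB
    return free_gib >= required_gib


# Numbers taken from the successful run above (ior-easy-write, ior-hard-write).
needed = (estimate_written_gib(0.109713, 310.753)
          + estimate_written_gib(0.109988, 310.931))
print(f"estimated ior write volume: {needed:.1f} GiB")
print("enough space in current directory:", has_enough_space(".", needed))
```

With the figures from the successful run, the two ior write phases alone come to roughly 68 GiB, which fits on the volume with 93G available shown in the `df -h` output.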