cxl-micron-reskit / famfs

This is the user space repo for famfs, the fabric-attached memory file system
Apache License 2.0

mkfs.famfs fails with a "bus error" when creating a famfs file system on a CXL device with 256GB of memory capacity (sudo famfs/debug/mkfs.famfs /dev/dax0.0) #62

Closed (lordiscat closed this issue 2 months ago)

lordiscat commented 3 months ago

What is the maximum capacity of memory that famfs can support to generate a famfs file system?

jagalactic commented 3 months ago

I'm not aware of a size limit - the intent is to support very large memory - but I'm not sure we've tested that large because most of our systems/devices are smaller. I will run a test this week to see if I see this issue.

Meanwhile, do make sure your dax device is in "devdax" mode.
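One way to check the mode is to read the JSON that `daxctl list` prints and look at the `mode` field of each device. The following is a minimal sketch, not part of famfs: the helper names are hypothetical, the sample JSON is made up for illustration (it is not output from the reporter's system), and it assumes the `daxctl` tool from the ndctl package is installed.

```python
import json
import subprocess


def dax_modes(daxctl_json: str) -> dict:
    """Map each DAX character device name to the mode daxctl reports for it."""
    if not daxctl_json.strip():
        return {}  # daxctl printed nothing: no devices found
    parsed = json.loads(daxctl_json)
    # Accept either a single JSON object or a list of objects.
    devices = parsed if isinstance(parsed, list) else [parsed]
    return {d["chardev"]: d.get("mode", "unknown") for d in devices}


def is_devdax(daxctl_json: str, chardev: str) -> bool:
    """True only if the named device exists and is in devdax mode."""
    return dax_modes(daxctl_json).get(chardev) == "devdax"


def query_daxctl() -> str:
    """Run `daxctl list` and return its JSON text (requires daxctl)."""
    result = subprocess.run(
        ["daxctl", "list"], check=True, capture_output=True, text=True
    )
    return result.stdout


# Made-up sample for illustration only; real output has more fields.
SAMPLE = (
    '[{"chardev": "dax0.0", "size": 1073741824, "mode": "system-ram"},'
    ' {"chardev": "dax1.0", "size": 1073741824, "mode": "devdax"}]'
)

for name, mode in dax_modes(SAMPLE).items():
    print(f"/dev/{name}: {mode}")
```

On a real system, `is_devdax(query_daxctl(), "dax0.0")` would perform the check for the device in the issue title. If the device turns out to be in system-ram mode, `daxctl reconfigure-device --mode=devdax dax0.0` is the usual way to switch it back, assuming the memory is not in use.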

Thanks

jagalactic commented 3 months ago

I ran famfs/run_stress_tests.sh successfully on a 256GiB DAX device and on a 512GiB DAX device. The script creates a file system, fills it almost full, and runs fio against the files, exercising all of the memory. Both runs worked, so I don't see a famfs bug with larger devices. This was using regular DRAM as a DAX device to host famfs.

BTW regarding #60 this test was annoying because it took a while to create all the test files.

Anyway, if yours is still failing please provide the full sequence of commands you ran, plus any output from the kernel log.

fio --name=8-290-MB-files-per-thread --nrfiles=8 --bs=2M --group_reporting=1 \
    --alloc-size=1048576 --filesize=304087040 --readwrite=write --fallocate=none \
    --numjobs=111 --create_on_open=0 --directory=/mnt/famfs/test_16981 --time_based --runtime=60

8-290-MB-files-per-thread: (g=0): rw=write, bs=(R) 2048KiB-2048KiB, (W) 2048KiB-2048KiB, (T) 2048KiB-2048KiB, ioengine=psync, iodepth=1
...
fio-3.33
Starting 111 processes

8-290-MB-files-per-thread: (groupid=0, jobs=111): err= 0: pid=19836: Tue Jun  4 14:57:19 2024
  write: IOPS=65.9k, BW=129GiB/s (138GB/s)(7728GiB/60002msec); 0 zone resets
    clat (usec): min=96, max=5926, avg=1421.28, stdev=376.17
     lat (usec): min=112, max=50733, avg=1675.68, stdev=483.57
    clat percentiles (usec):
     |  1.00th=[  371],  5.00th=[  807], 10.00th=[  914], 20.00th=[ 1156],
     | 30.00th=[ 1369], 40.00th=[ 1418], 50.00th=[ 1450], 60.00th=[ 1483],
     | 70.00th=[ 1500], 80.00th=[ 1598], 90.00th=[ 1975], 95.00th=[ 2057],
     | 99.00th=[ 2147], 99.50th=[ 2180], 99.90th=[ 2311], 99.95th=[ 2573],
     | 99.99th=[ 2671]
   bw (  MiB/s): min=109934, max=162870, per=100.00%, avg=131991.59, stdev=289.33, samples=13209
   iops        : min=54942, max=81394, avg=65960.14, stdev=144.66, samples=13209
  lat (usec)   : 100=0.01%, 250=0.01%, 500=3.34%, 750=0.60%, 1000=10.94%
  lat (msec)   : 2=76.39%, 4=8.72%, 10=0.01%
  cpu          : usr=15.86%, sys=83.45%, ctx=9016, majf=0, minf=34088
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,3956517,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=129GiB/s (138GB/s), 129GiB/s-129GiB/s (138GB/s-138GB/s), io=7728GiB (8297GB), run=60002-60002msec
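
As a sanity check on the fio summary, the reported totals can be recomputed from the issued write count and the 2 MiB block size. This is a small sketch using only numbers from the output above, plus the 270029291520-byte total file size that the test script reports; the variable names are illustrative.

```python
GIB = 1024 ** 3
BLOCK_BYTES = 2 * 1024 * 1024      # --bs=2M

issued_writes = 3956517            # "issued rwts: total=0,3956517,0,0"
runtime_s = 60.002                 # "run=60002-60002msec"
all_files_bytes = 270029291520     # "Size of all files" from the test script

bytes_written = issued_writes * BLOCK_BYTES
gib_written = bytes_written / GIB
gib_per_s = gib_written / runtime_s
# The run is time based, so fio keeps rewriting the same files.
passes_over_files = bytes_written / all_files_bytes

print(f"written   : {gib_written:.0f} GiB")
print(f"bandwidth : {gib_per_s:.0f} GiB/s")
print(f"passes    : {passes_over_files:.1f} over the full file set")
```

The recomputed volume and bandwidth round to the 7728GiB and 129GiB/s that fio reports, and the write volume corresponds to roughly 30 full passes over the files, which supports the claim that the run touched all of the memory backing them.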

Full device size      : 270580842496
Usable device size    : 270547288064
User give size        : 270547288064
Number of jobs        : 111
Files per job         : 8
Total files           : 888
File size (Bytes)     : 304670369
2 MiB aligned fsize   : 304087040
File size (MiB)       : 290 MB
Runtime               : 60
Size of all files     : 270029291520
Files location        : /mnt/famfs/test_16981

Thanks, John
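
The sizing summary above is consistent with a simple derivation: divide the usable size evenly across all files, then round each file size down to a 2 MiB boundary. The sketch below reproduces the printed numbers under that assumption; it is not taken from run_stress_tests.sh, and the variable names are illustrative.

```python
MIB = 1024 * 1024
ALIGN = 2 * MIB                    # 2 MiB alignment, as in the summary

usable_bytes = 270547288064        # "Usable device size"
num_jobs = 111                     # "Number of jobs"
files_per_job = 8                  # "Files per job"

total_files = num_jobs * files_per_job
raw_file_size = usable_bytes // total_files
# Round down to the nearest multiple of 2 MiB.
aligned_file_size = (raw_file_size // ALIGN) * ALIGN
total_bytes = aligned_file_size * total_files
fill_fraction = total_bytes / usable_bytes

print(f"Total files        : {total_files}")
print(f"File size (bytes)  : {raw_file_size}")
print(f"2 MiB aligned size : {aligned_file_size}")
print(f"File size (MiB)    : {aligned_file_size // MIB}")
print(f"Size of all files  : {total_bytes}")
print(f"Fill fraction      : {fill_fraction:.4f}")
```

Each value matches the corresponding line of the summary, and the files cover about 99.8% of the usable device size, which is what "fills it almost full" refers to.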

jagalactic commented 3 months ago

Suggestion: on the Dax device where mkfs.famfs fails, test the memory (if you haven't already). You can use Stream or Multichase - under the same GitHub organization as this project there are versions of both benchmarks that can test a Dax device directly.

https://github.com/cxl-micron-reskit/STREAM

https://github.com/cxl-micron-reskit/multichase

jagalactic commented 2 months ago

We have been unable to reproduce this issue - and we have tested with 256GiB dax devices.

If you still experience this problem, please re-open and provide detailed information about how to reproduce the issue.