cxl-micron-reskit / famfs

This is the user space repo for famfs, the fabric-attached memory file system
Apache License 2.0

Kernel crashed during memory sharing stress test [write to /dev/dax0.0 and read from /dev/dax1.0] #54

Closed guoanwu closed 3 months ago

guoanwu commented 3 months ago

[ 65.491594] ------------[ cut here ]------------
[ 65.491607] WARNING: CPU: 0 PID: 5246 at fs/dax.c:372 dax_insert_entry+0x2d5/0x2f0
[ 65.491626] Modules linked in: famfs rfkill sunrpc vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_ifs i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_realtek snd_hda_codec_generic kvm snd_hda_scodec_component snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm rapl iTCO_wdt intel_cstate intel_pmc_bxt pmt_telemetry snd_timer iTCO_vendor_support device_dax pmt_class intel_sdsi mei_me snd idxd i2c_i801 isst_if_mbox_pci isst_if_mmio intel_uncore pcspkr dax_hmem isst_if_common intel_vsec soundcore idxd_bus i2c_smbus i2c_ismt mei ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter joydev pfr_telemetry pfr_update xfs ax88796b crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic asix ghash_clmulni_intel sha512_ssse3 phylink qat_4xxx sha256_ssse3 intel_qat sha1_ssse3 ast usbnet mii crc8 i2c_algo_bit wmi
[ 65.491666] pinctrl_emmitsburg fuse
[ 65.491760] CPU: 0 PID: 5246 Comm: famfs Not tainted 6.9.0-rc5+ #4
[ 65.491769] Hardware name: Intel Corporation EAGLESTREAM/EAGLESTREAM, BIOS EGSDCRB1.86B.0109.D06.2401190347 01/19/2024
[ 65.491780] RIP: 0010:dax_insert_entry+0x2d5/0x2f0
[ 65.491790] Code: c7 45 bc 01 00 00 00 4c 8b 4d d0 e9 e1 fd ff ff 48 8b 58 20 4c 8d 43 01 e9 10 ff ff ff 48 8b 58 20 4c 8d 43 01 e9 fb fe ff ff <0f> 0b e9 1c ff ff ff 0f 0b e9 4c ff ff ff 66 66 2e 0f 1f 84 00 00
[ 65.491809] RSP: 0000:ffa000001f49bb18 EFLAGS: 00010086
[ 65.491816] RAX: ffd4000102008000 RBX: 0000000000000002 RCX: 0000000000000000
[ 65.491824] RDX: ff110060e1ad04b8 RSI: ffd4000102010000 RDI: 0000000000000000
[ 65.491832] RBP: ffa000001f49bb70 R08: 0000000000000000 R09: ffffffffffe00000
[ 65.491839] R10: ff110001ba7535f8 R11: 0000000000000001 R12: ffa000001f49bc10
[ 65.491847] R13: 0000000000000015 R14: 0000000000000000 R15: ff110001ba7535f8
[ 65.491855] FS: 00007f326e239780(0000) GS:ff11003f7c000000(0000) knlGS:0000000000000000
[ 65.491864] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 65.491871] CR2: 00007f326cc00008 CR3: 00000001a0d5a005 CR4: 0000000000771ef0
[ 65.491879] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 65.491887] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 65.491895] PKRU: 55555554
[ 65.491899] Call Trace:
[ 65.491903]
[ 65.491908] ? show_regs+0x69/0x80
[ 65.491916] ? dax_insert_entry+0x2d5/0x2f0
[ 65.491923] ? warn+0x8c/0x150
[ 65.491930] ? dax_insert_entry+0x2d5/0x2f0
[ 65.491937] ? report_bug+0x1c5/0x1d0
[ 65.491947] ? handle_bug+0x46/0x80
[ 65.491955] ? exc_invalid_op+0x19/0x70
[ 65.491961] ? asm_exc_invalid_op+0x1b/0x20
[ 65.491970] ? dax_insert_entry+0x2d5/0x2f0
[ 65.491977] ? dax_insert_entry+0x137/0x2f0
[ 65.492212] dax_fault_iter+0x28c/0x690
[ 65.492443] dax_iomap_pmd_fault+0x230/0x470
[ 65.492672] dax_iomap_fault+0x2d/0x50
[ 65.492899] famfs_filemap_fault+0x72/0x1f0 [famfs]
[ 65.493124] famfs_filemap_huge_fault+0x24/0x30 [famfs]
[ 65.493346] __handle_mm_fault+0x821/0xd00
[ 65.493566] handle_mm_fault+0x10c/0x350
[ 65.493781] do_user_addr_fault+0x236/0x6c0
[ 65.493994] exc_page_fault+0x78/0x180
[ 65.494200] asm_exc_page_fault+0x27/0x30
[ 65.494401] RIP: 0033:0x40906a
[ 65.494595] Code: be 50 39 41 00 48 89 c7 b8 00 00 00 00 e8 2e 92 ff ff 8b 45 dc 89 c7 e8 b4 91 ff ff b8 ff ff ff ff e9 8f 01 00 00 48 8b 45 f8 <48> 8b 50 08 48 8b 45 f8 48 89 d6 48 89 c7 e8 27 c9 ff ff e9 10 01
[ 65.494988] RSP: 002b:00007fff669d3010 EFLAGS: 00010213
[ 65.495181] RAX: 00007f326cc00000 RBX: 0000000000000000 RCX: 00007f326e1072e6
[ 65.495370] RDX: 0000000000000001 RSI: 0000000000800000 RDI: 0000000000000000
[ 65.495554] RBP: 00007fff669d4070 R08: 0000000000000003 R09: 0000000000000000
[ 65.495732] R10: 0000000000000002 R11: 0000000000000246 R12: 00007fff669d42c8
[ 65.495910] R13: 0000000000405531 R14: 0000000000417de8 R15: 00007f326e2a7000
[ 65.496087]
[ 65.496258] ---[ end trace 0000000000000000 ]---
[ 70.753125] famfs_get_tree: initializing new superblock for /dev/dax1.0
[ 373.817508] share_file_demo[5798]: segfault at 7fb745000000 ip 00007fb731b90995 sp 00007ffc031eb288 error 6 in libc.so.6[7fb731a28000+175000] likely on CPU 20 (core 20, socket 0)
[ 373.817764] Code: f7 05 23 0c 07 00 01 00 00 00 74 a9 83 f9 c0 0f 87 46 fe ff ff 48 29 fe 48 83 c7 3f 49 8d 0c 10 48 83 e7 c0 48 01 fe 48 29 f9 a4 62 c1 fe 48 7f 00 c3 66 90 4c 8b 1d 01 0c 07 00 4c 39 da 0f
[ 411.038112] share_file_demo[5841]: segfault at 7f7597000000 ip 00007f7525390995 sp 00007ffc81acb568 error 6 in libc.so.6[7f7525228000+175000] likely on CPU 0 (core 0, socket 0)
[ 411.038392] Code: f7 05 23 0c 07 00 01 00 00 00 74 a9 83 f9 c0 0f 87 46 fe ff ff 48 29 fe 48 83 c7 3f 49 8d 0c 10 48 83 e7 c0 48 01 fe 48 29 f9 a4 62 c1 fe 48 7f 00 c3 66 90 4c 8b 1d 01 0c 07 00 4c 39 da 0f
[ 927.662313] share_file_demo[6564]: segfault at 7fc20f600000 ip 00007fc208190995 sp 00007ffd7869a788 error 6 in libc.so.6[7fc208028000+175000] likely on CPU 0 (core 0, socket 0)
[ 927.662616] Code: f7 05 23 0c 07 00 01 00 00 00 74 a9 83 f9 c0 0f 87 46 fe ff ff 48 29 fe 48 83 c7 3f 49 8d 0c 10 48 83 e7 c0 48 01 fe 48 29 f9 a4 62 c1 fe 48 7f 00 c3 66 90 4c 8b 1d 01 0c 07 00 4c 39 da 0f

arramesh42 commented 3 months ago

Can you please provide some more context on what led to this? What stress test were you running? Are dax0.0 and dax1.0 on different systems (hosts), each with famfs formatted and mounted?

Which famfs-linux branch were you running (in the kernel)? Which branch of famfs was used to mkfs the file system?

guoanwu commented 3 months ago

Sure. dax0.0 and dax1.0 are in the same system, with Astera A1000 cards in the following configuration: [image]

famfs-linux: v2 branch, kernel version 6.9-rc5.

famfs, with a compilation fix:

commit 8e11562df3a0aff0cfd844e312769a2b2c92a845 (HEAD -> master, origin/master, origin/HEAD)
Author: John Groves <john@groves.net>
Date: Wed May 1 17:23:19 2024 -0500

Update README.md for v2 patch set

We ran the following test: [image]

jagalactic commented 3 months ago

Thank you for the report, and the details @guoanwu.

That is actually not a crash - just a warning (and it's a known issue that we should probably do a better job of warning people about).

Here is what is going on:

  1. In order to mount famfs, the user space code must check the validity of the famfs superblock and log. This is done via mmap() of the raw dax device, because raw dax does not support read/write. This is all normal and necessary.
  2. Once verified, the file system is mounted and "meta files" (.meta/superblock and .meta/log) are created for accessing the superblock and log.
  3. Then the log is played, during which the superblock and log meta files get memory mapped, and any additional files get created.
  4. The DAX driver notices that a page gets accessed via a file [presumably the superblock, since it's accessed first] that was recently accessed via mmap from something else - and calls WARN_ONCE from dax_insert_entry().
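Step 1 above can be illustrated with a minimal sketch, written in Python for brevity. It maps the start of a device read-only and checks a magic number, the way a mount tool might validate a superblock before mounting. The magic value, the field layout, and the function names are placeholders invented for this example and are not the real famfs on-media format; a temp file stands in for the dax device, since a real devdax mapping would also need a length aligned to the device's alignment.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical values for illustration only; NOT the real famfs format.
SB_MAGIC = 0x46414D46   # placeholder magic number
SB_FORMAT = "<QI"       # placeholder layout: u64 magic, u32 version


def superblock_looks_valid(path: str) -> bool:
    """Map the first page of `path` read-only and check the placeholder magic."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Raw dax has no read()/write(), so the check goes through mmap().
        with mmap.mmap(fd, mmap.PAGESIZE, access=mmap.ACCESS_READ) as m:
            magic, _version = struct.unpack_from(SB_FORMAT, m, 0)
            return magic == SB_MAGIC
    finally:
        os.close(fd)


def make_fake_device(magic: int) -> str:
    """Create a one-page temp file standing in for a dax device."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(struct.pack(SB_FORMAT, magic, 1).ljust(mmap.PAGESIZE, b"\0"))
        return f.name
```

The point of the sketch is only that the validation touches the device's pages through a raw mapping before any famfs file exists, which is what sets up the situation described in step 4.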

This is actually a DAX bug, but I haven't figured out the right way to fix it yet (though I have discussed it in detail with Dan Williams).

Background: most mmap() calls create a struct address_space (which is an xarray, formerly known as a radix tree) to track sparsely populated pages for that virtual address range. That sparseness is needed by page-cached files, but not by dax or famfs. When a vma fault happens, the file offset is looked up in the xarray to see if the page is resident already (which it might not be for a normal file, but it is always resident for a dax device or a famfs file). When a page is in an address_space array, the page->host field points to the xarray.

The issue is that devdax does not use the xarray for finding pages - it does linear mapping across its collection of "ranges" (usually 1 range, but it can be more than one in some cases). So it has an address_space xarray but doesn't really use it. BUT, pages accessed via mmap() from devdax do get their page->host field set to point to the xarray, but they do not actually get inserted into the xarray (which is needed for page->host cleanup on xarray teardown).
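The "linear mapping across ranges" idea can be sketched as follows. This is a toy model in Python, not kernel code: `DaxRange` and `offset_to_phys` are names made up for this example, and the addresses in the usage note are arbitrary.

```python
from typing import List, NamedTuple


class DaxRange(NamedTuple):
    phys_start: int  # physical address where the range begins
    length: int      # size of the range in bytes


def offset_to_phys(ranges: List[DaxRange], offset: int) -> int:
    """Translate a device offset to a physical address by walking ranges in order."""
    if offset < 0:
        raise ValueError("negative offset")
    base = 0  # device offset at which the current range starts
    for r in ranges:
        if offset < base + r.length:
            return r.phys_start + (offset - base)
        base += r.length
    raise ValueError("offset beyond end of device")
```

With two ranges, `[DaxRange(0x1000, 0x100), DaxRange(0x8000, 0x200)]`, offset `0x10` falls in the first range and offset `0x100` is the first byte of the second. No per-page lookup structure is consulted, which is why the xarray can stay empty for devdax.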

When the dax device gets closed, the xarray is torn down correctly (AFAICT), but since the pages that think they belong to that array were never actually inserted into it, they don't get their page->host field cleared. And that means that when dax_insert_entry() gets called to put the page into the xarray for the famfs [meta] file, page->host is set and we get the warning.
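The sequence described above can be reproduced in a toy model. This is a Python simulation under simplifying assumptions, not kernel code: `Page.owner` stands in for the back-pointer this post calls page->host, `Mapping.entries` stands in for the xarray, and all class and function names are invented for the example.

```python
class Page:
    def __init__(self, pfn):
        self.pfn = pfn
        self.owner = None  # stand-in for the page's back-pointer


class Mapping:
    """Toy stand-in for an address_space and its xarray."""

    def __init__(self, name):
        self.name = name
        self.entries = {}  # index -> Page

    def teardown(self):
        # Only pages that were actually inserted get their back-pointer cleared.
        for page in self.entries.values():
            page.owner = None
        self.entries.clear()


warnings = []  # at most one message is recorded, mimicking a warn-once check


def devdax_fault(mapping, page):
    # Sets the back-pointer but never inserts the page into mapping.entries.
    page.owner = mapping


def insert_entry(mapping, index, page):
    # Complain (once) if the page still claims to belong to some mapping.
    if page.owner is not None and not warnings:
        warnings.append(f"page {page.pfn} still owned by {page.owner.name}")
    page.owner = mapping
    mapping.entries[index] = page
```

Running `devdax_fault(raw, page)`, then `raw.teardown()`, then `insert_entry(meta, 0, page)` records exactly one warning, because the teardown never saw the page. Later inserts do not add more, which matches the once-per-boot behavior.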

The net is that you can ignore this warning, which will happen once per reboot. This warning does not affect functionality. If you do see any impact in functionality, let us know - and we should investigate further. Most likely any impact on functionality is unrelated to this warning...
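To tell this kind of warning apart from a real crash in a saved kernel log, a small helper along these lines may be useful. It is a rough sketch in Python: the regular expression assumes the "WARNING: CPU: N PID: N at file:line function" shape seen in the report above, and the list of fatal markers is a heuristic chosen for this example, not an exhaustive one.

```python
import re

# Matches lines shaped like the WARNING line in the report above.
WARN_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+) (\S+)")

# Heuristic markers of more serious events; an assumption, not a complete list.
FATAL_MARKERS = ("Kernel panic", "Oops:", "BUG:")


def classify(log_text: str) -> dict:
    """Return the warning sites and any fatal markers found in a kernel log."""
    return {
        "warnings": WARN_RE.findall(log_text),
        "fatal": [m for m in FATAL_MARKERS if m in log_text],
    }
```

For the log in this issue, the helper would report a single warning site at fs/dax.c:372 and no fatal markers. The share_file_demo segfault lines are user space events and are not classified by this sketch.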

Thanks, John

jagalactic commented 3 months ago

Closing this issue, as it's actually a warning due to a dax bug - as documented above.