Phala-Network / phala-blockchain

The Phala Network Blockchain, pRuntime and the bridge.
https://phala.network
Apache License 2.0

pruntime enclave crashes under Linux 5.15.0 #853

Closed ngrewe closed 2 years ago

ngrewe commented 2 years ago

Are there any known incompatibilities between the pruntime and the Linux kernel 5.15.0? We are currently seeing occasional failures under Ubuntu 22.04 using this kernel version and the in-tree SGX driver.

$ uname -a
Linux ubuntu2204-4 5.15.0-39-generic #42-Ubuntu SMP Thu Jun 9 23:42:32 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ ls -r /dev/*sgx*
/dev/sgx_vepc  /dev/sgx_provision  /dev/sgx_enclave
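
(For context, the device node layout already indicates which driver is loaded; the mapping below is general SGX knowledge rather than something established in this thread.)

# The in-tree driver (kernel >= 5.11) exposes flat /dev/sgx_* nodes, while
# Intel's out-of-tree DCAP driver creates them under a /dev/sgx/ directory:
$ ls -l /dev/sgx_* /dev/sgx/* 2>/dev/null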

The issue presents itself as follows and seems to happen when dispatching blocks at varying heights, but always after "Taking checkpoint…":

[2022-06-26T11:18:47.798953Z INFO  rocket::server] POST /bin_api/dispatch_block application/octet-stream:

[2022-06-26T11:18:47.798977Z INFO  rocket::server] Matched: (dispatch_block) POST /bin_api/dispatch_block

[2022-06-26T11:18:47.803682Z INFO  phactory::prpc_service] dispatch_block from=Some(877811) to=Some(877811)

[2022-06-26T11:18:47.803730Z INFO  phactory::prpc_service] Dispatching block: 877811

[2022-06-26T11:18:47.806955Z INFO  phactory::prpc_service] State synced

[2022-06-26T11:18:47.807168Z INFO  phactory] Taking checkpoint...

[2022-06-26T11:18:48.138833Z ERROR app] [-] ECALL Enclave Failed SGX_ERROR_ENCLAVE_CRASHED!

[2022-06-26T11:18:48.138863Z INFO  rocket::server] Outcome: Success

[2022-06-26T11:18:48.138888Z INFO  rocket::server] Response succeeded.

(The panic output of the 16 bench worker threads, bench-0 through bench-15, is interleaved at this point in the log; de-interleaved, each thread reports the same panic:)

thread 'bench-<N>' panicked at 'Run benchmark <N> failed', src/main.rs:708:25

stack backtrace:

   0: rust_begin_unwind

             at /rustc/82af160c2cb9c349a0373cba98d8ad7f911f0d34/library/std/src/panicking.rs:498:5

   1: core::panicking::panic_fmt

             at /rustc/82af160c2cb9c349a0373cba98d8ad7f911f0d34/library/core/src/panicking.rs:106:14

note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

(The same two-frame backtrace and note are printed once per bench thread; the remaining identical copies are omitted here.)

[2022-06-26T11:18:48.543566Z INFO  rocket::server] GET /get_info:

[2022-06-26T11:18:48.543626Z INFO  rocket::server] Matched: (get_info) GET /get_info

[2022-06-26T11:18:48.547054Z ERROR app] [-] ECALL Enclave Failed SGX_ERROR_ENCLAVE_CRASHED!

[2022-06-26T11:18:48.547103Z INFO  rocket::server] Outcome: Success

[2022-06-26T11:18:48.547161Z INFO  rocket::server] Response succeeded.

The same pool is also serviced by machines running 20.04 (kernel 5.4.0 with Intel's out-of-tree DCAP driver), which do not experience this problem.

jasl commented 2 years ago

AESM (the SGX userspace service) doesn't support 22.04 yet; Intel is targeting Q3 for that. Can you retry on Ubuntu 20.04 with the HWE kernel?
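
(For reference, installing the 20.04 HWE kernel is normally done via the standard Ubuntu metapackage below; the package name comes from general Ubuntu documentation rather than this thread.)

$ sudo apt install --install-recommends linux-generic-hwe-20.04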

ngrewe commented 2 years ago

But AESM is running inside the pruntime container, isn't it? I wouldn't have expected any interference from the host OS there (barring any changes in kernel interfaces, of course). Anyways, I've upgraded one of the 20.04 machines to Linux 5.13.0 from the HWE, so it's not exactly identical, but at least it has the in-tree driver. I'll observe the system for a bit and report back whether there is any breakage.

ngrewe commented 2 years ago

Just for completeness: There are reports of a bug prior to 5.19 where enclaves could crash when under memory pressure, but for all intents and purposes the affected system is not starved for memory at all.

jasl commented 2 years ago

But AESM is running inside the pruntime container, isn't it? I wouldn't have expected any interference from the host OS there (barring any changes in kernel interfaces, of course). Anyways, I've upgraded one of the 20.04 machines to Linux 5.13.0 from the HWE, so it's not exactly identical, but at least it has the in-tree driver. I'll observe the system for a bit and report back whether there is any breakage.

PRuntime uses the AESM running inside the container. However, in some experiments I found that if AESM is not running on the host OS, SGX behaves differently (for example, if AESM hasn't started you won't see the /dev/sgx/enclave and /dev/sgx/provision devices), which affects SGX functionality inside the container. I haven't dug deeper yet.

I heard of someone trying to reuse the host AESM (by passing aesmd.sock through to the container), but I haven't followed up; I'll try it when I have time.
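
(A minimal sketch of what passing the host aesmd.sock through to the container could look like, assuming the host aesmd listens on the default /var/run/aesmd/aesm.socket and using a placeholder image name; this is untested and not an official deployment recipe.)

# Expose the SGX devices and bind-mount the host AESM socket directory
# so the enclave runtime inside the container can talk to the host aesmd.
$ docker run \
    --device /dev/sgx_enclave \
    --device /dev/sgx_provision \
    -v /var/run/aesmd:/var/run/aesmd \
    <pruntime-image>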

jasl commented 2 years ago

Just for completeness: There are reports of a bug prior to 5.19 where enclaves could crash when under memory pressure, but for all intents and purposes the affected system is not starved for memory at all.

We found that PRuntime becomes unstable when the EPC memory size is too small; not sure whether that relates to this report.

kvinwang commented 2 years ago

Just for completeness: There are reports of a bug prior to 5.19 where enclaves could crash when under memory pressure, but for all intents and purposes the affected system is not starved for memory at all.

Taking a checkpoint allocates a lot of memory, which can make the enclave run out of EPC memory. @jasl Can we upgrade some machines to kernel 5.19 and test on it?

jasl commented 2 years ago

Just for completeness: There are reports of a bug prior to 5.19 where enclaves could crash when under memory pressure, but for all intents and purposes the affected system is not starved for memory at all.

Taking a checkpoint allocates a lot of memory, which can make the enclave run out of EPC memory. @jasl Can we upgrade some machines to kernel 5.19 and test on it?

yeah, we can try 5.19

jasl commented 2 years ago

@ngrewe how much EPC memory does your machine have?

ngrewe commented 2 years ago

@ngrewe how much EPC memory does your machine have?

The PRMRR size is at 256M.
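
(For reference, a couple of ways to double-check the available EPC on a host; the dmesg pattern is what the in-tree driver typically logs at boot, and the CPUID leaf is the standard SGX EPC enumeration — both are assumptions from general SGX documentation, not from this thread.)

# The in-tree driver logs the EPC sections when it initializes:
$ sudo dmesg | grep -i 'sgx'
# CPUID leaf 0x12, sub-leaves 2 and up enumerate the EPC sections (needs the cpuid tool):
$ cpuid -1 -l 0x12 -s 2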

jasl commented 2 years ago

Just got a reply from Intel: AESM hasn't been adapted for the new kernel yet.

ngrewe commented 2 years ago

Just got a reply from Intel: AESM hasn't been adapted for the new kernel yet.

Thanks for clearing that up! Do you, perchance, know whether they have documentation somewhere on the upper bound of supported kernel versions?

jasl commented 2 years ago

Just got a reply from Intel: AESM hasn't been adapted for the new kernel yet.

Thanks for clearing that up! Do you, perchance, know whether they have documentation somewhere on the upper bound of supported kernel versions?

The support engineer didn't say exactly, but he mentioned that Intel only ensures AESM compatibility with the distro kernels listed at https://download.01.org/intel-sgx/sgx-linux/2.17/distro/

I asked about 22.04 support; he said they are targeting Q3.

jasl commented 2 years ago

The Ubuntu 20.04 HWE kernel was suddenly upgraded to 5.15 last week. PRuntime can start, but it crashes while running; dmesg shows:

[22508.976764] ------------[ cut here ]------------
[22508.976771] ELDU returned 1073741837 (0x4000000d)
[22508.976778] WARNING: CPU: 0 PID: 2290 at arch/x86/kernel/cpu/sgx/encl.c:81 __sgx_encl_eldu+0x3ac/0x430
[22508.976789] Modules linked in: tls veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay binfmt_misc snd_sof_pci_intel_apl nls_iso8859_1 snd_sof_intel_hda_common intel_rapl_msr soundwire_intel soundwire_generic_allocation soundwire_cadence intel_rapl_common mei_hdcp snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp intel_pmc_bxt snd_sof intel_telemetry_pltdrv soundwire_bus intel_punit_ipc intel_telemetry_core snd_soc_skl x86_pkg_temp_thermal snd_soc_hdac_hda snd_hda_ext_core intel_powerclamp snd_soc_sst_ipc snd_soc_sst_dsp coretemp snd_hda_codec_hdmi snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core snd_hda_codec_realtek joydev snd_hda_codec_generic snd_compress kvm_intel ac97_bus ledtrig_audio input_leds snd_pcm_dmaengine snd_hda_intel kvm snd_intel_dspcfg btusb snd_intel_sdw_acpi btrtl btbcm btintel snd_hda_codec
[22508.976868]  rapl intel_cstate bluetooth snd_hda_core ecdh_generic ecc snd_hwdep serio_raw snd_pcm snd_timer snd mei_me soundcore mei mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_devintf ipmi_msghandler msr ramoops reed_solomon pstore_blk pstore_zone mtd efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_lenovo hid_generic usbhid hid i915 i2c_algo_bit crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ttm drm_kms_helper syscopyarea sysfillrect aesni_intel sysimgblt fb_sys_fops cec ahci rc_core libahci sdhci_pci crypto_simd cryptd psmouse r8169 drm cqhci i2c_i801 xhci_pci sdhci i2c_smbus xhci_pci_renesas realtek video pinctrl_geminilake
[22508.976958] CPU: 0 PID: 2290 Comm: rocket-worker-t Not tainted 5.15.0-41-generic #44-Ubuntu
[22508.976962] Hardware name: Default string Default string/TJ41G-A80, BIOS 5.13 2020/02/12
[22508.976965] RIP: 0010:__sgx_encl_eldu+0x3ac/0x430
[22508.976969] Code: ff ff e8 a7 a3 23 00 e9 f3 fe ff ff 89 c1 89 c2 48 c7 c6 a3 d4 db b5 48 c7 c7 a8 d4 db b5 c6 05 13 d1 17 02 01 e8 cb 72 c3 00 <0f> 0b e9 4b fe ff ff e8 68 cb cd 00 48 89 de 48 c7 c7 60 33 65 b6
[22508.976972] RSP: 0000:ffffb3e540cdbbc0 EFLAGS: 00010282
[22508.976975] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
[22508.976977] RDX: ffff91bcb8420588 RSI: 0000000000000001 RDI: ffff91bcb8420580
[22508.976979] RBP: ffffb3e540cdbca0 R08: 0000000000000003 R09: fffffffffffd0de8
[22508.976980] R10: 0000000000ffff10 R11: 000000000000000f R12: ffff91bb4ff27840
[22508.976982] R13: 00000000000020ec R14: ffff91bb4fba0000 R15: ffff91bbfaf30000
[22508.976984] FS:  00007f94c9244700(0000) GS:ffff91bcb8400000(0000) knlGS:0000000000000000
[22508.976987] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[22508.976989] CR2: 00007f94020ec000 CR3: 000000010fa08000 CR4: 0000000000350ef0
[22508.976991] Call Trace:
[22508.976994]  <TASK>
[22508.976999]  ? __wake_up+0x13/0x20
[22508.977004]  sgx_encl_eldu+0x3f/0xd0
[22508.977008]  sgx_encl_load_page+0x7c/0xc0
[22508.977011]  sgx_vma_fault+0x44/0x100
[22508.977015]  __do_fault+0x39/0x120
[22508.977020]  do_fault+0x1ab/0x2e0
[22508.977023]  handle_pte_fault+0x1c5/0x230
[22508.977026]  __handle_mm_fault+0x3c7/0x700
[22508.977030]  handle_mm_fault+0xd8/0x2c0
[22508.977033]  do_user_addr_fault+0x1c5/0x670
[22508.977038]  exc_page_fault+0x77/0x160
[22508.977043]  ? asm_exc_page_fault+0x8/0x30
[22508.977047]  asm_exc_page_fault+0x1e/0x30
[22508.977050] RIP: 0033:0x7ffd6634ecea
[22508.977055] Code: 43 48 8b 4d 10 48 c7 c3 28 00 00 00 48 83 3c 19 00 75 31 48 83 c3 08 48 81 fb 00 01 00 00 75 ec 48 8b 19 48 8d 0d 00 00 00 00 <0f> 01 d7 48 8b 5d 10 c7 43 08 04 00 00 00 48 83 7b 18 00 75 21 31
[22508.977057] RSP: 002b:00007f94c9240138 EFLAGS: 00000202
[22508.977060] RAX: 0000000000000003 RBX: 00007f944fa4d000 RCX: 00007ffd6634ecea
[22508.977062] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[22508.977063] RBP: 00007f94c9240140 R08: 0000000000000000 R09: 0000000000000000
[22508.977065] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[22508.977066] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[22508.977071]  </TASK>
[22508.977072] ---[ end trace 9de918b900da7e99 ]---
[83464.000520] perf: interrupt took too long (3941 > 3936), lowering kernel.perf_event_max_sample_rate to 50750

I have reported this to Intel. It is likely triggered by a particular workload pattern; we will investigate further next week.

But again, Intel is promising official 22.04 support in Q3. For safety, make sure the kernel version is <= 5.13.
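
(One way to keep a 20.04 host from being moved past a known-good kernel until a fix lands is to hold the HWE kernel metapackages; the package names below are the usual Ubuntu ones and are an assumption, not a recommendation from this thread.)

# Prevent apt from pulling in newer HWE kernels:
$ sudo apt-mark hold linux-generic-hwe-20.04 linux-image-generic-hwe-20.04
# Release the hold once a fixed kernel is available:
$ sudo apt-mark unhold linux-generic-hwe-20.04 linux-image-generic-hwe-20.04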

jasl commented 2 years ago

Confirmed this is a bug in the 5.15 kernel; fortunately it has been identified and fixed in 5.19-rc9.

I've asked Intel to backport the patch to the 5.15 kernel.

jasl commented 2 years ago

Now we have to wait for the Jammy v5.15.45 upstream stable release: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1981862/

x86/sgx: Fix race between reclaimer and page fault handler