NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License
17.7k stars 13.84k forks source link

Writes stall using LUKS on Atom 3xxx Processor #40282

Open red-hood opened 6 years ago

red-hood commented 6 years ago

Issue description

When using either the 4.14 or 4.15 kernel from 18.03 with LUKS on a S-ATA SSD-Drive, writes to the crypt block device stall, the process issuing the write stays in the syscall forever. The kernel log shows the following entries:

[  369.628274] INFO: task kworker/u9:0:1151 blocked for more than 120 seconds.
[  369.628326]       Not tainted 4.15.18 #1-NixOS
[  369.628352] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.628396] kworker/u9:0    D    0  1151      2 0x80000000
[  369.628407] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[  369.628409] Call Trace:
[  369.628418]  ? __schedule+0x1a8/0x6d0
[  369.628424]  ? qat_alg_sgl_to_bufl.isra.10+0x4df/0x910 [intel_qat]
[  369.628427]  schedule+0x28/0x80
[  369.628429]  schedule_timeout+0x1e3/0x350
[  369.628433]  ? unmap_single.part.14+0x10/0x10
[  369.628437]  ? qat_alg_free_bufl.isra.9+0x187/0x1e0 [intel_qat]
[  369.628439]  wait_for_completion+0xb0/0x120
[  369.628443]  ? wake_up_q+0x70/0x70
[  369.628445]  crypt_convert+0xc5c/0xee0 [dm_crypt]
[  369.628450]  ? bio_alloc_bioset+0x9a/0x200
[  369.628452]  kcryptd_crypt+0x1f7/0x350 [dm_crypt]
[  369.628455]  process_one_work+0x1da/0x3d0
[  369.628458]  worker_thread+0x2b/0x3f0
[  369.628460]  ? process_one_work+0x3d0/0x3d0
[  369.628462]  kthread+0x113/0x130
[  369.628465]  ? kthread_create_worker_on_cpu+0x50/0x50
[  369.628467]  ret_from_fork+0x35/0x40
[  369.628471] INFO: task kworker/u9:1:1157 blocked for more than 120 seconds.
[  369.628511]       Not tainted 4.15.18 #1-NixOS
[  369.628536] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.628579] kworker/u9:1    D    0  1157      2 0x80000000
[  369.628583] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[  369.628584] Call Trace:
[  369.628588]  ? __schedule+0x1a8/0x6d0
[  369.628592]  ? qat_alg_sgl_to_bufl.isra.10+0x4df/0x910 [intel_qat]
[  369.628594]  schedule+0x28/0x80
[  369.628596]  schedule_timeout+0x1e3/0x350
[  369.628598]  ? unmap_single.part.14+0x10/0x10
[  369.628602]  ? qat_alg_free_bufl.isra.9+0x187/0x1e0 [intel_qat]
[  369.628604]  wait_for_completion+0xb0/0x120
[  369.628607]  ? wake_up_q+0x70/0x70
[  369.628609]  crypt_convert+0xc5c/0xee0 [dm_crypt]
[  369.628612]  ? bio_alloc_bioset+0x9a/0x200
[  369.628615]  kcryptd_crypt+0x1f7/0x350 [dm_crypt]
[  369.628617]  process_one_work+0x1da/0x3d0
[  369.628619]  worker_thread+0x2b/0x3f0
[  369.628621]  ? process_one_work+0x3d0/0x3d0
[  369.628622]  kthread+0x113/0x130
[  369.628625]  ? kthread_create_worker_on_cpu+0x50/0x50
[  369.628627]  ? SyS_exit_group+0x10/0x10
[  369.628629]  ret_from_fork+0x35/0x40
[  369.628631] INFO: task kworker/u9:2:1158 blocked for more than 120 seconds.
[  369.628670]       Not tainted 4.15.18 #1-NixOS
[  369.628696] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.628739] kworker/u9:2    D    0  1158      2 0x80000000
[  369.628743] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[  369.628744] Call Trace:
[  369.628747]  ? __schedule+0x1a8/0x6d0
[  369.628751]  ? qat_alg_sgl_to_bufl.isra.10+0x4df/0x910 [intel_qat]
[  369.628753]  schedule+0x28/0x80
[  369.628755]  schedule_timeout+0x1e3/0x350
[  369.628757]  ? unmap_single.part.14+0x10/0x10
[  369.628761]  ? qat_alg_free_bufl.isra.9+0x187/0x1e0 [intel_qat]
[  369.628763]  wait_for_completion+0xb0/0x120
[  369.628766]  ? wake_up_q+0x70/0x70
[  369.628768]  crypt_convert+0xc5c/0xee0 [dm_crypt]
[  369.628771]  ? bio_alloc_bioset+0x9a/0x200
[  369.628774]  kcryptd_crypt+0x1f7/0x350 [dm_crypt]
[  369.628776]  process_one_work+0x1da/0x3d0
[  369.628778]  worker_thread+0x2b/0x3f0
[  369.628779]  ? process_one_work+0x3d0/0x3d0
[  369.628781]  kthread+0x113/0x130
[  369.628783]  ? kthread_create_worker_on_cpu+0x50/0x50
[  369.628785]  ? SyS_exit_group+0x10/0x10
[  369.628787]  ret_from_fork+0x35/0x40
[  369.628789] INFO: task kworker/u9:3:1159 blocked for more than 120 seconds.
[  369.628828]       Not tainted 4.15.18 #1-NixOS
[  369.628853] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.628896] kworker/u9:3    D    0  1159      2 0x80000000
[  369.628900] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[  369.628901] Call Trace:
[  369.628904]  ? __schedule+0x1a8/0x6d0
[  369.628908]  ? qat_alg_sgl_to_bufl.isra.10+0x4df/0x910 [intel_qat]
[  369.628910]  schedule+0x28/0x80
[  369.628912]  schedule_timeout+0x1e3/0x350
[  369.628915]  ? unmap_single.part.14+0x10/0x10
[  369.628918]  ? qat_alg_free_bufl.isra.9+0x187/0x1e0 [intel_qat]
[  369.628920]  wait_for_completion+0xb0/0x120
[  369.628923]  ? wake_up_q+0x70/0x70
[  369.628925]  crypt_convert+0xc5c/0xee0 [dm_crypt]
[  369.628928]  ? bio_alloc_bioset+0x9a/0x200
[  369.628931]  kcryptd_crypt+0x1f7/0x350 [dm_crypt]
[  369.628933]  process_one_work+0x1da/0x3d0
[  369.628935]  worker_thread+0x2b/0x3f0
[  369.628937]  ? process_one_work+0x3d0/0x3d0
[  369.628938]  kthread+0x113/0x130
[  369.628941]  ? kthread_create_worker_on_cpu+0x50/0x50
[  369.628942]  ret_from_fork+0x35/0x40
[  369.628946] INFO: task dd:1174 blocked for more than 120 seconds.
[  369.628980]       Not tainted 4.15.18 #1-NixOS
[  369.629005] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.629048] dd              D    0  1174   1162 0x00000000
[  369.629050] Call Trace:
[  369.629053]  ? __schedule+0x1a8/0x6d0
[  369.629056]  schedule+0x28/0x80
[  369.629058]  io_schedule+0x12/0x40
[  369.629062]  wait_on_page_bit_common+0xd1/0x140
[  369.629065]  ? page_cache_tree_insert+0xb0/0xb0
[  369.629067]  __filemap_fdatawait_range+0xc6/0x110
[  369.629070]  filemap_write_and_wait+0x42/0x70
[  369.629074]  __blkdev_put+0x69/0x1f0
[  369.629076]  blkdev_close+0x21/0x30
[  369.629079]  __fput+0xd0/0x1e0
[  369.629082]  task_work_run+0x8a/0xb0
[  369.629085]  exit_to_usermode_loop+0x9e/0xa0
[  369.629087]  do_syscall_64+0x104/0x120
[  369.629089]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  369.629092] RIP: 0033:0x7f353738bb74
[  369.629094] RSP: 002b:00007ffe8ae08da8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[  369.629096] RAX: 0000000000000000 RBX: 00007f3537dd01c0 RCX: 00007f353738bb74
[  369.629097] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
[  369.629098] RBP: 0000000000000064 R08: ffffffffffffffff R09: 0000000000000000
[  369.629099] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000064
[  369.629100] R13: 0000000000000000 R14: 0000000000000000 R15: 00007f3537ccd000
[  369.629103] INFO: task systemd-udevd:1176 blocked for more than 120 seconds.
[  369.629142]       Not tainted 4.15.18 #1-NixOS
[  369.629167] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.629221] systemd-udevd   D    0  1176    601 0x00000104
[  369.629224] Call Trace:
[  369.629228]  ? __schedule+0x1a8/0x6d0
[  369.629230]  schedule+0x28/0x80
[  369.629232]  schedule_preempt_disabled+0xa/0x10
[  369.629235]  __mutex_lock.isra.2+0x17d/0x4b0
[  369.629239]  ? __blkdev_get+0x5e/0x490
[  369.629241]  __blkdev_get+0x5e/0x490
[  369.629243]  blkdev_get+0x10a/0x2e0
[  369.629245]  ? bd_acquire+0x91/0xd0
[  369.629247]  ? bd_acquire+0xd0/0xd0
[  369.629250]  do_dentry_open+0x1b3/0x2d0
[  369.629253]  path_openat+0x637/0x1630
[  369.629258]  ? __bpf_prog_run32+0x23/0x30
[  369.629261]  ? page_add_file_rmap+0xd6/0x160
[  369.629263]  do_filp_open+0x8c/0xf0
[  369.629268]  ? __alloc_fd+0x44/0x170
[  369.629270]  ? do_sys_open+0x1a6/0x230
[  369.629272]  do_sys_open+0x1a6/0x230
[  369.629275]  do_syscall_64+0x67/0x120
[  369.629277]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  369.629279] RIP: 0033:0x7f6636a553b0
[  369.629280] RSP: 002b:00007fff6369dbe0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[  369.629282] RAX: ffffffffffffffda RBX: 000055f52633e2b0 RCX: 00007f6636a553b0
[  369.629283] RDX: 0000000000080000 RSI: 000055f52634c2b0 RDI: ffffffffffffff9c
[  369.629284] RBP: 0000000000000001 R08: 00007f6637441fa9 R09: 000000000000000f
[  369.629285] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[  369.629286] R13: 0000000000000000 R14: 000055f5263541d0 R15: 00000000ffffffff

For the 4.14 kernel:

[  492.504397] INFO: task kworker/u9:1:1191 blocked for more than 120 seconds.
[  492.506046]       Not tainted 4.14.39 #1-NixOS
[  492.507681] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  492.509349] kworker/u9:1    D    0  1191      2 0x80000000
[  492.509364] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[  492.509366] Call Trace:
[  492.509377]  ? __schedule+0x1a2/0x6c0
[  492.509379]  schedule+0x28/0x80
[  492.509382]  schedule_timeout+0x1e7/0x340
[  492.509386]  ? swiotlb_max_segment+0x10/0x10
[  492.509392]  ? qat_alg_sgl_to_bufl.isra.10+0x66d/0x910 [intel_qat]
[  492.509395]  ? wait_for_completion+0xb0/0x120
[  492.509397]  wait_for_completion+0xb0/0x120
[  492.509401]  ? wake_up_q+0x70/0x70
[  492.509404]  crypt_convert+0xc5c/0xee0 [dm_crypt]
[  492.509408]  ? bio_alloc_bioset+0x9a/0x200
[  492.509410]  kcryptd_crypt+0x1f7/0x350 [dm_crypt]
[  492.509414]  process_one_work+0x1da/0x3d0
[  492.509417]  worker_thread+0x2b/0x3f0
[  492.509419]  ? process_one_work+0x3d0/0x3d0
[  492.509421]  kthread+0x11a/0x130
[  492.509423]  ? kthread_create_on_node+0x40/0x40
[  492.509426]  ret_from_fork+0x35/0x40
[  492.509430] INFO: task kworker/u9:2:1192 blocked for more than 120 seconds.
[  492.511100]       Not tainted 4.14.39 #1-NixOS
[  492.512757] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  492.514441] kworker/u9:2    D    0  1192      2 0x80000000
[  492.514455] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[  492.514457] Call Trace:
[  492.514466]  ? __schedule+0x1a2/0x6c0
[  492.514470]  ? __slab_alloc+0x1c/0x30
[  492.514472]  schedule+0x28/0x80
[  492.514474]  schedule_timeout+0x1e7/0x340
[  492.514481]  ? swiotlb_max_segment+0x10/0x10
[  492.514486]  ? qat_alg_sgl_to_bufl.isra.10+0x66d/0x910 [intel_qat]
[  492.514489]  ? wait_for_completion+0xb0/0x120
[  492.514491]  wait_for_completion+0xb0/0x120
[  492.514495]  ? wake_up_q+0x70/0x70
[  492.514498]  crypt_convert+0xc5c/0xee0 [dm_crypt]
[  492.514502]  ? bio_alloc_bioset+0x9a/0x200
[  492.514505]  kcryptd_crypt+0x1f7/0x350 [dm_crypt]
[  492.514508]  process_one_work+0x1da/0x3d0
[  492.514511]  worker_thread+0x2b/0x3f0
[  492.514513]  ? process_one_work+0x3d0/0x3d0
[  492.514515]  kthread+0x11a/0x130
[  492.514518]  ? kthread_create_on_node+0x40/0x40
[  492.514520]  ret_from_fork+0x35/0x40
[  492.514524] INFO: task kworker/u9:3:1193 blocked for more than 120 seconds.
[  492.516214]       Not tainted 4.14.39 #1-NixOS
[  492.517879] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  492.519566] kworker/u9:3    D    0  1193      2 0x80000000
[  492.519580] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[  492.519581] Call Trace:
[  492.519591]  ? __schedule+0x1a2/0x6c0
[  492.519593]  schedule+0x28/0x80
[  492.519596]  schedule_timeout+0x1e7/0x340
[  492.519600]  ? swiotlb_max_segment+0x10/0x10
[  492.519608]  ? qat_alg_sgl_to_bufl.isra.10+0x66d/0x910 [intel_qat]
[  492.519611]  ? wait_for_completion+0xb0/0x120
[  492.519613]  wait_for_completion+0xb0/0x120
[  492.519617]  ? wake_up_q+0x70/0x70
[  492.519619]  crypt_convert+0xc5c/0xee0 [dm_crypt]
[  492.519624]  ? bio_alloc_bioset+0x9a/0x200
[  492.519626]  ? check_preempt_wakeup+0x140/0x220
[  492.519628]  kcryptd_crypt+0x1f7/0x350 [dm_crypt]
[  492.519632]  process_one_work+0x1da/0x3d0
[  492.519635]  worker_thread+0x2b/0x3f0
[  492.519637]  ? process_one_work+0x3d0/0x3d0
[  492.519639]  kthread+0x11a/0x130
[  492.519641]  ? kthread_create_on_node+0x40/0x40
[  492.519644]  ret_from_fork+0x35/0x40
[  492.519649] INFO: task dd:1224 blocked for more than 120 seconds.
[  492.521352]       Not tainted 4.14.39 #1-NixOS
[  492.523050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  492.524789] dd              D    0  1224   1107 0x00000004
[  492.524794] Call Trace:
[  492.524805]  ? __schedule+0x1a2/0x6c0
[  492.524808]  schedule+0x28/0x80
[  492.524812]  io_schedule+0x12/0x40
[  492.524816]  wait_on_page_bit_common+0xd1/0x140
[  492.524818]  ? page_cache_tree_insert+0xa0/0xa0
[  492.524821]  __filemap_fdatawait_range+0xe7/0x130
[  492.524825]  filemap_write_and_wait+0x42/0x70
[  492.524829]  __blkdev_put+0x69/0x1f0
[  492.524832]  blkdev_close+0x21/0x30
[  492.524835]  __fput+0xd0/0x1e0
[  492.524838]  task_work_run+0x8a/0xb0
[  492.524841]  exit_to_usermode_loop+0x9e/0xa0
[  492.524844]  do_syscall_64+0x104/0x120
[  492.524847]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  492.524851] RIP: 0033:0x7f14bf58fb71
[  492.524852] RSP: 002b:00007ffd846f8fe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[  492.524854] RAX: 0000000000000000 RBX: 00007f14bffd61c0 RCX: 00007f14bf58fb71
[  492.524856] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
[  492.524857] RBP: 00000000000003e8 R08: 00000000ffffffff R09: 0000000000000000
[  492.524858] R10: 0000000000000022 R11: 0000000000000246 R12: 00000000000003e8
[  492.524859] R13: 0000000000000000 R14: 0000000000000000 R15: 00007f14bfed3000
[  492.524863] INFO: task systemd-udevd:1225 blocked for more than 120 seconds.
[  492.526593]       Not tainted 4.14.39 #1-NixOS
[  492.528289] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  492.529972] systemd-udevd   D    0  1225    599 0x00000100
[  492.529976] Call Trace:
[  492.529987]  ? __schedule+0x1a2/0x6c0
[  492.529990]  schedule+0x28/0x80
[  492.529994]  schedule_preempt_disabled+0xa/0x10
[  492.529996]  __mutex_lock.isra.2+0x17d/0x4b0
[  492.530002]  ? __blkdev_get+0x5b/0x410
[  492.530004]  __blkdev_get+0x5b/0x410
[  492.530010]  blkdev_get+0x10a/0x2e0
[  492.530012]  ? bd_acquire+0x91/0xd0
[  492.530014]  ? bd_acquire+0xd0/0xd0
[  492.530017]  do_dentry_open+0x1b0/0x2d0
[  492.530021]  ? __inode_permission+0x85/0xc0
[  492.530023]  path_openat+0x637/0x1630
[  492.530027]  ? __bpf_prog_run32+0x23/0x30
[  492.530030]  do_filp_open+0x8c/0xf0
[  492.530034]  ? __alloc_fd+0x44/0x170
[  492.530036]  ? do_sys_open+0x1a6/0x230
[  492.530038]  do_sys_open+0x1a6/0x230
[  492.530041]  do_syscall_64+0x67/0x120
[  492.530045]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  492.530048] RIP: 0033:0x7fd40fe8e3fd
[  492.530049] RSP: 002b:00007ffeab739210 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[  492.530051] RAX: ffffffffffffffda RBX: 000055dc3a4f2200 RCX: 00007fd40fe8e3fd
[  492.530052] RDX: 0000000000080000 RSI: 000055dc3a4f7e10 RDI: 00000000ffffff9c
[  492.530054] RBP: 0000000000000001 R08: 00007fd410887bc9 R09: 000000000000000f
[  492.530055] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[  492.530056] R13: 00007ffeab7392d8 R14: 0000000000000000 R15: 0000000000000000
[  492.530059] INFO: task kworker/u9:4:1226 blocked for more than 120 seconds.
[  492.531728]       Not tainted 4.14.39 #1-NixOS
[  492.533370] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  492.535027] kworker/u9:4    D    0  1226      2 0x80000000
[  492.535040] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[  492.535042] Call Trace:
[  492.535051]  ? __schedule+0x1a2/0x6c0
[  492.535055]  ? __slab_alloc+0x1c/0x30
[  492.535057]  schedule+0x28/0x80
[  492.535059]  schedule_timeout+0x1e7/0x340
[  492.535066]  ? swiotlb_max_segment+0x10/0x10
[  492.535071]  ? qat_alg_sgl_to_bufl.isra.10+0x66d/0x910 [intel_qat]
[  492.535074]  ? wait_for_completion+0xb0/0x120
[  492.535076]  wait_for_completion+0xb0/0x120
[  492.535080]  ? wake_up_q+0x70/0x70
[  492.535083]  crypt_convert+0xc5c/0xee0 [dm_crypt]
[  492.535087]  ? bio_alloc_bioset+0x9a/0x200
[  492.535090]  ? pick_next_task_fair+0x53/0x490
[  492.535092]  kcryptd_crypt+0x1f7/0x350 [dm_crypt]
[  492.535096]  process_one_work+0x1da/0x3d0
[  492.535099]  worker_thread+0x2b/0x3f0
[  492.535101]  ? process_one_work+0x3d0/0x3d0
[  492.535103]  kthread+0x11a/0x130
[  492.535106]  ? kthread_create_on_node+0x40/0x40
[  492.535109]  ? SyS_exit_group+0x10/0x10
[  492.535112]  ret_from_fork+0x35/0x40
[  615.381287] INFO: task kworker/u9:1:1191 blocked for more than 120 seconds.
[  615.382932]       Not tainted 4.14.39 #1-NixOS
[  615.384546] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  615.386176] kworker/u9:1    D    0  1191      2 0x80000000
[  615.386190] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[  615.386197] Call Trace:
[  615.386208]  ? __schedule+0x1a2/0x6c0
[  615.386210]  schedule+0x28/0x80
[  615.386213]  schedule_timeout+0x1e7/0x340
[  615.386219]  ? swiotlb_max_segment+0x10/0x10
[  615.386225]  ? qat_alg_sgl_to_bufl.isra.10+0x66d/0x910 [intel_qat]
[  615.386228]  ? wait_for_completion+0xb0/0x120
[  615.386230]  wait_for_completion+0xb0/0x120
[  615.386234]  ? wake_up_q+0x70/0x70
[  615.386237]  crypt_convert+0xc5c/0xee0 [dm_crypt]
[  615.386241]  ? bio_alloc_bioset+0x9a/0x200
[  615.386243]  kcryptd_crypt+0x1f7/0x350 [dm_crypt]
[  615.386247]  process_one_work+0x1da/0x3d0
[  615.386250]  worker_thread+0x2b/0x3f0
[  615.386253]  ? process_one_work+0x3d0/0x3d0
[  615.386254]  kthread+0x11a/0x130
[  615.386257]  ? kthread_create_on_node+0x40/0x40
[  615.386259]  ret_from_fork+0x35/0x40
[  615.386264] INFO: task kworker/u9:2:1192 blocked for more than 120 seconds.
[  615.387870]       Not tainted 4.14.39 #1-NixOS
[  615.389467] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  615.391105] kworker/u9:2    D    0  1192      2 0x80000000
[  615.391118] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[  615.391120] Call Trace:
[  615.391130]  ? __schedule+0x1a2/0x6c0
[  615.391133]  ? __slab_alloc+0x1c/0x30
[  615.391136]  schedule+0x28/0x80
[  615.391141]  schedule_timeout+0x1e7/0x340
[  615.391145]  ? swiotlb_max_segment+0x10/0x10
[  615.391150]  ? qat_alg_sgl_to_bufl.isra.10+0x66d/0x910 [intel_qat]
[  615.391153]  ? wait_for_completion+0xb0/0x120
[  615.391155]  wait_for_completion+0xb0/0x120
[  615.391159]  ? wake_up_q+0x70/0x70
[  615.391162]  crypt_convert+0xc5c/0xee0 [dm_crypt]
[  615.391166]  ? bio_alloc_bioset+0x9a/0x200
[  615.391168]  kcryptd_crypt+0x1f7/0x350 [dm_crypt]
[  615.391172]  process_one_work+0x1da/0x3d0
[  615.391175]  worker_thread+0x2b/0x3f0
[  615.391177]  ? process_one_work+0x3d0/0x3d0
[  615.391179]  kthread+0x11a/0x130
[  615.391181]  ? kthread_create_on_node+0x40/0x40
[  615.391184]  ret_from_fork+0x35/0x40
[  615.391188] INFO: task kworker/u9:3:1193 blocked for more than 120 seconds.
[  615.392822]       Not tainted 4.14.39 #1-NixOS
[  615.394426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  615.396057] kworker/u9:3    D    0  1193      2 0x80000000
[  615.396069] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[  615.396071] Call Trace:
[  615.396080]  ? __schedule+0x1a2/0x6c0
[  615.396083]  schedule+0x28/0x80
[  615.396085]  schedule_timeout+0x1e7/0x340
[  615.396090]  ? swiotlb_max_segment+0x10/0x10
[  615.396098]  ? qat_alg_sgl_to_bufl.isra.10+0x66d/0x910 [intel_qat]
[  615.396100]  ? wait_for_completion+0xb0/0x120
[  615.396103]  wait_for_completion+0xb0/0x120
[  615.396106]  ? wake_up_q+0x70/0x70
[  615.396109]  crypt_convert+0xc5c/0xee0 [dm_crypt]
[  615.396113]  ? bio_alloc_bioset+0x9a/0x200
[  615.396115]  ? check_preempt_wakeup+0x140/0x220
[  615.396117]  kcryptd_crypt+0x1f7/0x350 [dm_crypt]
[  615.396121]  process_one_work+0x1da/0x3d0
[  615.396124]  worker_thread+0x2b/0x3f0
[  615.396126]  ? process_one_work+0x3d0/0x3d0
[  615.396128]  kthread+0x11a/0x130
[  615.396130]  ? kthread_create_on_node+0x40/0x40
[  615.396132]  ret_from_fork+0x35/0x40
[  615.396137] INFO: task dd:1224 blocked for more than 120 seconds.
[  615.397780]       Not tainted 4.14.39 #1-NixOS
[  615.399418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  615.401086] dd              D    0  1224   1107 0x00000004
[  615.401091] Call Trace:
[  615.401101]  ? __schedule+0x1a2/0x6c0
[  615.401103]  schedule+0x28/0x80
[  615.401107]  io_schedule+0x12/0x40
[  615.401111]  wait_on_page_bit_common+0xd1/0x140
[  615.401114]  ? page_cache_tree_insert+0xa0/0xa0
[  615.401116]  __filemap_fdatawait_range+0xe7/0x130
[  615.401119]  filemap_write_and_wait+0x42/0x70
[  615.401123]  __blkdev_put+0x69/0x1f0
[  615.401128]  blkdev_close+0x21/0x30
[  615.401131]  __fput+0xd0/0x1e0
[  615.401134]  task_work_run+0x8a/0xb0
[  615.401138]  exit_to_usermode_loop+0x9e/0xa0
[  615.401140]  do_syscall_64+0x104/0x120
[  615.401144]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  615.401147] RIP: 0033:0x7f14bf58fb71
[  615.401148] RSP: 002b:00007ffd846f8fe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[  615.401151] RAX: 0000000000000000 RBX: 00007f14bffd61c0 RCX: 00007f14bf58fb71
[  615.401152] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
[  615.401153] RBP: 00000000000003e8 R08: 00000000ffffffff R09: 0000000000000000
[  615.401154] R10: 0000000000000022 R11: 0000000000000246 R12: 00000000000003e8
[  615.401155] R13: 0000000000000000 R14: 0000000000000000 R15: 00007f14bfed3000

Steps to reproduce

  1. Create a (GPT) partition on the SSD drive.
  2. Format as LUKS:
    crypsetup luksFormat /dev/sdaX
  3. Open the LUKS volume:
    cryptsetup luksOpen /dev/sdaX crypttest
  4. Write some data onto the device:
    dd if=/dev/zero of=/dev/mapper/crypttest bs=1M count=100

The problem occured originally when creating a BTRFS file system on the device, I just used dd to rule out any file system influences.

Technical details

Please run nix-shell -p nix-info --run "nix-info -m" and paste the results.

I am currently running Ubuntu 18.04 on the machine (due to its native ZFS support), which works fine. I will run the info command once I am testing new NixOS kernels again.

Some hardware information

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               95
Model name:          Intel(R) Atom(TM) CPU C3558 @ 2.20GHz
Stepping:            1
CPU MHz:             2200.646
CPU max MHz:         2200.0000
CPU min MHz:         800.0000
BogoMIPS:            4400.00
Virtualization:      VT-x
L1d cache:           24K
L1i cache:           32K
L2 cache:            2048K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand lahf_lm 3dnowprefetch cpuid_fault epb cat_l2 tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust smep erms mpx rdt_a rdseed smap clflushopt intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves ibpb ibrs stibp dtherm arat pln pts arch_capabilities
/dev/sda:

ATA device, with non-removable media
    Model Number:       SAMSUNG MZNLN256HCHP-000L7              
    Serial Number:      S20GNAAGC27797      
    Firmware Revision:  EMT22L6Q
    Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
    Used: unknown (minor revision code 0x0039) 
    Supported: 9 8 7 6 5 
    Likely used: 9
Configuration:
    Logical     max current
    cylinders   16383   16383
    heads       16  16
    sectors/track   63  63
    --
    CHS current addressable sectors:    16514064
    LBA    user addressable sectors:   268435455
    LBA48  user addressable sectors:   500118192
    Logical  Sector size:                   512 bytes
    Physical Sector size:                   512 bytes
    Logical Sector-0 offset:                  0 bytes
    device size with M = 1024*1024:      244198 MBytes
    device size with M = 1000*1000:      256060 MBytes (256 GB)
    cache/buffer size  = unknown
    Nominal Media Rotation Rate: Solid State Device
Capabilities:
    LBA, IORDY(can be disabled)
    Queue depth: 32
    Standby timer values: spec'd by Standard, no device specific minimum
    R/W multiple sector transfer: Max = 1   Current = 1
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
         Cycle time: min=120ns recommended=120ns
    PIO: pio0 pio1 pio2 pio3 pio4 
         Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
    Enabled Supported:
       *    SMART feature set
            Security Mode feature set
       *    Power Management feature set
       *    Write cache
       *    Look-ahead
       *    Host Protected Area feature set
       *    WRITE_BUFFER command
       *    READ_BUFFER command
       *    NOP cmd
       *    DOWNLOAD_MICROCODE
            SET_MAX security extension
       *    48-bit Address feature set
       *    Device Configuration Overlay feature set
       *    Mandatory FLUSH_CACHE
       *    FLUSH_CACHE_EXT
       *    SMART error logging
       *    SMART self-test
       *    General Purpose Logging feature set
       *    WRITE_{DMA|MULTIPLE}_FUA_EXT
       *    64-bit World wide name
            Write-Read-Verify feature set
       *    WRITE_UNCORRECTABLE_EXT command
       *    {READ,WRITE}_DMA_EXT_GPL commands
       *    Segmented DOWNLOAD_MICROCODE
       *    Gen1 signaling speed (1.5Gb/s)
       *    Gen2 signaling speed (3.0Gb/s)
       *    Gen3 signaling speed (6.0Gb/s)
       *    Native Command Queueing (NCQ)
       *    Phy event counters
       *    READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
       *    DMA Setup Auto-Activate optimization
            Device-initiated interface power management
       *    Asynchronous notification (eg. media change)
       *    Software settings preservation
            Device Sleep (DEVSLP)
       *    SMART Command Transport (SCT) feature set
       *    SCT Write Same (AC2)
       *    SCT Error Recovery Control (AC3)
       *    SCT Features Control (AC4)
       *    SCT Data Tables (AC5)
       *    reserved 69[4]
       *    DOWNLOAD MICROCODE DMA command
       *    SET MAX SETPASSWORD/UNLOCK DMA commands
       *    WRITE BUFFER DMA command
       *    READ BUFFER DMA command
       *    Data Set Management TRIM supported (limit 8 blocks)
Security: 
    Master password revision code = 65534
        supported
    not enabled
    not locked
    not frozen
    not expired: security count
        supported: enhanced erase
    2min for SECURITY ERASE UNIT. 8min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5002538d00000000
    NAA     : 5
    IEEE OUI    : 002538
    Unique ID   : d00000000
Device Sleep:
    DEVSLP Exit Timeout (DETO): 50 ms (drive)
    Minimum DEVSLP Assertion Time (MDAT): 30 ms (drive)
Checksum: correct

Kernel versions were:

Linux nixos 4.15.18 #1-NixOS SMP Thu Apr 19 06:55:13 UTC 2018 x86_64 GNU/Linux
uname -r
4.14.39

The Ubuntu kernel, which is running fine with LUKS on this machine is:

Linux ubuntu 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:16:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
andir commented 5 years ago

I can reproduce the issue with:

lscpu:

lscpu                                                                                        /home/andi
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              12
On-line CPU(s) list: 0-11
Thread(s) per core:  1
Core(s) per socket:  12
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               95
Model name:          Intel(R) Atom(TM) CPU C3858 @ 2.00GHz
Stepping:            1
CPU MHz:             1999.981
CPU max MHz:         2000.0000
CPU min MHz:         800.0000
BogoMIPS:            4000.00
Virtualization:      VT-x
L1d cache:           24K
L1i cache:           32K
L2 cache:            2048K
NUMA node0 CPU(s):   0-11
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand lahf_lm 3dnowprefetch cpuid_fault epb cat_l2 ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust smep erms mpx rdt_a rdseed smap clflushopt intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts arch_capabilities

Using a real SSD, HDD, NVMe Disk or just a file in tmpfs as backing device of the encrypted volume the same issues appears:

root@nixos> dd if=/dev/zero of=/tmp/test.disk bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.3424 s, 781 MB/s
root@nixos> cryptsetup luksFormat /tmp/test.disk

WARNING!
========
This will overwrite data on /tmp/test.disk irrevocably.

Are you sure? (Type uppercase yes): YES
Enter passphrase for /tmp/test.disk:
Verify passphrase:
root@nixos> cryptsetup open /tmp/test.disk crypttest                                                     /home/andi
Enter passphrase for /tmp/test.disk:
root@nixos> dd if=/dev/zero of=/dev/mapper/crypttest bs=1M count=500
[HANG]

Same happens on an all tmpfs NixOS 18.09 LiveCD.

andir commented 5 years ago

I can also confirm that the very same procedure does not happen on a current Ubuntu 18.04 LiveCD (ubuntu-18.04.1-live-server-amd64.iso) with a 4.15.0-29-generic kernel.

The kernel config that was found in /boot can be seen at https://gist.github.com/andir/c9f3fd32675179ef60dd9e0ba9512a66.

andir commented 5 years ago

There is an issue with Fedora regarding this since about a year https://bugzilla.redhat.com/show_bug.cgi?id=1522962

No solution was found but the issue description seems to match this issue very much.

andir commented 5 years ago

When blacklisting the QAT modules as hinted by the linked bugzilla report the kernel no longer stalls when encrypting data.

  boot.blacklistedKernelModules = [ "intel_qat" "qat_c3xxx" ];
andir commented 5 years ago

According to https://01.org/sites/default/files/downloads/intelr-quickassist-technology/336211-008qatrelnotes.pdf section 3.1.2: 2018-11-28-224847_850x422_scrot

So using this module for any kind of dm-crypt related work is probably a bad idea. We should blacklist / remove it per default…

antondollmaier commented 5 years ago

I'd like to thank you for reporting and documenting this issue, although I don't have any connection to NixOS.

After acquiring a new main board with an Intel Atom CPU like yours, writes were stalled for me as well. It took me two days to identify the board as root cause and five minutes to find this bug report. Now, with the blacklisted QAT modules, dm-crypt fully works again.

Thanks again!

(If you don't like this offtopic comment on this issue, feel free to remove it. I just wanted to say thanks)

Enteee commented 4 years ago

Quick update:

stale[bot] commented 4 years ago

Hello, I'm a bot and I thank you in the name of the community for opening this issue.

To help our human contributors focus on the most-relevant reports, I check up on old issues to see if they're still relevant. This issue has had no activity for 180 days, and so I marked it as stale, but you can rest assured it will never be closed by a non-human.

The community would appreciate your effort in checking if the issue is still valid. If it isn't, please close it.

If the issue persists, and you'd like to remove the stale label, you simply need to leave a comment. Your comment can be as simple as "still important to me". If you'd like it to get more attention, you can ask for help by searching for maintainers and people that previously touched related code and @ mention them in a comment. You can use Git blame or GitHub's web interface on the relevant files to find them.

Lastly, you can always ask for help at our Discourse Forum or at #nixos' IRC channel.