Open lishuai-ujs opened 3 years ago
So yeah, we never built or tested dattobd to work with anything but 4k pages. Our customers tend not to have equipment with a 64k page configuration.
I forget all the places where we make assumptions about the page size being 4k, but you may also want to watch for cases where alignment problems come up. For example, if you have a partition aligned to 63 sectors, or RAID stripes that overlap disks in a non-64k way, you may run into trouble there. Just something to look out for.
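To make the alignment concern concrete, here is a minimal sketch of the arithmetic, assuming 512-byte sectors. The helper name `offset_is_aligned` is made up for illustration and is not part of dattobd.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 512-byte sectors. */
#define SECTOR_SIZE 512ULL

/*
 * Hypothetical helper (not a dattobd function): returns true when the
 * byte offset of start_sector is a multiple of granularity (in bytes).
 */
bool offset_is_aligned(unsigned long long start_sector,
                       unsigned long long granularity)
{
    assert(granularity != 0);
    return (start_sector * SECTOR_SIZE) % granularity == 0;
}
```

A partition starting at sector 63 sits at byte offset 32256, which is a multiple of neither 4096 nor 65536. A partition starting at sector 2048 (1 MiB) is aligned for both.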
But off the top of my head, yeah, if you change all of the hardcoded 4k values to 64k, it seems like it should work. To play nice, you might want to base the size changes on PAGE_SIZE rather than hardcoding 16. Not that our hardcoding of 12 is any better. :-)
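As a rough illustration of deriving the shift from the page size instead of hardcoding 12 or 16, here is a userspace sketch. The function names are hypothetical. Inside the kernel the PAGE_SHIFT and PAGE_SIZE macros already provide these values, so a loop like this would not be needed there.

```c
#include <assert.h>

/*
 * Hypothetical helper: log2 of a power-of-two page size.
 * In kernel code, PAGE_SHIFT gives this value directly.
 */
unsigned int shift_for_page_size(unsigned long page_size)
{
    unsigned int shift = 0;

    /* Page sizes are powers of two; reject anything else. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    while ((1UL << shift) < page_size)
        shift++;
    return shift;
}

/* Block size derived from the shift rather than from a literal. */
unsigned long block_size_for_page_size(unsigned long page_size)
{
    return 1UL << shift_for_page_size(page_size);
}
```

With 4k pages this yields a shift of 12, and with 64k pages a shift of 16, so the same source would cover both configurations.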
dattobd fails on systems with pagesize=64k
After transition-to-incremental, the kernel crashes. The messages are as follows:
Jun 18 10:39:21 localhost kernel: [71355.704277] WARNING: CPU: 48 PID: 15140 at drivers/scsi/scsi_lib.c:1195 scsi_init_io+0x128/0x1b0
Jun 18 10:39:21 localhost kernel: [71355.713455] Modules linked in: dattobd(O) ip_set nfnetlink ib_isert iscsi_target_mod ib_srpt vfat fat target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad dm_multipath ipmi_ssif rpcrdma sunrpc rdma_ucm ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi aes_ce_blk crypto_simd cryptd aes_ce_cipher ghash_ce sha2_ce sha256_arm64 sha1_ce sbsa_gwdt ses enclosure hns_roce_hw_v2 hns_roce ib_uverbs ib_core ofpart cmdlinepart ipmi_si hi_sfc ipmi_devintf mtd ipmi_msghandler spi_dw_mmio sch_fq_codel realtek hclge hibmc_drm hns3 hisi_sas_v3_hw hnae3 megaraid_sas ttm hisi_sas_main host_edma_drv [last unloaded: dattobd]
Jun 18 10:39:21 localhost kernel: [71355.768362] CPU: 48 PID: 15140 Comm: dbdctl Kdump: loaded Tainted: G O 4.19.90-21.2.ky10.aarch64 #1
Jun 18 10:39:21 localhost kernel: [71355.779091] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDGA, BIOS 1.38 07/04/2020
Jun 18 10:39:21 localhost kernel: [71355.787747] pstate: 20400009 (nzCv daif +PAN -UAO)
Jun 18 10:39:21 localhost kernel: [71355.792949] pc : scsi_init_io+0x128/0x1b0
Jun 18 10:39:21 localhost kernel: [71355.797374] lr : sd_setup_read_write_cmnd+0x64/0x868
Jun 18 10:39:21 localhost kernel: [71355.892257] Call trace:
Jun 18 10:39:21 localhost kernel: [71355.895124] scsi_init_io+0x128/0x1b0
Jun 18 10:39:21 localhost kernel: [71355.899201] sd_setup_read_write_cmnd+0x64/0x868
Jun 18 10:39:21 localhost kernel: [71355.904229] sd_init_command+0x1f0/0x478
Jun 18 10:39:21 localhost kernel: [71355.908567] scsi_setup_cmnd+0x78/0x140
Jun 18 10:39:21 localhost kernel: [71355.912817] scsi_queue_rq+0x4e0/0x678
Jun 18 10:39:21 localhost kernel: [71355.916985] blk_mq_dispatch_rq_list+0xa0/0x5f8
Jun 18 10:39:21 localhost kernel: [71355.921926] blk_mq_do_dispatch_sched+0x50/0xd8
Jun 18 10:39:21 localhost kernel: [71355.926867] blk_mq_sched_dispatch_requests+0x118/0x1f0
Jun 18 10:39:21 localhost kernel: [71355.932500] blk_mq_run_hw_queue+0x9c/0x120
Jun 18 10:39:21 localhost kernel: [71355.937270] blk_mq_delay_run_hw_queue+0x198/0x1d8
Jun 18 10:39:21 localhost kernel: [71355.942644] blk_mq_run_hw_queue+0x60/0x108
Jun 18 10:39:21 localhost kernel: [71355.947240] blk_mq_sched_insert_requests+0x9c/0x158
Jun 18 10:39:21 localhost kernel: [71355.952615] blk_mq_flush_plug_list+0x1a0/0x2d8
Jun 18 10:39:21 localhost kernel: [71355.957557] blk_flush_plug_list+0xd4/0x270
Jun 18 10:39:21 localhost kernel: [71355.962154] blk_finish_plug+0x40/0x50
Jun 18 10:39:21 localhost kernel: [71355.966321] _xfs_buf_ioapply+0x31c/0x3f8
Jun 18 10:39:21 localhost kernel: [71355.970745] __xfs_buf_submit+0xb0/0x250
Jun 18 10:39:21 localhost kernel: [71355.975087] xlog_bdstrat+0x40/0x88
Jun 18 10:39:21 localhost kernel: [71355.978991] xlog_sync+0x2c8/0x3e0
Jun 18 10:39:21 localhost kernel: [71355.982810] xlog_state_release_iclog+0x94/0xc0
Jun 18 10:39:21 localhost kernel: [71355.987751] xfs_log_force_lsn.isra.10+0x204/0x330
Jun 18 10:39:21 localhost kernel: [71355.993125] xfs_log_force_lsn+0xd8/0x190
Jun 18 10:39:21 localhost kernel: [71355.997549] xfs_trans_commit+0x2a8/0x388
Jun 18 10:39:21 localhost kernel: [71356.002145] xfs_trans_commit+0x24/0x30
Jun 18 10:39:21 localhost kernel: [71356.006396] xfs_sync_sb+0x68/0x78
Jun 18 10:39:21 localhost kernel: [71356.010214] xfs_log_sbcount+0x68/0x88
Jun 18 10:39:21 localhost kernel: [71356.014379] xfs_quiesce_attr+0x64/0xc8
Jun 18 10:39:21 localhost kernel: [71356.018629] xfs_fs_freeze+0x34/0x50
Jun 18 10:39:21 localhost kernel: [71356.022623] freeze_super+0xcc/0x1a8
Jun 18 10:39:21 localhost kernel: [71356.026618] freeze_bdev+0xf0/0xf8
Jun 18 10:39:21 localhost kernel: [71356.030439] tracer_transition_tracing+0x58/0x1c8 [dattobd]
Jun 18 10:39:21 localhost kernel: [71356.036590] tracer_setup_tracing+0x98/0x138 [dattobd]
Jun 18 10:39:21 localhost kernel: [71356.042309] ioctl_transition_inc+0x164/0x4a8 [dattobd]
Jun 18 10:39:21 localhost kernel: [71356.047942] ctrl_ioctl+0x840/0xdb8 [dattobd]
Jun 18 10:39:21 localhost kernel: [71356.052712] do_vfs_ioctl+0xb0/0x898
Jun 18 10:39:21 localhost kernel: [71356.056703] ksys_ioctl+0x8c/0xa0
Jun 18 10:39:21 localhost kernel: [71356.060433] __arm64_sys_ioctl+0x28/0x38
Jun 18 10:39:21 localhost kernel: [71356.064776] el0_svc_common+0x84/0x140
Jun 18 10:39:21 localhost kernel: [71356.068941] el0_svc_handler+0x80/0xa0
Jun 18 10:39:21 localhost kernel: [71356.076407] ---[ end trace 325eb1812a1633fc ]---
Jun 18 10:39:21 localhost kernel: [71356.081448] print_req_error: I/O error, dev sda, sector 10495688
Jun 18 10:39:21 localhost kernel: [71356.087862] datto: error reading from base device for copy on write: -5
Jun 18 10:39:21 localhost kernel: [71356.094880] datto: error during bio read complete callback: -5
I added a log statement to dattobd.c:
Jun 18 11:02:57 localhost kernel: [ 1256.454234] datto: start_sect 10495984, end_sect 10495992, pages 0
Because the page size (64k) is larger than COW_BLOCK_SIZE, the 8-sector range is smaller than one page, so pages comes out as 0, which then causes the crash.
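The arithmetic behind the zero page count can be sketched in userspace as follows. This assumes 512-byte sectors and assumes the page count comes from a truncating division, which matches the logged values but is not copied from dattobd.c. Both function names are hypothetical, and the rounding-up variant is only one possible way to avoid a zero count, not necessarily what the patch does.

```c
#include <assert.h>

/* Assumption: 512-byte sectors. */
#define SECTOR_SIZE 512UL

/* Hypothetical: page count via truncating division. */
unsigned long pages_truncating(unsigned long start_sect,
                               unsigned long end_sect,
                               unsigned long page_size)
{
    assert(page_size != 0 && end_sect >= start_sect);
    return ((end_sect - start_sect) * SECTOR_SIZE) / page_size;
}

/*
 * Hypothetical: page count rounded up, so any non-empty range needs at
 * least one page. The kernel's DIV_ROUND_UP macro does the same thing.
 */
unsigned long pages_rounded_up(unsigned long start_sect,
                               unsigned long end_sect,
                               unsigned long page_size)
{
    unsigned long bytes;

    assert(page_size != 0 && end_sect >= start_sect);
    bytes = (end_sect - start_sect) * SECTOR_SIZE;
    return (bytes + page_size - 1) / page_size;
}
```

For the logged range (start_sect 10495984, end_sect 10495992), the span is 8 sectors, or 4096 bytes. With a 65536-byte page the truncating version gives 0 and the rounded-up version gives 1. With a 4096-byte page both give 1, which is why the problem does not show up on 4k systems.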
The following patch works well. Are there any better ideas?