ceph / ceph-iscsi


No Discard/Trim/Unmap ? #230

Open bbs2web opened 3 years ago

bbs2web commented 3 years ago

We got the iSCSI gateway working with Ceph Octopus, but a Windows client sees the drive as a standard HDD, so it won't trim.

The SUSE ceph-iscsi documentation has a myriad of options available, similar to the functionality we're used to when exporting an RBD block device from a small Debian VM. I presume these are perhaps only available when the backstore is krbd instead of user:rbd (tcmu-runner)? https://documentation.suse.com/ses/6/html/ses-all/cha-ceph-as-iscsi.html

I presume this functionality is exclusive to SUSE with their target_core_rbd module, and that I may have initially misinterpreted the following discussion, where I understood kernel 4.16+ to include the necessary plumbing: https://www.spinics.net/lists/ceph-users/msg53920.html

dillaman commented 3 years ago

The upstream Linux kernel does not contain SUSE's target_core_rbd and I am not aware of any other downstream kernels including it.

For tcmu-runner to report the LUN as non-rotational, your CRUSH map needs to map the pool to the ssd or nvme device class [1][2].

[1] https://github.com/open-iscsi/tcmu-runner/blob/master/rbd.c#L474
[2] https://ceph.io/community/new-luminous-crush-device-classes/
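
As a minimal sketch, pinning a pool to the ssd device class looks like this (the rule name replicated_ssd is arbitrary, and the pool name iscsi just matches the pool used elsewhere in this issue):

ceph osd crush rule create-replicated replicated_ssd default host ssd   # rule name is an example
ceph osd pool set iscsi crush_rule replicated_ssd                       # pool name is an example
ceph osd pool get iscsi crush_rule                                      # verify the rule took effect

Once the pool's CRUSH rule resolves to ssd or nvme OSDs, tcmu-runner should flag the LUN as non-rotational.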

bbs2web commented 3 years ago

Thank you for this. I'll need to turn on debugging then, as the device class for the pool is ssd. Wouldn't we, however, want to always present the discard option, even when the pool uses hdd, as we'll want to pass through commands to reclaim deleted space?

dillaman commented 3 years ago

We also set the UNMAP bit in the VPD inquiry along with all the alignment and max length hints. If Windows is keying off HDD vs SSD, though, that's a different issue.

If you have a Linux initiator connected, you can run sg_inq -p 0xB0 /path/to/device and see the block limits VPD (and code 0xB1 for the characteristics VPD).
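
For example (device path is a placeholder; in the 0xB1 page, a nominal rotation rate of "Non-rotating medium" is what identifies the LUN as solid state):

sg_inq -p 0xb0 /dev/sdX   # block limits VPD: unmap limits, granularity, alignment
sg_inq -p 0xb1 /dev/sdX   # block device characteristics VPD: rotation rate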

bbs2web commented 3 years ago

Is there any way to override the auto-detection and force either solid state, thin provisioned, or hard disk?

After temporarily replacing the RBD image with a straight-up replicated ssd one, Windows now picks up either image as solid state. Both images reside in a replicated SSD pool, but one has its data stored in an erasure-coded hdd pool fronted by an ssd cache tier.

I'll try to reproduce the issue. Is there any additional diagnostic information I should collect if I'm able to reproduce it?

Both images now work perfectly (screenshots attached).

Space reclamation is also confirmed to be working. Here is rbd du from before and after running Windows Disk Defragmenter on the image that stores its data in the erasure-coded hdd pool:

[admin@kvm7e ~]# rbd du iscsi/vm-169-disk-2
NAME           PROVISIONED  USED
vm-169-disk-2      500 GiB  1.1 GiB
[admin@kvm7e ~]# rbd du iscsi/vm-169-disk-2
NAME           PROVISIONED  USED
vm-169-disk-2      500 GiB  132 MiB
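
For comparison, the same reclamation can be triggered from a Linux initiator; a minimal sketch, assuming the LUN is formatted and mounted at /mnt/iscsi (hypothetical mount point):

fstrim -v /mnt/iscsi         # sends UNMAP for all free space and prints the bytes trimmed
rbd du iscsi/vm-169-disk-2   # then re-check used space on the Ceph side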

More for my own record, in case I'm able to reproduce the previous behaviour later. The following is the output when everything is working correctly:

Debian 10 client:
[root@debian ~]# sg_inq -p 0xB0 /dev/sdb
VPD INQUIRY: Block limits page (SBC)
  Maximum compare and write length: 1 blocks
  Optimal transfer length granularity: 0 blocks [not reported]
  Maximum transfer length: 1024 blocks
  Optimal transfer length: 1024 blocks
  Maximum prefetch transfer length: 0 blocks [ignored]
  Maximum unmap LBA count: 32768
  Maximum unmap block descriptor count: 4
  Optimal unmap granularity: 8192 blocks
  Unmap granularity alignment valid: true
  Unmap granularity alignment: 0
  Maximum write same length: 0xffffffff blocks
  Maximum atomic transfer length: 0 blocks [not reported]
  Atomic alignment: 0 [unaligned atomic writes permitted]
  Atomic transfer length granularity: 0 [no granularity requirement]
  Maximum atomic transfer length with atomic boundary: 0 blocks [not reported]
  Maximum atomic boundary size: 0 blocks [can only write atomic 1 block]
[root@debian sdb]# cat /sys/block/sdb/queue/logical_block_size;
512
[root@debian sdb]# cat /sys/block/sdb/queue/physical_block_size;
512
[root@debian sdb]# cat /sys/block/sdb/queue/hw_sector_size;
512
[root@debian sdb]# cat /sys/block/sdb/queue/rotational;
0
[root@debian sdb]# cat /sys/block/sdb/queue/discard_max_bytes;
16777216
[root@debian sdb]# cat /sys/block/sdb/queue/discard_max_hw_bytes;
16777216
[root@debian sdb]# cat /sys/block/sdb/queue/minimum_io_size;
512
[root@debian sdb]# cat /sys/block/sdb/queue/optimal_io_size;
524288
[root@debian sdb]# cat /sys/block/sdb/queue/discard_granularity;
4194304
[root@debian sdb]# cat /sys/block/sdb/discard_alignment;
0
[root@debian sdb]# cat /sys/block/sdb/queue/discard_zeroes_data;
0
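
For quicker collection next time, the same queue attributes can be dumped in one shot; a sketch, assuming the device is still /dev/sdb:

grep . /sys/block/sdb/queue/{rotational,discard_granularity,discard_max_bytes,discard_max_hw_bytes,minimum_io_size,optimal_io_size}

Each value is printed prefixed with the file it came from.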