Shard exl2 weights evenly across ranks by memory chunk size rather than by number of rows.
Model: turboderp_command-r-plus-103B-exl2_4.5bpw, tp=4
Before: 16.12 GiB on GPU0, 15.18 GiB on GPU1, 54048 max ctx
After: 15.21 GiB on GPU0, 15.20 GiB on GPU1, 67840 max ctx
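
A minimal sketch of the idea, not the code from this change: when contiguous chunks of rows have different memory footprints, cutting at equal row counts overloads some ranks, while cutting at the prefix sum of chunk sizes evens them out. The function names (`split_by_rows`, `split_by_bytes`, `bytes_per_rank`) and the chunk sizes are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def split_by_rows(num_chunks, world_size):
    """Baseline: boundaries that give every rank the same number of chunks."""
    return [num_chunks * r // world_size for r in range(world_size + 1)]


def split_by_bytes(chunk_bytes, world_size):
    """Boundaries that give every rank roughly the same number of bytes.

    Chunks stay contiguous and in order; only the cut points move.
    """
    prefix = [0] + list(accumulate(chunk_bytes))  # prefix[i] = bytes in chunks [0, i)
    total = prefix[-1]
    bounds = [0]
    for r in range(1, world_size):
        target = total * r / world_size
        i = bisect_left(prefix, target)
        # Pick whichever neighbouring cut point is closer to the ideal target.
        if i > 0 and target - prefix[i - 1] <= prefix[i] - target:
            i -= 1
        bounds.append(max(i, bounds[-1]))  # keep boundaries non-decreasing
    bounds.append(len(chunk_bytes))
    return bounds


def bytes_per_rank(chunk_bytes, bounds):
    """Total bytes assigned to each rank for the given boundaries."""
    return [sum(chunk_bytes[a:b]) for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Hypothetical per-chunk sizes: earlier chunks are stored at higher precision.
    chunk_bytes = [8, 8, 4, 4, 2, 2, 2, 2]
    for name, bounds in (
        ("rows", split_by_rows(len(chunk_bytes), 4)),
        ("bytes", split_by_bytes(chunk_bytes, 4)),
    ):
        print(name, bounds, bytes_per_rank(chunk_bytes, bounds))
```

With these made-up sizes, the row-based split leaves the first rank holding four times as much as the last, while the byte-based split assigns the same amount to each rank.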