Matrix Display output alignment is not Unicode-aware

nicholas-miklaucic commented 9 months ago

Hi,

Thanks for all the wonderful work on this library. I'm coming back after a Rust hiatus and I can't believe the const generics work as well as they do.

I'm using a custom scalar type that uses a combining overline character instead of a minus sign, to match the style of this reference work:

Somewhat ironically, a typesetting convention designed to make alignment easier in print breaks alignment for Matrix output using this custom type:

  ┌             ┐
  │  0 ̅1  0  0 │
  │  0 ̅1  0  0 │
  │  ⅔  0  1  0 │
  │  0  0  0  1 │
  └             ┘

The Display implementation here uses .char().count() instead of grapheme counting, which is what causes the misalignment seen here. I imagine this would also affect less frivolous custom Display impls, like Chinese characters.

unicode-segmentation is a large dependency to make a tiny edge case look nicer, so I'm not exactly saying this should be implemented differently, but I'd be interested if anyone else has had this issue or if there's a solution that doesn't require increasing the code size too massively.

Ralith commented 9 months ago

I imagine this would also affect less frivolous custom Display impls, like Chinese characters

Is there a reason to put Chinese characters in an nalgebra::Matrix?

nicholas-miklaucic commented 9 months ago

Some people may prefer to output numbers in Chinese numerals instead of the Arabic numeral system. A cursory glance at crates.io doesn't reveal any crates that have nalgebra and a crate for that formatting as a dependency, so I think that's not common enough to have made it on crates.io. I'd be interested to hear if anyone else has tried to use Unicode characters that break character-level counting before besides myself!

tpdickso commented 8 months ago

A potential solution would be to allow trait overloading for computing the length of a scalar in characters, at the level of the scalar. This would not help you if you were printing a f32 using fullwidth numerals to correctly align arabic numerals with hanzi, but would permit a custom scalar type to align itself using unicode-segmentation.

@Ralith I don't believe that Hanzi numerals are popular in mathematical work, but fullwidth numerals can be useful for alignment purposes; e.g. in terminals well-formatted CJK characters tend to take up two horizontal cells rather than one, so using fullwidth numerals can be useful.

However I think those will actually work fine, as it's a char-count and not a byte-count, so the fullwidth forms should be aligned correctly relative to one another, as long as they aren't being constructed using CJK combining characters (which are sadly not that well-supported anywhere, in my experience.) unicode-segmentation wouldn't actually help with aligning CJK with non-CJK characters either since CJK taking up 2 cells is a property of terminals, not unicode...

nicholas-miklaucic commented 8 months ago

Another use case, beyond mine, is computer algebra systems or other libraries that want to implement symbolic expressions. Perhaps that's more common than different languages.

You make an excellent point—allowing the custom types to control their display width eliminates my main concern with a fix for this, the large size of unicode-segmentation. It also allows correct behavior even when the alignment doesn't follow graphemes. I would be happy to implement this solution if people are interested.

The val_width function could be moved into a trait that inherits from Display with a default implementation as it's written. The API would have an additional trait, which is unfortunate if it's not widely used, but the source code would change very little. (I don't think it makes sense to have this apply to anything besides Display, and having a ton of separate traits for the other formatting traits makes little sense, so it would require some refactoring of the macro to cover the extra case.)

An important note is that fixing the issue on my end is more challenging than I originally thought. One solution I had considered was to add zero-width characters to the Display impl, so both positive and negative numbers had the additional spaces, but that only lets you pad more, not less. The result is pretty bad alignment, and it's quite brittle when you consider extensions like fractions or colored output.

dimforge / nalgebra

Matrix Display output alignment is not Unicode-aware #1354