jszwedko closed this issue 3 weeks ago.
CC @griesemer
@jszwedko I think this is a reasonable suggestion, and in retrospect perhaps this would have been a better way to customize cell measurement than the specific flags we have now. That said, I'm not sure how easy it would be to make this work, given that part of the current implementation uses a state-machine-like mechanism (but I haven't looked into it in detail).
Specifically, my concern is how such a function would interact or possibly interfere with what we have already.
Alternatively, an additional flag could be provided that would exclude escaped character sequences from the width computation. Or perhaps a separate escape character could be used. Both of these might be more in the current spirit of things, even though they are perhaps less flexible than a function.
If you want to give it a shot and come up with a concrete implementation, please feel free to go ahead. I'm happy to review but will push back if the result doesn't fit nicely with what we have.
@griesemer yeah, looking more closely at the way the width is currently calculated does lead me to believe that it would be difficult to integrate a function like I suggested. I like the idea of a flag to exclude escaped characters, but it does feel like it should be a separate escape sequence to allow both features to be used concurrently. I'll mock that up and see what it looks like.
Thanks for the feedback!
This is somewhat related to the request to render CJK characters two columns wide. In general, there are other classes of runes for which one would need to use alternative widths: fullwidth forms, modifiers, Jamo V+T, etc. If we make a change to the width handling, we should take these other cases into account and allow supporting them. We could limit ourselves to determining width on a rune-by-rune basis.
The column width of runes could look something like this (based on http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c, but slightly different):
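As an illustration (not the snippet from the original comment), here is a minimal per-rune sketch in that spirit. runeWidth and stringWidth are hypothetical names. The wide ranges are taken from mk_wcwidth in wcwidth.c, while zero-width characters are approximated with Go's unicode.Mn, unicode.Me and unicode.Cf tables plus the Hangul Jamo medial/final range, and control characters return 0 instead of wcwidth.c's -1.

```go
package main

import (
	"fmt"
	"unicode"
)

// runeWidth approximates wcwidth.c: 0 for NUL, controls, non-spacing and
// enclosing marks, format characters (except SOFT HYPHEN) and Hangul Jamo
// medial vowels/final consonants; 2 for East Asian Wide/Fullwidth; else 1.
func runeWidth(r rune) int {
	switch {
	case r == 0:
		return 0
	case r < 0x20 || (r >= 0x7f && r < 0xa0):
		return 0 // C0/C1 controls (wcwidth.c returns -1 here)
	case r == 0x00ad:
		return 1 // SOFT HYPHEN is treated as visible by wcwidth.c
	case unicode.In(r, unicode.Mn, unicode.Me, unicode.Cf):
		return 0
	case r >= 0x1160 && r <= 0x11ff:
		return 0 // Hangul Jamo medial vowels and final consonants
	case isWide(r):
		return 2
	}
	return 1
}

// isWide reports whether r falls in the wide ranges used by mk_wcwidth.
func isWide(r rune) bool {
	return r >= 0x1100 && (r <= 0x115f || // Hangul Jamo initial consonants
		r == 0x2329 || r == 0x232a ||
		(r >= 0x2e80 && r <= 0xa4cf && r != 0x303f) || // CJK ... Yi
		(r >= 0xac00 && r <= 0xd7a3) || // Hangul syllables
		(r >= 0xf900 && r <= 0xfaff) || // CJK compatibility ideographs
		(r >= 0xfe10 && r <= 0xfe19) || // vertical forms
		(r >= 0xfe30 && r <= 0xfe6f) || // CJK compatibility forms
		(r >= 0xff00 && r <= 0xff60) || // fullwidth forms
		(r >= 0xffe0 && r <= 0xffe6) ||
		(r >= 0x20000 && r <= 0x2fffd) ||
		(r >= 0x30000 && r <= 0x3fffd))
}

// stringWidth sums runeWidth over the runes of s.
func stringWidth(s string) int {
	w := 0
	for _, r := range s {
		w += runeWidth(r)
	}
	return w
}

func main() {
	for _, s := range []string{"abc", "日本語", "e\u0301", "ＡＢ"} {
		fmt.Printf("%q: %d columns\n", s, stringWidth(s))
	}
}
```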
Hello, everyone. I ran into the same problem printing CJK text, so I had to write a custom tabwriter (WeiZhang555/tabwriter). I read the code of http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c before implementing it, but used a very similar Go repository, "github.com/moznion/go-unicode-east-asian-width", instead, and changed only a little code: here https://github.com/WeiZhang555/tabwriter/blob/master/tabwriter.go#L408-L413 and here https://github.com/WeiZhang555/tabwriter/blob/master/tabwriter.go#L388-L404. You can regard this as a POC.
So my questions are:
- Do you think this is OK or not? I mean, changing the default width calculation method?
- The proposal mentioned a custom CellWidth calculation method; I think it could also help me. Why do you say this can't be implemented?
Changing the default is not an option, imo. CJK scripts are not the only ones that vary in width. In fact, even the width ratio for CJK varies depending on the font used. For someone whose primary development environment is C, J, or K, a 2:1 ratio is very likely. For users of Latin-oriented fixed-width fonts, for example, it is not. (I believe it is 5/3:1.) Furthermore, there are also characters that should be mapped to zero width, and characters for which it is unclear what width they should be mapped to in general. Overall it is very hard, if not impossible, to come up with a single mapping that works across the board.
What I could imagine, though, is allowing tabwriter to have an optional interface that maps the length of an element or single character. It seems hard to extend the current NewWriter function to add this. I can imagine, though, adding a New function that takes an option argument:
func New(w io.Writer, opts ...Option) *Writer
type Option ...
func Padding(n int) Option
func TabWidth(width ...int) Option // would allow different widths per column
func MinWidth(width ...int) Option
func PadChar(r rune) Option
func WidthFunc(func(cell []byte) int) Option
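A minimal sketch of the functional-options pattern this API outline implies. Everything here is hypothetical: text/tabwriter has no New function or Option type, and the config struct, its defaults, and the "last per-column width repeats" rule are assumptions for illustration. It only collects settings; it does not format anything.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// config collects the settings a hypothetical tabwriter.New(w, opts...)
// might accept. None of these names exist in text/tabwriter today.
type config struct {
	padding   int
	tabWidths []int // one entry per column; the last entry repeats (assumed)
	minWidths []int
	padChar   rune
	widthFunc func(cell []byte) int
}

// Option is a functional option that modifies a config.
type Option func(*config)

func Padding(n int) Option { return func(c *config) { c.padding = n } }

func TabWidth(width ...int) Option { return func(c *config) { c.tabWidths = width } }

func MinWidth(width ...int) Option { return func(c *config) { c.minWidths = width } }

func PadChar(r rune) Option { return func(c *config) { c.padChar = r } }

func WidthFunc(f func(cell []byte) int) Option { return func(c *config) { c.widthFunc = f } }

// newConfig applies opts, in order, over assumed defaults.
func newConfig(opts ...Option) *config {
	c := &config{
		padding:   1,
		tabWidths: []int{8},
		padChar:   ' ',
		// Default measure: one column per rune.
		widthFunc: func(cell []byte) int { return utf8.RuneCount(cell) },
	}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

// tabWidthFor returns the tab width for column i, reusing the last entry
// for columns beyond the end of the list.
func (c *config) tabWidthFor(i int) int {
	if len(c.tabWidths) == 0 {
		return 8
	}
	if i >= len(c.tabWidths) {
		i = len(c.tabWidths) - 1
	}
	return c.tabWidths[i]
}

func main() {
	c := newConfig(TabWidth(30, 10), PadChar('.'))
	fmt.Println(c.tabWidthFor(0), c.tabWidthFor(5), string(c.padChar), c.padding)
}
```

The appeal of this shape is that new knobs can be added later without changing the constructor's signature, which is exactly what NewWriter's fixed parameter list makes hard.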
Example:
tabwriter.New(w, tabwriter.TabWidth(30), tabwriter.WidthFunc(width.FixedWidthEastAsian))
The flags are, unfortunately, not typed; otherwise they could be options. This is the biggest problem with adopting this API, I think. I haven't given it much thought, though. In the worst case, this package could be copied into the text repo, but that would be lame. I'd rather not do that.
Something like that. That is quite an addition to the current package, even though it is only a new API wrapper, so that probably requires a proposal. WidthFunc is defined on the entire cell, instead of per rune, to be able to handle contextual sizes (such as Hangul rendering for decomposed Jamo in Korean).
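To illustrate why a cell-level function can do more than a per-rune one, here is a sketch of a hypothetical WidthFunc-style function for decomposed (conjoining) Hangul Jamo. It assumes that a leading consonant (U+1100-U+115F) starts a two-column syllable, that following vowels (U+1160-U+11A7) and trailing consonants (U+11A8-U+11FF) add no width, and that a vowel or trailing consonant with no preceding leading consonant renders as its own two-column glyph. A per-rune table has to give such a vowel one fixed width regardless of context. All other runes count as one column, which is a simplification.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isLead reports whether r is a Hangul choseong (leading consonant, L).
func isLead(r rune) bool { return r >= 0x1100 && r <= 0x115f }

// isVowel reports whether r is a Hangul jungseong (vowel, V).
func isVowel(r rune) bool { return r >= 0x1160 && r <= 0x11a7 }

// isTrail reports whether r is a Hangul jongseong (trailing consonant, T).
func isTrail(r rune) bool { return r >= 0x11a8 && r <= 0x11ff }

// jamoAwareWidth is a hypothetical cell-level width function with the
// signature proposed for WidthFunc. An L rune starts a two-column
// syllable; V and T runes that continue it add nothing. A V or T with no
// preceding L is assumed to render as a standalone two-column glyph.
// Every other rune counts as one column.
func jamoAwareWidth(cell []byte) int {
	width := 0
	inSyllable := false // true after an L, until a rune that is not V or T
	for len(cell) > 0 {
		r, size := utf8.DecodeRune(cell)
		cell = cell[size:]
		switch {
		case isLead(r):
			width += 2
			inSyllable = true
		case isVowel(r) || isTrail(r):
			if !inSyllable {
				width += 2
			}
		default:
			width++
			inSyllable = false
		}
	}
	return width
}

func main() {
	fmt.Println(jamoAwareWidth([]byte("\u1100\u1161\u11a8"))) // decomposed 각
	fmt.Println(jamoAwareWidth([]byte("\u1161")))             // vowel on its own
}
```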
Note that the golang.org/x/text/width package also has support for East Asian width. This package could provide implementations of the interface for tabwriter; all the data is there. Similarly, a package for Arabic shaping could provide approximate widths for Arabic (I don't actually know if there is such a thing as fixed-width Arabic).
On Mon, Jan 18, 2016 at 7:34 AM, zhangwei_cs wrote:
So my questions are:
- Do you think this is OK or not? I mean, changing the default width calculation method?
- The proposal mentioned a custom CellWidth calculation method; I think it could also help me. Why do you say this can't be implemented?
Sounds really complicated, or even impossible! :-(
Of course, the alternative is to add a WidthFunc member to Writer. This is somewhat messy and ties the implementation to a single way of doing things, but it may be best, given the flag issue.
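A minimal sketch of the "extra member" alternative, using a hypothetical aligner type as a stand-in for tabwriter.Writer. When its WidthFunc field is nil it falls back to counting runes (tabwriter's default measure); otherwise it calls the function with the cell's bytes. Unlike text/tabwriter, it buffers whole rows in memory and treats all rows as one column block, so it only shows how such a field would plug into the width computation.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// aligner is a hypothetical, much-simplified stand-in for tabwriter.Writer
// with an extra WidthFunc member.
type aligner struct {
	Padding   int
	WidthFunc func(cell []byte) int // nil means: count runes
}

// width measures a cell with WidthFunc, or by rune count if it is nil.
func (a *aligner) width(cell string) int {
	if a.WidthFunc == nil {
		return utf8.RuneCountInString(cell)
	}
	return a.WidthFunc([]byte(cell))
}

// Format pads every cell except the last one in each row so that the
// columns line up; the last cell is written as-is, as in tabwriter.
func (a *aligner) Format(rows [][]string) string {
	var colWidths []int
	for _, row := range rows {
		for i := 0; i < len(row)-1; i++ {
			if i == len(colWidths) {
				colWidths = append(colWidths, 0)
			}
			if w := a.width(row[i]) + a.Padding; w > colWidths[i] {
				colWidths[i] = w
			}
		}
	}
	var sb strings.Builder
	for _, row := range rows {
		for i, cell := range row {
			sb.WriteString(cell)
			if i < len(row)-1 {
				sb.WriteString(strings.Repeat(" ", colWidths[i]-a.width(cell)))
			}
		}
		sb.WriteByte('\n')
	}
	return sb.String()
}

func main() {
	rows := [][]string{{"a", "x"}, {"bbb", "y"}}
	fmt.Print((&aligner{Padding: 1}).Format(rows))
}
```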
(In reply to mpvl's comment of Mon, Jan 18, 2016 at 10:25 AM.)
The implementation is really not hard. The biggest problem is coming up with an acceptable API extension for tabwriter.
The second issue is deciding what the mapping should look like for fixed-width CJK, also taking into account modifiers, etc. Luckily, people have thought about this, and there are some good definitions that are easy to implement. The width package contains all (or almost all) of the data needed to implement this.
The problem with any fixed mapping, though, is that it doesn't work in all situations. That's why it shouldn't be tabwriter implementing this. Otherwise it is not hard; it just needs to be coordinated with some tabwriter API extension.
(In reply to zhangwei_cs's comment of Mon, Jan 18, 2016 at 10:48 AM.)
@griesemer I think this is the CL you asked me to send: https://go-review.googlesource.com/18891. Sorry it took some time; I'm not familiar with the review system.
This is a very preliminary implementation, and I understand that it's not good enough yet, but at least it shows my idea. Thank you!
CL https://golang.org/cl/18891 mentions this issue.
Change https://golang.org/cl/202257 mentions this issue: text/tabwriter: add ANSI Graphics Rendition format
I need this to work with ANSI colors. Would a PR be accepted? Could we have a new flag similar to FilterHTML, but for terminal colors?
I just found https://github.com/juju/ansiterm; I believe I can give it a try instead.
text/tabwriter has been marked frozen since CL 31910 (2016).
What version of Go are you using (go version)?
1.4.2
What operating system and processor architecture are you using?
Linux / AMD64
What did you do?
What did you expect to see?
Where the first foo is red.
What did you see instead?
Where the first foo is red, but the columns are misaligned because the non-printable ANSI escape sequences are included in the cell width calculation.
Proposal:
I realize that printing to the terminal may not be text/tabwriter's intended purpose, but I think it would be nice to be able to configure how the width of a cell is calculated, as I can imagine further cases (similar to the current special-casing of HTML tags and entities) that would also benefit from this.
Suggestion: Expose an additional field on the tabwriter.Writer struct, so as to maintain the function signatures of Init and NewWriter:
If this function were nil, the existing width calculation could be used; otherwise, this function could be called with the contents of the cell.
I can take a stab at the implementation of this, but I first wanted to see if such a change would be welcome.
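A sketch of a width function that could be plugged into such a field, assuming it receives the cell's bytes. visibleWidth is a hypothetical name. It counts runes like tabwriter's default, but skips ANSI CSI sequences (ESC, '[', then any bytes up to a final byte in 0x40-0x7E), which covers color codes such as "\x1b[31m". Other escape types (OSC hyperlinks, for example) and wide characters are not handled.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// visibleWidth counts runes in cell, skipping ANSI CSI sequences such as
// the SGR color codes "\x1b[31m" and "\x1b[0m".
func visibleWidth(cell []byte) int {
	n := 0
	for i := 0; i < len(cell); {
		if cell[i] == 0x1b && i+1 < len(cell) && cell[i+1] == '[' {
			// Skip parameter/intermediate bytes up to the final byte.
			j := i + 2
			for j < len(cell) && (cell[j] < 0x40 || cell[j] > 0x7e) {
				j++
			}
			i = j + 1 // also skip the final byte (or run past the end)
			continue
		}
		_, size := utf8.DecodeRune(cell[i:])
		i += size
		n++
	}
	return n
}

func main() {
	for _, s := range []string{"\x1b[31mfoo\x1b[0m", "plain"} {
		fmt.Printf("%q -> %d\n", s, visibleWidth([]byte(s)))
	}
}
```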