MichaelChirico / r-bugs

A ⚠️read-only⚠️mirror of https://bugs.r-project.org/

[BUGZILLA #17697] Vertical alignment problem with non-ASCII characters #6871

Closed MichaelChirico closed 4 years ago

MichaelChirico commented 4 years ago

Originally filed with RStudio, but Ron Blum helpfully pointed out that this also affects the R GUI (at least on Mac; unconfirmed on other operating systems):

https://github.com/rstudio/rstudio/issues/5992

Reproducing the issue here:

x = data.frame(
  中文 = c("你好!", "你说什么?", "你做了什么?", "我会做。", "你想我了?",
           "他们会做。", "他们在说什么?", "你会做吗?", "你做什么工作?",
           "她做什么工作?"),
  sound_file = c("tmp1cctcn.mp3", "tmp4tzxbu.mp3", "333012.mp3",
                 "G009-03.mp3", "334429.mp3", "G009-05.mp3", "tmpxth4o3.mp3",
                 "R009-07.mp3", "R020-01.mp3", "R020-02.mp3"),
  media_id = c("7293", "1884", "7596", "7018", "6485", "1643", "4617",
               "6624", "4720", "7201")
)
print(x)
#               中文    sound_file media_id
#  1:         你好! tmp1cctcn.mp3     7293
#  2:     你说什么? tmp4tzxbu.mp3     1884
#  3:   你做了什么?    333012.mp3     7596
#  4:       我会做。   G009-03.mp3     7018
#  5:     你想我了?    334429.mp3     6485
#  6:     他们会做。   G009-05.mp3     1643
#  7: 他们在说什么? tmpxth4o3.mp3     4617
#  8:     你会做吗?   R009-07.mp3     6624
#  9: 你做什么工作?   R020-01.mp3     4720
# 10: 她做什么工作?   R020-02.mp3     7201

The vertical alignment of the printed output is all out of whack. Possibly related to:

https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17625
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16186

But neither of those mentions the GUI specifically.

The issue can also be reproduced using matrix() alone, so it is not specific to data.frame printing; the root cause is probably in matrix formatting:

matrix(sample(x$中文, 40L, TRUE), 10, 4)
#       [,1]             [,2]             [,3]             [,4]            
#  [1,] "她做什么工作?" "他们在说什么?" "你好!"         "我会做。"      
#  [2,] "我会做。"       "他们在说什么?" "你想我了?"     "你做了什么?"  
#  [3,] "你会做吗?"     "他们会做。"     "你想我了?"     "你做了什么?"  
#  [4,] "你做了什么?"   "他们在说什么?" "你好!"         "你做了什么?"  
#  [5,] "你做什么工作?" "我会做。"       "你做什么工作?" "我会做。"      
#  [6,] "你想我了?"     "他们在说什么?" "他们在说什么?" "她做什么工作?"
#  [7,] "你好!"         "你做了什么?"   "你做什么工作?" "我会做。"      
#  [8,] "他们会做。"     "你好!"         "你好!"         "我会做。"      
#  [9,] "她做什么工作?" "你想我了?"     "你做什么工作?" "她做什么工作?"
# [10,] "你做了什么?"   "我会做。"       "你做了什么?"   "他们在说什么?"


MichaelChirico commented 4 years ago

I can reproduce this on Windows in RGui running in a Chinese locale; on Linux the output is fine. The issue is caused by fonts: many fonts, including the default RGui font on my system, render the ideographic full stop ("\u3002") too narrow. This is not a problem with how R formats matrices for printing. The number of characters emitted is correct, as can be seen by pasting the output into another application that uses a properly fixed-width font. On my system, switching RGui to the NSimSun font (a Simplified Chinese font) solves the problem.

For reference, a smaller example to reproduce:

matrix(nrow=2, c("做", "。", "x", "x"))
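
One way to check from within R that the padding is computed consistently is to compare display widths directly. This is only a sketch, assuming a UTF-8 session; the variable names `m`, `widths`, `padded` and `all_equal` are mine, and it checks only what R emits, not how a given font renders it:

```r
# Assumes a UTF-8 session; all variable names here are illustrative.
m <- matrix(c("做", "。", "x", "x"), nrow = 2)

# Display width (in terminal columns) that R assigns to each cell.
widths <- nchar(m, type = "width")
print(widths)

# format() pads every cell to a common display width.
padded <- format(m)
print(nchar(padded, type = "width"))

# TRUE when every padded cell occupies the same number of columns.
all_equal <- length(unique(as.vector(nchar(padded, type = "width")))) == 1L
print(all_equal)
```

If `all_equal` is TRUE but the columns still look ragged on screen, that points at the font rather than at R's formatting.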



MichaelChirico commented 4 years ago

Thanks for investigating, Thomas. I can also solve this by changing fonts.

