NaserMonsefi closed this issue 5 years ago
This report seems to be similar to #228 -- are you on Windows? Can you please share your devtools::session_info()? And also the data object, e.g. via dput.
Thanks a lot for coming back to me so quickly. Here is the session info: I am afraid that dput will mess up the Unicode characters, so I uploaded the RDS file here: https://www.dropbox.com/s/t1u20gybxirrmt1/data_utf8.RDS?dl=0 Hopefully this works,
Yours, Naser
Thanks for the details! Running it here works OK:
Although I'm on Linux and using a UTF-8 locale. Can you please also try to set the locale to UTF-8? pander doesn't do any specific character encoding updates, so I suspect this issue is rather due to the local config. E.g. what if you update the Encoding of the object? Any help is highly appreciated here, as I don't have access to Windows on a regular basis.
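For anyone following along, a minimal sketch of how the session locale and the declared encoding of a character vector can be inspected in base R. The vector `x` is a made-up example, not the data from this report.

```r
# Report the character-type locale of the current R session.
Sys.getlocale("LC_CTYPE")

# l10n_info() summarises whether the session is multibyte / UTF-8 / Latin-1.
str(l10n_info())

# Hypothetical sample data: a string written with a \u escape is marked as
# UTF-8, while a pure ASCII string has no encoding mark ("unknown").
x <- c("plain", "stra\u00dfe")
Encoding(x)   # "unknown" "UTF-8"
```

The output of the first two calls depends on the machine, which is the point: comparing them between the Windows and Linux sessions shows whether the two sessions differ in their native encoding.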
You are absolutely correct, it seems to be a Windows problem. It worked on my Linux vbox. None of the English locales worked either (although they are supposed to be UTF-8). I guess that on Windows I might change the encoding of the data to native (latin1) before using pander. Yours, Naser
I think I found the cause for the problem,
So if I use Encoding to change the encoding like this (forcing the encoding to latin1, which is native), it gives the same wrong output for the UTF-8 (β) characters:
but if I use the enc2native function instead, it doesn't produce the weird characters and all characters are in the latin1 (ß) form.
But my guess would be that somehow pander uses enc2native for the data in the matrix but uses Encoding for the row and col names to transfer them to native, creating the incorrect characters.
This will sort of work, meaning that it seems you cannot get UTF-8 characters on Windows with pander, but you can still convert them to native and then use pander.
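The difference between relabelling and converting described above can be sketched in base R as follows. This is an illustration with a made-up one-character string, not the data from this report, and it says nothing about what pander does internally. It uses iconv with an explicit target so the result does not depend on the session locale, whereas enc2native converts to whatever the session's native encoding is.

```r
# "\u00df" is the sharp s, which exists in both UTF-8 and latin1.
x <- "\u00df"
Encoding(x)              # "UTF-8"
charToRaw(x)             # c3 9f

# Relabelling: `Encoding<-` only changes the declared encoding mark.
# The bytes stay the same, so they are now misread as two latin1 characters.
relabelled <- x
Encoding(relabelled) <- "latin1"
charToRaw(relabelled)    # c3 9f

# Converting: iconv() re-encodes the bytes for the target encoding.
converted <- iconv(x, from = "UTF-8", to = "latin1")
charToRaw(converted)     # df

# enc2native() also converts, but to the native encoding of the session,
# so its result differs between a latin1 Windows session and a UTF-8 one.
native <- enc2native(x)
```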
Yours, Naser
Might be related to some internal Rcpp stuff, but AFAIK we pass all headers + table body to the same functions. cc @RomanTsegelskyi for confirmation
BTW can you please let me know, @NaserMonsefi, how you created this data.frame? This Windows behaviour (like in #228) of having different encodings for the table header and content really freaks me out.
I originally noticed the problem when importing a data set using read.delim:
read.delim('..data.csv', sep = ',', stringsAsFactors = FALSE, encoding = 'UTF-8', check.names = F)
The file is encoded in UTF-8 and has header names with the UTF-8 beta in them. Of course, if I use check.names = T, the names end up with "unknown" encoding and more wrong characters. I think I found a solution for my case as mentioned above, but I don't know what is causing it at the OS level. Yours, Naser
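A small sketch for reproducing the import step without the original file. The temporary file, its header and its single data row are invented for illustration. The read.delim arguments are the ones quoted above.

```r
# Write a tiny UTF-8 file whose header contains a Greek beta (U+03B2).
# useBytes = TRUE writes the UTF-8 bytes as-is, whatever the session locale.
tmp <- tempfile(fileext = ".csv")
writeLines(c("\u03b2_1,value", "a,1"), tmp, useBytes = TRUE)

df <- read.delim(tmp, sep = ",", stringsAsFactors = FALSE,
                 encoding = "UTF-8", check.names = FALSE)

# Inspect how the header was marked and which bytes it holds after import.
Encoding(names(df))
charToRaw(names(df)[1])

unlink(tmp)
```

Running the two inspection lines on both Windows and Linux should show whether the header arrives with a different encoding mark than the cell contents.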
I have been having the same problem, also on Windows. Thanks Naser, enc2native also worked for me.
I tested #326 in a Windows VM and it seems to do the trick, but please confirm.
Should be fixed with the above commit.
@daroczig, I just had the same issue. Is there a way that I could help in some way to release a new version of pander with this fix (and all others that have been made)?
@billdenney you mean a CRAN release? I will need to look into the CI builder, as it seems to be failing, and do a general check-up on the package ... I have not really touched it for a while. I can hopefully do that in a few weeks, but I would appreciate help from anyone who could run all the tests and R CMD check using the dev version of R etc. and create a PR for a CRAN release.
Hi,
I was using pander with a matrix containing UTF-8 col names and realised that pander cannot recognise them. I dug a little deeper and noticed that pander actually has no problem with UTF-8 characters anywhere else besides row or col names. Further, I noticed that pander encodes them from UTF-8 to latin1, but for some reason this doesn't happen for row or col names. I made a small matrix to test this and it looks like this:
The encoding for this data shows that the first two are UTF-8 (β), with the longer tail on the beta, and the other two are latin1 (ß), with the chopped tail. The same is true for the rownames and colnames.
Now if it is passed to pander, it looks as follows:
First, pander encoded all the UTF-8 (β) characters in the matrix to latin1 (ß) and printed them. But for some reason this doesn't happen for the row and col names: pander was only able to print the latin1 (ß) characters correctly in the rows and cols. My first question is, how can I make sure that pander actually prints UTF-8 in the row and col names as well? It would also be preferable if it passed them through as UTF-8, not as latin1, both in the matrix and in the row and col names.
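As a stopgap on an affected setup, one option is to convert the cells and the dimnames to the native encoding in one consistent step before printing. The sketch below assumes a made-up matrix `m` and a hypothetical helper `to_native()`. Neither is part of pander, and this does not make pander print UTF-8. It only keeps the header and the body in the same encoding.

```r
# Hypothetical 2x2 character matrix with non-ASCII cells and dimnames.
m <- matrix(c("\u00df1", "\u00df2", "\u00df3", "\u00df4"), nrow = 2,
            dimnames = list(c("row_\u00df1", "row_\u00df2"),
                            c("col_\u00df1", "col_\u00df2")))

# Hypothetical helper: convert cells, row names and column names to the
# session's native encoding, keeping the shape of the matrix.
to_native <- function(mat) {
  out <- mat
  out[] <- enc2native(as.vector(mat))
  rownames(out) <- enc2native(rownames(mat))
  colnames(out) <- enc2native(colnames(mat))
  out
}

m_native <- to_native(m)

# Print with pander only if it is installed.
if (requireNamespace("pander", quietly = TRUE)) {
  pander::pandoc.table(m_native)
}
```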
Thanks, Naser