cpsievert / LDAvis

R package for web-based interactive topic model visualization.
Other
557 stars 131 forks source link

Added encoding argument to serVis() #73

Closed davidfuhry closed 7 years ago

davidfuhry commented 7 years ago

When on windows r does not encode created files as utf-8 by default (see this stackoverflow post). This is rather annoying when working with texts that are not in english e.g. german where there are a lot of umlaute ('ö', 'ä' etc.) To work around this i've added the use.bytes argument which will make serVis use writeLines with useBytes = TRUE if set. This works around the problem and encodes the json file in utf-8.

I've added documentation for this argument in man/serVis.R but as i'm fairly new to r librarys i might have missed something.

dselivanov commented 7 years ago

Would like to add that this PR fixes #50. Probably more straightforward solution will be:

con = file(file.path(out.dir, "lda.json"), encoding = "UTF-8")
on.exit(close.connection(con))
cat(json, file = con)
davidfuhry commented 7 years ago

@dselivanov You're right, also seems to be the more consistent way of doing it. I've updated the code. Also I rolled back the changes that I made to the documentation as I'm an idiot and missed that it was auto generated.

cpsievert commented 7 years ago

looks good to me, @kshirley?

kshirley commented 7 years ago

Yep, looks good to me, too! Thanks @davidfuhry and @dselivanov .

cpsievert commented 7 years ago

A few things before we merge:

davidfuhry commented 7 years ago

@cpsievert I've updated the Version and the NEWS and renamed the argument to utf8.

As for making it default I couldn't think of a reason why it would be bad (after all the generated html is set to utf-8 encoding anyways) tough the issue itself is pretty specific to windows so I'm not sure if that would have any negative side effects when running on linux or os x.

cpsievert commented 7 years ago

Hmm, now I'm not totally sure we need to add an explicit argument for this. Couldn't #50 be fixed (without these changes) by doing:

options(encoding = "UTF-8")

Before calling serVis() ?

davidfuhry commented 7 years ago

Just tested it, that actually fixes the issue.

I guess that makes these changes obsolete, or would there be any advantage (apart from independence from the global configuration, if that's an advantage) in having a seperate argument?

cpsievert commented 7 years ago

It would be more transparent to have, say, an encoding argument which is passed to file(), but I'm leaning towards just educating users about that options() workaround.

dselivanov commented 7 years ago

There is. Usually global option is not set to utf8. For example in text2vec I got a lot of bug reports related to this issue.

13 июн. 2017 г. 11:24 ПП пользователь "David Fuhry" < notifications@github.com> написал:

Just tested it, that actually fixes the issue.

I guess that makes these changes obsolete, or would there be any advantage (apart from independence from the global configuration, if that's an advantage) in having a seperate argument?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/cpsievert/LDAvis/pull/73#issuecomment-308221875, or mute the thread https://github.com/notifications/unsubscribe-auth/AE4u3cJViimYwq7AUgCTErtxvof_i3WMks5sDuH0gaJpZM4NpOPu .

davidfuhry commented 7 years ago

@dselivanov You're properbly right again, I guess there is a reason why the file function has an encoding argument in the first place and doesn't just allways use the global default.

@cpsievert I've changed the argument to work pretty much like the encoding argument of the file function. This is the most streamline version there is, i guess.

cpsievert commented 7 years ago

Thanks @davidfuhry, looks to good to me, ok with you @kshirley?

kshirley commented 7 years ago

Yep, looks good to me!

dselivanov commented 7 years ago

what is the drawback of just setting it to "utf8" by default?

cpsievert commented 7 years ago

what is the drawback of just setting it to "utf8" by default?

Backwards incompatibility. It could potentially break for others using a different encoding.

dselivanov commented 7 years ago

Ok, thanks, make sense.

14 июн. 2017 г. 7:01 ПП пользователь "Carson Sievert" < notifications@github.com> написал:

what is the drawback of just setting it to "utf8" by default?

Backwards incompatibility. It could potentially break for others using a different encoding.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/cpsievert/LDAvis/pull/73#issuecomment-308459437, or mute the thread https://github.com/notifications/unsubscribe-auth/AE4u3fsVQxMGBDKki-V5duKRYTUxI72-ks5sD_XRgaJpZM4NpOPu .