Open alashworth opened 5 years ago
Comment by ariddell Tuesday May 10, 2016 at 18:32 GMT
Python 2 handles UTF-8 fine. (It's just not the default representation.)
Comment by tpapp Wednesday May 11, 2016 at 06:36 GMT
@ariddell: Can you provide a link with an example/description? Then I could update the list.
Comment by bgoodri Wednesday May 11, 2016 at 15:57 GMT
RStudio should work https://support.rstudio.com/hc/en-us/articles/200532197-Character-Encoding
Comment by ariddell Wednesday May 11, 2016 at 16:52 GMT
@tpapp https://docs.python.org/2/howto/unicode.html Python 3 uses Unicode by default. In Python 2 you need to be explicit about it. In both cases you can have unicode strings in source code.
Comment by tpapp Wednesday May 11, 2016 at 17:10 GMT
@ariddell: in the page you link I could not find an example with unicode identifiers (only strings, literals, filenames, etc).
Comment by ariddell Wednesday May 11, 2016 at 17:24 GMT
Here's a link to the section: https://docs.python.org/2/howto/unicode.html#unicode-literals-in-python-source-code
Comment by bob-carpenter Wednesday May 11, 2016 at 19:34 GMT
Those are unicode literals, not unicode identifiers.
Can you have
éø = 10
where you assign to a unicode identifier? Or dictionaries with unicode keys?
Comment by ariddell Wednesday May 11, 2016 at 20:07 GMT
You're right. In Python 2.7 you can't have unicode variables. In Python 3 you can. But why does that matter? We only need unicode in the Stan program code. (Parameter lookups aren't affected since keys are and always were strings.)
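A minimal sketch of the identifier difference, assuming a Python 3 interpreter (under Python 2.7 the same file fails with a syntax error). The variable names are only illustrative:

```python
# Python 3 only: non-ASCII letters are allowed in identifiers (PEP 3131).
éø = 10
print(éø + 1)  # 11

# str.isidentifier() reports whether a string would be accepted as a name.
letters_ok = "éø".isidentifier()   # True: both characters are letters
symbol_ok = "≤".isidentifier()     # False: a math symbol is not a letter
print(letters_ok, symbol_ok)       # True False
```

Note that a symbol such as ≤ is not a valid identifier even in Python 3, so only letter-like characters carry over.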
Comment by bob-carpenter Wednesday May 11, 2016 at 20:48 GMT
It looks like you use a dictionary structure for variable names.
schools_dat = { 'J': 8, 'y': [28, 8, -3, 7, -1, 1, 18, 12], 'sigma': [15, 10, 16, 11, 9, 11, 10, 18] }
Can the keys be unicode?
RStan can read data values out of the environment if they're named after variables in the Stan program. And it can attach the resulting draws as variables in the environment.
Comment by ariddell Thursday May 12, 2016 at 12:20 GMT
Python 2 has no problems with unicode dictionary keys. In fact, it can have unicode variables in the environment but you have to reference them via strings indirectly. For example, this works in Python 2:
>>> locals()[u'é'] = 9
>>> locals()[u'é']
(locals is something like baseenv or .GlobalEnv in R)
In short, there is nothing Python 2 can't do that's relevant to supporting unicode in Stan code. The table above is inaccurate.
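As a rough sketch of the point about dictionary keys, here is the schools data with one key renamed to a Greek letter. The key "σ" is a hypothetical name that a Unicode Stan program might declare; it is not taken from any existing model. This is written for Python 3 and uses globals() rather than locals(), since writing through locals() is only reliable at module level:

```python
# Hypothetical data dictionary: "σ" stands in for a Unicode variable name
# that a Stan program could declare; the values are the schools data above.
schools_dat = {
    "J": 8,
    "y": [28, 8, -3, 7, -1, 1, 18, 12],
    "σ": [15, 10, 16, 11, 9, 11, 10, 18],
}

# Lookup is by string, so the key's alphabet does not matter.
sigma = schools_dat["σ"]
print(len(sigma) == schools_dat["J"])  # True

# A name can also be placed in the environment via its string form.
globals()["é"] = 9
print(globals()["é"])  # 9
```

Because both passing data and reading results go through string keys, the host language's identifier rules never come into play.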
You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/stan-dev/stanc3/issues/1406
Comment by bob-carpenter Thursday May 12, 2016 at 15:33 GMT
You should have edit permission on the issues.
Comment by ariddell Thursday May 12, 2016 at 16:18 GMT
I was just recording my thought on the matter. I appreciate @tpapp putting work into drafting the issue text and would prefer to leave any edits to him.
Comment by tpapp Friday May 13, 2016 at 06:54 GMT
@ariddell: The table was accurate, but since not all Stan interfaces work the way R and Julia do, I extended it with the information that is probably most relevant: whether the interfaces, as they currently operate, would support UTF8 variable names for (1) passing data to Stan and (2) extracting MCMC results. Thanks for pointing this out; it is much more important than the details of UTF8 support in those languages per se.
Not being a Stata user, I am reluctant to make a definitive statement about it. If someone could help with that, it would be great.
Comment by ariddell Saturday May 14, 2016 at 23:28 GMT
One use of unicode in Stan program code that should definitely be supported is in comments. Leaving code comments in one's native language is fairly routine in Python/Java/etc. We should at least support that in Stan.
Comment by bob-carpenter Saturday May 14, 2016 at 23:51 GMT
Unicode in comments is OK now.
Comment by tpapp Sunday May 15, 2016 at 07:02 GMT
Indeed UTF8 comments work fine, and I have been using them for a while. Made a clarification in the issue.
Comment by ariddell Sunday May 15, 2016 at 14:16 GMT
UTF8 comments aren't supported in PyStan right now (non-ASCII characters will generate an error). I'll fix this. stan-dev/pystan#223
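For anyone who wants to check a program in the meantime, here is a rough sketch that reports non-ASCII characters appearing outside comments. It is not how PyStan or the Stan parser handles this; the function name and regular expression are my own, and it assumes only // line comments and /* */ block comments, ignoring string literals:

```python
import re

# Assumption: only // line comments and /* ... */ block comments are handled.
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)

def non_ascii_outside_comments(program):
    """Return the non-ASCII characters that remain once comments are removed."""
    code_only = COMMENT_RE.sub("", program)
    return [ch for ch in code_only if ord(ch) > 127]

program = """
// écart-type des effets
parameters {
  real<lower=0> tau;  /* τ in the write-up */
}
"""
print(non_ascii_outside_comments(program))     # []
print(non_ascii_outside_comments("real ϕ;"))   # ['ϕ']
```

An empty result means the only non-ASCII text is in comments, which is the case that is supposed to work already.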
Issue by tpapp Monday May 09, 2016 at 17:29 GMT Originally opened as https://github.com/stan-dev/stan/issues/1888
Introduction
Some languages now support Unicode (mostly UTF8) for writing source code. It would be great if one could also use Unicode in Stan source. (Note that comments in UTF8, or any superset that embeds ASCII, are already supported in the sense that the parser just ignores them.)
Broadly, there are two possible levels of support:
1. Unicode identifiers (eg ϕ), and
2. Unicode operators (eg ≤), which provide synonyms for existing ones (eg <=)
Example
This is what the 8 schools example would look like in Unicode:
Possible benefits
Possible downsides
The first two are mitigated by the fact that ASCII is a subset of UTF8, so using the feature is optional.
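A small sketch of why the feature stays optional, assuming a simple text substitution for operator synonyms. The table and function name are hypothetical, and only the ≤ / <= pair comes from the proposal itself; a real implementation would live in the parser and would not rewrite comments or strings the way this does:

```python
# Hypothetical synonym table; only the pair ≤ / <= appears in the proposal.
SYNONYMS = {"≤": "<="}

def to_ascii_operators(source):
    """Replace each Unicode operator with its ASCII synonym (naive text pass)."""
    for unicode_op, ascii_op in SYNONYMS.items():
        source = source.replace(unicode_op, ascii_op)
    return source

ascii_src = "if (x <= 1) print(x);"

# An ASCII-only program passes through unchanged ...
unchanged = to_ascii_operators(ascii_src) == ascii_src
# ... and its bytes are identical whether read as ASCII or as UTF-8.
same_bytes = ascii_src.encode("ascii") == ascii_src.encode("utf-8")
print(unchanged, same_bytes)  # True True

print(to_ascii_operators("if (x ≤ 1) print(x);"))  # if (x <= 1) print(x);
```

Existing programs are therefore already valid UTF8 input, and anyone who cannot type or display the Unicode forms can keep writing the ASCII ones.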
UTF8 support in various languages which have interfaces for Stan
Editor support
Emacs
See this list for various UTF8 implementations using autocomplete, company-mode, and quail.
See also