mahrud opened this issue 4 years ago
The Agda input method, available as part of the agda package in Homebrew, allows you to type \theta to get θ. If I use it with M2, I can easily use θ in commands:
i1 : R = QQ[θ]
o1 = R
o1 : PolynomialRing
i2 : θ^6
6
o2 = θ
o2 : R
i3 : θ + 1
o3 = θ + 1
o3 : R
The standard Emacs function insert-char, available on the key sequence C-x 8 RET, allows you to type the Unicode name of any Unicode character to get it. These are the ones involving "theta" in the name:
GREEK CAPITAL LETTER THETA (Θ)
GREEK CAPITAL THETA SYMBOL (ϴ)
GREEK SMALL LETTER SCRIPT THETA (ϑ)
GREEK SMALL LETTER THETA (θ)
GREEK THETA SYMBOL (ϑ)
MATHEMATICAL BOLD CAPITAL THETA (𝚯)
MATHEMATICAL BOLD CAPITAL THETA SYMBOL (𝚹)
MATHEMATICAL BOLD ITALIC CAPITAL THETA (𝜣)
MATHEMATICAL BOLD ITALIC CAPITAL THETA SYMBOL (𝜭)
MATHEMATICAL BOLD ITALIC SMALL THETA (𝜽)
MATHEMATICAL BOLD ITALIC THETA SYMBOL (𝝑)
MATHEMATICAL BOLD SMALL THETA (𝛉)
MATHEMATICAL BOLD THETA SYMBOL (𝛝)
MATHEMATICAL ITALIC CAPITAL THETA (𝛩)
MATHEMATICAL ITALIC CAPITAL THETA SYMBOL (𝛳)
MATHEMATICAL ITALIC SMALL THETA (𝜃)
MATHEMATICAL ITALIC THETA SYMBOL (𝜗)
MATHEMATICAL SANS-SERIF BOLD CAPITAL THETA (𝝝)
MATHEMATICAL SANS-SERIF BOLD CAPITAL THETA SYMBOL (𝝧)
MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL THETA (𝞗)
MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL THETA SYMBOL (𝞡)
MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL THETA (𝞱)
MATHEMATICAL SANS-SERIF BOLD ITALIC THETA SYMBOL (𝟅)
MATHEMATICAL SANS-SERIF BOLD SMALL THETA (𝝷)
MATHEMATICAL SANS-SERIF BOLD THETA SYMBOL (𝞋)
MODIFIER LETTER SMALL THETA (ᶿ)
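For comparison outside Emacs, here is a rough Python sketch that builds a similar list. It scans every code point with the standard unicodedata module and keeps the ones whose official name contains a given word. It does not reproduce how insert-char works. Python reports only official names, so alternate or older names that Emacs also offers (for example GREEK SMALL LETTER SCRIPT THETA for ϑ) will not show up, and the exact list depends on the Unicode version. The helper name chars_named is hypothetical.

```python
import sys
import unicodedata


def chars_named(word):
    """Return sorted (name, char) pairs for every code point whose official
    Unicode name contains `word` (case-insensitive)."""
    word = word.upper()
    matches = []
    for cp in range(sys.maxunicode + 1):
        # name() raises ValueError for unnamed code points unless a default is given
        name = unicodedata.name(chr(cp), "")
        if word in name:
            matches.append((name, chr(cp)))
    return sorted(matches)


if __name__ == "__main__":
    for name, ch in chars_named("theta"):
        print(f"{name} ({ch})")
```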
I'll look into the Agda input method!
Could M2 convert θ to $\theta$ when outputting HTML?
That could be made to happen, but why bother? Browsers can display Unicode, so θ will be displayed as is.
There are still issues with UTF-8 support, I believe. This is already apparent in the improper formatting of θ^6 above. A related issue:
i1 : width "aξbc"
o1 = 5
i2 : length "aξb"
o2 = 4
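Both results match the number of bytes, not characters: ξ (U+03BE) takes two bytes in UTF-8, so "aξbc" is 5 bytes but 4 characters. The short Python comparison below illustrates the encoding only and says nothing about M2's internals:

```python
def byte_and_char_counts(s):
    """Return (number of UTF-8 bytes, number of Unicode code points) for s."""
    return len(s.encode("utf-8")), len(s)


for s in ["aξbc", "aξb", "θ", "你好"]:
    n_bytes, n_chars = byte_and_char_counts(s)
    print(f"{s!r}: {n_bytes} bytes, {n_chars} code points")
```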
For proper formatting, we need to know the width of a Unicode character when it is displayed, and it is not always 1. Here is an example where the characters are about 1.7 columns wide:
i24 : "你好你好你好你好你好"
+++++++++++++++++
A proper solution would involve interrogating the display to discover the width, but that would take time.
A work-around for determining the number of unicode characters in a string is this:
i25 : # utf8 "你好你好你好你好你好"
o25 = 10
It's an O(N) algorithm, so it's not so bad, but who needs that number for anything?
Here's the way it looks in Emacs:
The fact remains that, at the moment, the formatting is based on width (or length, I forget), so it will go wrong:
i1 : R=QQ[a,ξ]
o1 = R
o1 : PolynomialRing
i2 : I=ideal(a^2+ξ^2,a+31)
2 2
o2 = ideal (a + ξ , a + 31)
o2 : Ideal of R
i3 : netList I_*
+--------+
| 2 2|
o3 = |a + ξ |
+--------+
|a + 31 |
+--------+
In a fixed-width font, shouldn't one be able to fix that satisfactorily?
Is your proposal to guess that every unicode character has a width of 1 on the screen? If so, that might be a good stop-gap measure until we can determine the width of all the characters accurately on all output devices, for at least it would work for some of them.
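To see why byte counting breaks boxed output like the netList above, here is a hypothetical Python helper, box_rows. It is not an M2 function. It draws a one-column box and pads every row to a common estimated width. Passing a byte-counting measure imitates the current behavior. The default measure=len is the proposed stop-gap, treating every code point as one column. That lines up Greek letters in a monospace font, but it would still misalign wide CJK characters.

```python
def box_rows(rows, measure=len):
    """Draw rows inside an ASCII box, padding each row to the widest one.

    `measure` estimates how many screen columns a string occupies;
    the default, len, counts code points."""
    width = max(measure(r) for r in rows)
    rule = "+" + "-" * width + "+"
    lines = [rule]
    for r in rows:
        # pad with spaces so every row reaches the same *estimated* width
        lines.append("|" + r + " " * (width - measure(r)) + "|")
        lines.append(rule)
    return "\n".join(lines)


rows = ["a^2 + ξ^2", "a + 31"]
print(box_rows(rows, measure=lambda s: len(s.encode("utf-8"))))  # byte widths: ragged
print(box_rows(rows))  # code-point widths: aligned
```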
One way to do that would be to change the routine "netWidth" in d/actors5.d to return the maximum number of UTF-8 characters in the rows. A new routine for computing the number of UTF-8 characters in a string could be modeled after the routine "utf8(y:Expr):Expr" in d/actors4.d, which converts a string (sequence of bytes) to a list of integers representing the Unicode code points in the string.
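Counting characters does not require fully decoding them. In well-formed UTF-8, each character begins with exactly one byte that is not a continuation byte, and continuation bytes have the bit pattern 10xxxxxx. The Python sketch below applies that idea and then takes a netWidth-style maximum over the rows. The names utf8_length and net_width are hypothetical, and this is not the d code in d/actors5.d. Like the utf8 workaround above, it is O(N) in the number of bytes.

```python
def utf8_length(data: bytes) -> int:
    """Count code points in well-formed UTF-8 by skipping continuation
    bytes (those of the form 0b10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def net_width(rows):
    """Width of a net as the largest number of UTF-8 characters in any row."""
    return max((utf8_length(r) for r in rows), default=0)


rows = ["a^2 + ξ^2".encode("utf-8"), b"a + 31"]
print(net_width(rows))
```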
At top level, we would distinguish more between objects of class String, which would be regarded still as sequences of bytes, and objects of class Net, which are destined to be displayed on the screen, even though currently String is a type of Net in the hierarchy. Various spots in the top level formatting code that expect the width of a string to equal the width of the net that would result from it would have to be fixed to get formatting to work again.
But, what is the long-term solution? Here is an experiment that shows there may be none:
These are screenshots showing two states of the same Emacs buffer, before and after running M-x text-scale-adjust. The ratio between Chinese character widths and Roman character widths is not a constant independent of size, so asking the display for the ratio may be fruitless.
The situation with Asian characters does seem complicated. First, it's not clear to me that monospace fonts exist for them, and second, there seems to be a distinction between narrow and wide characters. However, for characters such as Greek letters (which occur more frequently in math), the situation is much simpler: they take exactly one column in monospace fonts.
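Unicode formalizes the narrow/wide distinction as the East Asian Width property (UAX #11), and Python exposes it as unicodedata.east_asian_width. A common terminal heuristic, similar in spirit to the wcwidth function in C libraries, counts Wide (W) and Fullwidth (F) characters as two columns, skips combining marks, and counts everything else as one column. The sketch below is only an approximation. It ignores the actual font, emoji presentation, and ambiguous-width (A) characters such as Greek and Cyrillic letters, which some East Asian locales render two columns wide.

```python
import unicodedata


def display_width(s: str) -> int:
    """Approximate monospace column width of s."""
    width = 0
    for ch in s:
        # characters with a nonzero combining class (most combining marks)
        # overlay the previous character and take no column of their own
        if unicodedata.combining(ch):
            continue
        # 'W' (wide) and 'F' (fullwidth) usually occupy two columns
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


for s in ["θ + 1", "aξbc", "привет", "你好你好你好你好你好"]:
    print(repr(s), display_width(s))
```

The heuristic gives 20 columns for the ten Chinese characters. Compare that with the roughly 1.7 columns per character observed in Emacs above, which fits the point that the real ratio depends on the font.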
It's the same for Russian:
On my to-do list.
What are you intending to do? Shall we assign you to this issue or make a new one?
You suggested modifying netWidth in d/actors5.d. I'd like to give it a try (unless someone else volunteers!). Yes, you can assign me to this issue.
Okay. This is the sort of thing that requires testing, so I suggest making all the documentation for all the packages and running all the tests.
And thanks!
I just noticed this:
i1 : width "A\tB"
o1 = 3
i2 : << "A\tB";
A B
i3 : width net "A\tB"
o3 = 9
A related issue with \t is that its width is hardcoded somewhere in the d code, in the following lines of stdiop.d:
else if c == int('\t') then (
o.column = ushort(((int(o.column)+8)/8)*8);
)
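The d snippet moves the column to the next multiple of 8 whenever it sees a tab. The Python function below transcribes that arithmetic for a single line. Its name is hypothetical, and it assumes every other character advances the column by one. The test checks it against Python's built-in str.expandtabs, which uses the same 8-column tab stops by default. For "A\tB" it gives 9, the same number that width net "A\tB" reports above.

```python
def column_after(line: str, tabsize: int = 8) -> int:
    """Column reached after printing `line`, starting at column 0.

    A tab jumps to the next multiple of `tabsize`, mirroring
    o.column = ((o.column + 8) / 8) * 8 with integer division;
    any other character advances the column by one."""
    column = 0
    for c in line:
        if c == "\t":
            column = ((column + tabsize) // tabsize) * tabsize
        else:
            column += 1
    return column


print(column_after("A\tB"))
```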
I'd very much like to remove those lines, since they mess up code positioning. Emacs somehow magically fixes it on the fly, as @d-torrance explained to me, so removing these lines would require doing something at the level of Emacs:
Ah, just read some source code and found a nice solution. There's a variable in Emacs exactly for this sort of thing! I think if we set compilation-error-screen-columns to nil in M2-comint-mode, it should work.
(again quoting @d-torrance)
Is there any interest in using UTF-8 (or even Unicode) characters more in Emacs? Recent versions of Emacs, as well as most terminal emulators, support Unicode (I'm not sure since which version). M2 itself also supports UTF-8 characters in variable names:
Entering them is not super easy, but for instance S_0 is an easy shortcut. I don't think allowing \theta as a variable name would be possible. As an example of an alternative approach, some Sage functions have a latex_name option, which tells Sage to print the LaTeX name of the variable or function when using latex(variable). Example:
We could have a similar option so that viewers supporting UTF-8 print the UTF-8 character, while viewers supporting LaTeX, for instance in the documentation or the interactive shell, use the LaTeX code and let MathJax or KaTeX do the conversion.
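Below is a minimal sketch of such an option, written in Python for illustration only. The Variable class and its latex_name field are hypothetical and only mimic the Sage idea. They are not an existing M2 API. A variable keeps its UTF-8 name for terminal output and falls back to that name when no LaTeX form is given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variable:
    name: str                          # what the terminal shows, e.g. "θ"
    latex_name: Optional[str] = None   # hypothetical override for LaTeX/MathJax output

    def to_text(self) -> str:
        return self.name

    def to_latex(self) -> str:
        # fall back to the plain name when no LaTeX form was supplied
        return self.latex_name if self.latex_name is not None else self.name


theta = Variable("θ", latex_name=r"\theta")
x = Variable("x")
print(theta.to_text(), theta.to_latex(), x.to_latex())
```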
This is somewhat related to #522