TI-Toolkit / tokens

TI-BASIC token information XMLs for inclusion in other projects

Defining correspondences to on-calculator display #12

Closed tari closed 1 year ago

tari commented 1 year ago

In one of my projects (not currently published anywhere), I've wanted to define canonical token representations that use the full range of Unicode to display text as closely as possible to how it appears on a calculator while also preserving semantics.

I've been working with a tokens XML file which usefully specifies <alt> strings for tokens so it's easy to treat the non-alternate string for each token as canonical, and I've developed this mapping from Unicode strings to calculator character set based on the combination of token semantics and actual use of characters on monochrome calculators:

    ("\0", 0x00),
    ("𝑛", 0x01), // U+1D45B mathematical italic small N
    ("𝗎", 0x02), // U+1D5CE mathematical sans-serif small U
    ("𝗏", 0x03), // U+1D5CF mathematical sans-serif small V
    /* preceding U and V are the same as regular ASCII in both the large and small fonts,
    but this W is half-width in the small font */
    ("𝗐", 0x04),   // U+1D5D0 mathematical sans-serif small W
    ("►", 0x05),    // U+25BA black right-pointing pointer
    ("🠽", 0x06),   // U+1F83D upwards compressed arrow
    ("🠿", 0x07),   // U+1F83F downwards compressed arrow
    ("∫", 0x08),    // U+222B integral
    ("×", 0x09),     // U+00D7 multiplication sign
    ("□", 0x0A),    // U+25A1 white square
    ("﹢", 0x0B),    // U+FE62 small plus sign
    ("·", 0x0C),     // U+00B7 middle dot
    ("ᴛ", 0x0D),    // U+1D1B latin letter small capital T
    ("³", 0x0E),     // U+00B3 superscript three
    ("𝔽", 0x0F),   // U+1D53D mathematical double-struck capital F (seems unused by any tokens)
    ("√", 0x10),    // U+221A square root
    ("⁻¹", 0x11),  // U+207B superscript minus, U+00B9 superscript one
    ("²", 0x12),     // U+00B2 superscript two
    ("∠", 0x13),    // U+2220 angle
    ("°", 0x14),     // U+00B0 degree sign
    ("ʳ", 0x15),     // U+02B3 modifier letter small R
    ("ᵀ", 0x16),    // U+1D40 modifier letter capital T
    ("≤", 0x17),    // U+2264 less-than or equal to
    ("≠", 0x18),    // U+2260 not equal to
    ("≥", 0x19),    // U+2265 greater-than or equal to
    ("⁻", 0x1A),    // U+207B superscript minus
    ("ᴇ", 0x1B),    // U+1D07 latin letter small capital E
    ("→", 0x1C),    // U+2192 rightwards arrow
    ("₁₀", 0x1D), // U+2081 subscript one, U+2080 subscript zero
    ("↑", 0x1E),    // U+2191 upwards arrow
    ("↓", 0x1F),    // U+2193 downwards arrow
    (" ", 0x20),
    ("!", 0x21),
    ("\"", 0x22),
    ("#", 0x23),
    ("⁴", 0x24), // U+2074 superscript four
    ("%", 0x25),
    ("&", 0x26),
    ("'", 0x27),
    ("(", 0x28),
    (")", 0x29),
    ("*", 0x2A),
    ("+", 0x2B),
    (",", 0x2C),
    ("-", 0x2D),
    (".", 0x2E),
    ("/", 0x2F),
    ("0", 0x30),
    ("1", 0x31),
    ("2", 0x32),
    ("3", 0x33),
    ("4", 0x34),
    ("5", 0x35),
    ("6", 0x36),
    ("7", 0x37),
    ("8", 0x38),
    ("9", 0x39),
    (":", 0x3A),
    (";", 0x3B),
    ("<", 0x3C),
    ("=", 0x3D),
    (">", 0x3E),
    ("?", 0x3F),
    ("@", 0x40),
    ("A", 0x41),
    ("B", 0x42),
    ("C", 0x43),
    ("D", 0x44),
    ("E", 0x45),
    ("F", 0x46),
    ("G", 0x47),
    ("H", 0x48),
    ("I", 0x49),
    ("J", 0x4A),
    ("K", 0x4B),
    ("L", 0x4C),
    ("M", 0x4D),
    ("N", 0x4E),
    ("O", 0x4F),
    ("P", 0x50),
    ("Q", 0x51),
    ("R", 0x52),
    ("S", 0x53),
    ("T", 0x54),
    ("U", 0x55),
    ("V", 0x56),
    ("W", 0x57),
    ("X", 0x58),
    ("Y", 0x59),
    ("Z", 0x5A),
    ("θ", 0x5B), // U+03B8 greek small letter theta
    ("\\", 0x5C),
    ("]", 0x5D),
    ("^", 0x5E),
    ("_", 0x5F),
    ("‛", 0x60), // U+201B single high-reversed-9 quotation mark
    ("a", 0x61),
    ("b", 0x62),
    ("c", 0x63),
    ("d", 0x64),
    ("e", 0x65),
    ("f", 0x66),
    ("g", 0x67),
    ("h", 0x68),
    ("i", 0x69),
    ("j", 0x6a),
    ("k", 0x6b),
    ("l", 0x6c),
    ("m", 0x6d),
    ("n", 0x6e),
    ("o", 0x6f),
    ("p", 0x70),
    ("q", 0x71),
    ("r", 0x72),
    ("s", 0x73),
    ("t", 0x74),
    ("u", 0x75),
    ("v", 0x76),
    ("w", 0x77),
    ("x", 0x78),
    ("y", 0x79),
    ("z", 0x7a),
    ("{", 0x7b),
    ("|", 0x7c),
    ("}", 0x7d),
    ("~", 0x7e),
    ("⍯", 0x7f), // U+236F APL functional symbol quad not equal (never appears in tokens)
    ("₀", 0x80), // U+2080 subscript zero
    ("₁", 0x81),
    ("₂", 0x82),
    ("₃", 0x83),
    ("₄", 0x84),
    ("₅", 0x85),
    ("₆", 0x86),
    ("₇", 0x87),
    ("₈", 0x88),
    ("₉", 0x89), // .. U+2089 subscript nine
    ("Á", 0x8a),  // U+00C1 latin capital letter A with acute
    ("À", 0x8b),  // U+00C0 latin capital letter A with grave
    ("Â", 0x8c),  // U+00C2 latin capital letter A with circumflex
    ("Ä", 0x8d),  // U+00C4 latin capital letter A with diaeresis
    ("á", 0x8e),  // .. latin-1 supplement continues
    ("à", 0x8f),
    ("â", 0x90),
    ("ä", 0x91),
    ("É", 0x92),
    ("È", 0x93),
    ("Ê", 0x94),
    ("Ë", 0x95),
    ("é", 0x96),
    ("è", 0x97),
    ("ê", 0x98),
    ("ë", 0x99),
    ("Í", 0x9a),
    ("Ì", 0x9b),
    ("Î", 0x9c),
    ("Ï", 0x9d),
    ("í", 0x9e),
    ("ì", 0x9f),
    ("î", 0xa0),
    ("ï", 0xa1),
    ("Ó", 0xa2),
    ("Ò", 0xa3),
    ("Ô", 0xa4),
    ("Ö", 0xa5),
    ("ó", 0xa6),
    ("ò", 0xa7),
    ("ô", 0xa8),
    ("ö", 0xa9),
    ("Ú", 0xaa),
    ("Ù", 0xab),
    ("Û", 0xac),
    ("Ü", 0xad),
    ("ú", 0xae),
    ("ù", 0xaf),
    ("û", 0xb0),
    ("ü", 0xb1),
    ("Ç", 0xb2),
    ("ç", 0xb3),
    ("Ñ", 0xb4),
    ("ñ", 0xb5),
    ("´", 0xb6), // U+00B4 acute accent
    ("`", 0xb7),  // U+0060 grave accent
    ("¨", 0xb8), // U+00A8 diaeresis
    ("¿", 0xb9), // U+00BF inverted question mark
    ("¡", 0xba), // U+00A1 inverted exclamation mark
    ("α", 0xbb), // U+03B1 Greek small letter alpha
    ("β", 0xbc), // U+03B2 Greek small letter beta
    ("γ", 0xbd), // U+03B3 Greek small letter gamma
    ("Δ", 0xbe), // U+0394 Greek capital letter delta
    ("δ", 0xbf), // U+03B4 Greek small letter delta
    ("ε", 0xc0), // U+03B5 Greek small letter epsilon
    ("[", 0xc1),
    ("λ", 0xc2),  // U+03BB Greek small letter lambda
    ("μ", 0xc3),  // U+03BC Greek small letter mu
    ("π", 0xc4),  // U+03C0 Greek small letter pi
    ("ρ", 0xc5),  // U+03C1 Greek small letter rho
    ("Σ", 0xc6),  // U+03A3 Greek capital letter sigma
    ("σ", 0xc7),  // U+03C3 Greek small letter sigma
    ("τ", 0xc8),  // U+03C4 Greek small letter tau
    ("Φ", 0xc9),  // U+03A6 Greek capital letter phi
    ("Ω", 0xca),  // U+03A9 Greek capital letter omega
    ("x̄", 0xcb), // x, U+0305 combining overline
    ("ȳ", 0xcc),  // U+0233 latin small letter Y with macron
    ("ˣ", 0xcd),  // U+02E3 modifier letter small X
    ("…", 0xce), // U+2026 horizontal ellipsis
    ("◄", 0xcf), // U+25C4 black left-pointing pointer
    ("■", 0xd0), // U+25A0 black square (unused by any token?)
    ("∕", 0xd1), // U+2215 division slash (unused by any token?)
    ("‐", 0xd2), // U+2010 hyphen (unused by any token)
    /* 0xd3 is a superscript two, but not exactly the same as 0x12 in the large font.
     * Doesn't seem to be used in any tokens; appears used for displaying computed area. */
    ("\u{f83d3}", 0xd3),
    ("˚", 0xd4), /* U+02DA ring above. Used to represent temperature, rather than angular
                   * degrees like 0x14. Looks similar, but smaller. */
    /* 0xd5 is a superscript three, same as 0x0E. Possibly used for area and volume like 0xd3,
     * but never used in tokens. */
    ("\u{f83d5}", 0xd5),
    /* 0xd6 unused; unallocated in small font */
    ("\u{f83d6}", 0xd6),
    ("𝑖", 0xd7), // U+1D456 mathematical italic small I
    ("p̂", 0xd8),  // p, U+0302 combining circumflex accent
    ("χ", 0xd9),   // U+03C7 greek small letter chi
    ("𝙵", 0xda), // U+1D675 mathematical monospace capital F
    ("𝑒", 0xdb), // U+1D452 mathematical italic small E (Euler's number)
    ("ʟ", 0xdc),   // U+029F latin letter small capital L (list name prefix)
    ("𝗡", 0xdd), // U+1D5E1 mathematical sans-serif bold capital N
    ("⸩", 0xde),  // U+2E29 right double parenthesis
    ("⮕", 0xdf),  // U+2B95 rightwards black arrow
    // e0 through ee are used as cursors and don't appear in tokens
    ("█", 0xe0), // U+2588 full block
    ("\u{f83e1}", 0xe1),
    ("\u{f83e2}", 0xe2),
    ("\u{f83e3}", 0xe3),
    ("\u{f83e4}", 0xe4),
    ("\u{f83e5}", 0xe5),
    ("\u{f83e6}", 0xe6),
    ("\u{f83e7}", 0xe7),
    ("╲", 0xe8), // U+2572 box drawings light diagonal upper left to lower right
    ("\u{f83e9}", 0xe9),
    ("◥", 0xea), // U+25E5 black upper right triangle
    ("◣", 0xeb), // U+25E3 black lower left triangle
    ("⊸", 0xec), // U+22B8 multimap
    ("∘", 0xed), // U+2218 ring operator
    ("⋱", 0xee), // U+22F1 down right diagonal ellipsis
    /* Compare EF and F0 to 06 and 07: the tail on EF/F0 is one pixel longer
     * than on 06/07, so the latter are encoded as compressed arrows. */
    ("🡅", 0xef), // U+1F845 upwards heavy arrow
    ("🡇", 0xf0), // U+1F847 downwards heavy arrow
    ("░", 0xf1),  // U+2591 light shade
    ("$", 0xf2),
    ("🡁", 0xf3), // U+1F841 upwards heavy compressed arrow
    ("ß", 0xf4),   // U+00DF latin small letter sharp S
    ("\u{f83f5}", 0xf5),
    ("⁄", 0xf6), // U+2044 fraction slash (MathPrint)
                   /* remaining bytes are unused and render as a 3x3 pixel filled box */

Although many of the Unicode versions are a single character, a few calculator characters have no direct equivalent in Unicode and so are represented with a sequence of characters (such as character 0x1D), and for a few symbols that are used mainly as graphical elements (not appearing in any tokens) I've opted to use a portion of a Unicode Private Use Area at U+F8300..U+F83FF.

The corresponding token definition changes are fairly minor, mostly ensuring that the canonical version of each token uses the semantically correct Unicode character, like replacing many lowercase w's with U+1D5D0. Here's the diff of my XML to adopt these mappings (though I'm not certain those are all the changes I made): tokens.patch.txt

In my application there's also validation that the canonical string for each token can be displayed entirely with the calculator character set, which is fairly simple validation but seems like the most important property of having a canonicalization.


I suggest that this token format should have a way to specify a string version of each token that matches a defined Unicode-to-calculator correspondence, and I propose the one I've developed above as that correspondence.

adriweb commented 1 year ago

Interesting!

I was wondering, hasn't @jacobly0 also established some kind of similar mapping for his font? Trying some of your unicode characters seems to produce the expected glyph anyway :)

tari commented 1 year ago

I did refer to Jacobly's TICELarge while developing this, or at least the code has a comment that refers to it as well as a few other sources (like the large font table from 83pa28d). They differ in some choices mostly because I was looking for the most semantically-appropriate character for less common symbols, which usually means choosing a codepoint in one of the mathematical symbol pages. Use of private-use characters also differs in that I chose to use U+F8300 (because the 83 in that number seems semantically appropriate).

kg583 commented 1 year ago

I'd like to make the following proposal for incorporating all this data in a sensible way, based on earlier discussions on Discord.

The changes to the token sheets would be beneficial if future applications need to be ASCII-only for whatever reason, while the new font sheet would be useful for direct font translation. Var names, for example, are (usually) given by the font rather than by tokens (and don't coincide in some important cases), so tivars_lib_py and others could leverage the sheet directly.

Converting the <name> tags is easy programmatically (just check the bytes), and Tari's file is a gimme to parse for the font XML, though the potential <chars> tag would take at least a bit of manual work (though equal to the work of deriving canonical Unicode names from the font).

tari commented 1 year ago

The tags should be ordered so that the first of either type is something of a "canonical" choice (though this could also be accomplished by an attribute on/in the tag).

I would prefer to use an attribute to mark the canonical version, because determining ordering post-hoc may be difficult for some consumers (imagine a library that deserializes elements into unordered sets).


we should also add purely ASCII representations where possible

I don't think this is very useful because Unicode normalization (NFKD or NFKC) handles the obvious cases, and most of the other things are impossible to represent with pure ASCII. Normalization probably isn't a good idea either, as Unicode TR15 notes:

some characters with compatibility decompositions are used in mathematical notation to represent a distinction of a semantic nature; replacing the use of distinct character codes by formatting in such contexts may cause problems
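Both halves of that argument are easy to demonstrate with Python's stdlib `unicodedata` (a sketch; NFKC is one of the standard compatibility normalization forms):

```python
import unicodedata

# Compatibility normalization recovers the "obvious" ASCII forms...
assert unicodedata.normalize("NFKC", "₁₀") == "10"
# ...but it erases exactly the semantic distinction TR15 warns about:
# a superscript exponent collapses into an ordinary digit.
assert unicodedata.normalize("NFKC", "x²") == "x2"
# And some characters never reach ASCII at all: superscript minus (U+207B)
# normalizes to the minus sign U+2212, not to the ASCII hyphen.
assert unicodedata.normalize("NFKC", "⁻") == "\u2212"
```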

Providing a pure-ASCII version of each token, on the other hand, seems like the correct option, simply because the semantics of a given character are often dependent on the token it's contained in, and providing easy-to-type aliases is important to applications that want to tokenize source code.


Additionally, the sequence of font bytes which give the name on-calc could be added. ... Add font.xml, which contains just the font characters in a simplified format

Only one of these should be used.

Providing all three would potentially allow the unicode version of a token to better capture its semantics even if the calculator character set does not, but also makes it difficult to keep them in sync if we wanted to modify the font mapping (every token using the changed character would also need to be updated).


Putting all these together, I suggest something like this:

(The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in RFC 2119.)

<token>
  <lang code="en">
    <canonical chars="05446563" />
    <alternate>&gt;Dec</alternate>
  </lang>
</token>

Each language's entry for a token MUST contain a canonical element with a chars attribute specifying the calculator characters used to display it, and a canonical token MAY include text to specify a preferred Unicode representation distinct from the Unicode string formed by applying character set mapping to the value of the chars attribute (I expect such alternate preferred encodings to be rare).

Each token MAY have one or more alternates, specifying accepted alternative Unicode representations in the element body. Call the canonical and alternate elements "textual representations" of the corresponding token.

There SHOULD exist at least one textual representation for each token containing only printable ASCII characters (U+0020-U+007E). This ensures that most tokens can easily be typed by users without resorting to manual input of uncommon codepoints.

Any textual representation element MUST NOT have the same body text as any other textual representation with the same language code. This ensures that tokens are unambiguous for tokenization applications.
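That last rule is mechanically checkable. A sketch against the proposed schema (hypothetical structure, stdlib parser only):

```python
import xml.etree.ElementTree as ET

doc = """<tokens>
  <token><lang code="en">
    <canonical chars="05446563" />
    <alternate>&gt;Dec</alternate>
  </lang></token>
</tokens>"""

def check_unique(root: ET.Element) -> None:
    """Raise if two textual representations share body text within a language."""
    seen: dict[str, set[str]] = {}
    for lang in root.iter("lang"):
        reps = seen.setdefault(lang.get("code"), set())
        for rep in lang:  # <canonical> and <alternate> elements
            text = rep.text or ""  # a canonical MAY omit body text
            if text:
                if text in reps:
                    raise ValueError(f"ambiguous representation {text!r}")
                reps.add(text)

check_unique(ET.fromstring(doc))  # passes: ">Dec" appears only once
```

Adding a second `<alternate>&gt;Dec</alternate>` under the same language code makes the check raise, which is the ambiguity the MUST NOT rule forbids.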

kg583 commented 1 year ago

I would prefer to use an attribute to mark the canonical version, because determining ordering post-hoc may be difficult for some consumers (imagine a library that deserializes elements into unordered sets).

I didn't lead with the suggestion since a) the current sheets don't do this and b) I don't love the idea of us deciding what's canonical, though any (good) choice we make is unlikely to be controversial (particularly if the canonical choice is the Unicode approximation).

I don't think this is very useful because Unicode normalization (NFKD or NFKC) handles the obvious cases

That's fair; I was mostly just wanting to make sure such an option does exist (but any library under the sun will be able to do this, so). Would we then simplify the font XML format even more?

<byte value="$01">𝑛</byte>
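A consumer could load that simplified format with just the stdlib (a sketch; the surrounding `<font>` wrapper element and the `$`-prefixed hex value are assumptions carried over from the current sheets):

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry font sheet in the simplified format.
font_xml = """<font>
  <byte value="$01">𝑛</byte>
  <byte value="$05">►</byte>
</font>"""

# Map each byte value to its Unicode approximation.
font = {
    int(b.get("value").lstrip("$"), 16): b.text
    for b in ET.fromstring(font_xml)
}
# font[0x01] == "𝑛" and font[0x05] == "►"
```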

Providing all three would potentially allow the unicode version of a token to better capture its semantics even if the calculator character set does not, but also makes it difficult to keep them in sync if we wanted to modify the font mapping (every token using the changed character would also need to be updated).

Something like a GitHub Action or other form of CI could accomplish this, as I don't like the idea of a user needing to include the font map if they all need are tokens. This could instead be provided on the user end, but the code should be regardless standard and accessible (not that its difficult, but we might as well do the work ourselves).

There SHOULD exist at least one textual representation for each token containing only printable ASCII characters (U+0020-U+007E). This ensures that most tokens can easily be typed by users without resorting to manual input of uncommon codepoints.

Should these be specially delineated? Some kind of tag or attribute would make the check easy for ASCII-only applications.

Also, does anyone with more XML knowhow have suggestions about the chars attribute? Just a string of hex digits in a string feels cryptic, so if there's some good way to do this I would want to do it.

tari commented 1 year ago

Would we then simplify the font XML format even more?

That seems reasonable to me.

Something like a GitHub Action or other form of CI could accomplish this, as I don't like the idea of a user needing to include the font map if they all need are tokens.

Also reasonable; source data should avoid redundancy, but we can provide it in alternate forms that may be easier to consume.

Should these be specially delineated? Some kind of tag or attribute would make the check easy for ASCII-only applications.

If a consumer needs to understand unicode to handle the input specification anyway, it's easy for them to check what codepoints are used and filter elements; tagging ASCII-only ones specially seems like another form of redundancy that we don't want. It would be reasonable to provide an ASCII-only version of the data similar to the fontmap-combined one, though.

tari commented 1 year ago

Also, does anyone with more XML knowhow have suggestions about the chars attribute? Just a string of hex digits in a string feels cryptic, so if there's some good way to do this I would want to do it.

Perhaps use a regular string with numerical character references as needed? It might be less compact but seems more semantically correct:

<token>
  <lang code="en">
    <canonical chars="&#x05;Dec" />
    <alternate>&gt;Dec</alternate>
  </lang>
</token>

And for regularity, the font map could do the same thing:

<byte value="&#1;">𝑛</byte>

XML forbids null bytes even when encoded as a character reference, but null has the same semantics in the character set so that seems fine.
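One caveat worth flagging (easy to check with any conforming parser): XML 1.0's Char production admits only tab, CR, LF, and U+0020 and above, so character references to most C0 control characters, not just null, are rejected as not well-formed. Python's expat-backed parser demonstrates this for the `&#x05;` used above:

```python
import xml.etree.ElementTree as ET

# A reference to U+0005 is not well-formed XML 1.0, even though the
# numeric-reference syntax itself looks legal.
try:
    ET.fromstring('<canonical chars="&#x05;Dec" />')
    well_formed = True
except ET.ParseError:
    well_formed = False
# expat rejects the reference; XML 1.1 or an escape convention (such as the
# existing $-hex notation) would be needed for bytes below 0x20.
```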

kg583 commented 1 year ago

Should these be specially delineated? Some kind of tag or attribute would make the check easy for ASCII-only applications.

If a consumer needs to understand unicode to handle the input specification anyway, it's easy for them to check what codepoints are used and filter elements; tagging ASCII-only ones specially seems like another form of redundancy that we don't want. It would be reasonable to provide an ASCII-only version of the data similar to the fontmap-combined one, though.

I don't entirely agree on the point that we "don't want" the redundancy; if anything, we are in a prime position to offer convenient redundancy. Providing an entirely separate ASCII file is a solution I can get behind, but I also don't see the problem with being redundant, given that providing the data in a convenient manner is exactly the point. To further that end, I would like to eventually get CI set up to automatically generate other data formats from the XMLs, which would be plainly redundant but useful all the same.

kg583 commented 1 year ago
  • Rework the structure of the <lang> tags. We replace all <name> tags with either <ascii> or <unicode>,

It is also worth noting that the ASCII names are intended to be unique for the purposes of (de)tokenization in a text editor. Since we'd like to maintain the names used by SC and/or TokenIDE for reference, and we're already gonna make a <canonical> tag, it should have an ASCII counterpart (and be identified as such).

tari commented 1 year ago

Providing an entirely separate ASCII file is a solution I can get behind, but I also do not see the problem with being redundant given that providing the data in a convenient manner is exactly the point.

My concern about redundancy relates to difficulty of maintenance: if the same information is encoded two ways in our source data, it's more difficult to change it later. Beyond that I don't really care, which is why generating redundant data via some automatic transformation (CI-based releases, etc) seems like a fine approach.

the ASCII names are intended to be unique for the purposes of (de)tokenization in a text editor. Since we'd like to maintain the names used by SC and/or TokenIDE for reference, and we're already gonna make a <canonical> tag, it should have an ASCII counterpart

Having an obvious ASCII-only transformation only seems like it matters for Unicode-incapable editors consuming detokenized source code; otherwise the limited character set of (near-)ASCII is primarily for the convenience of humans to type things, in which situation having a canonical representation is unimportant as long as any given string can unambiguously map to a sequence of tokens.

I think my point here is that detokenizers should always prefer the (Unicode) canonical detokenization of things, but we can attempt to ensure that Unicode-incapable applications are still able to interoperate with full-featured ones by asserting that every token have at least one pure-ASCII representation. Maintaining compatibility with existing applications only matters inasmuch as we maintain the strings accepted by those existing tools as supported variants.

I suppose you might be more concerned about detokenized source from a new application being tokenizable by an old one, in which case labelling a "legacy canonical" encoding or something might be useful -- but SourceCoder consumes TokenIDE-style XML files so it wouldn't be difficult to make both of those tools handle the new canonical encodings.

kg583 commented 1 year ago

I suppose you might be more concerned about detokenized source from a new application being tokenizable by an old one, in which case labelling a "legacy canonical" encoding or something might be useful -- but SourceCoder consumes TokenIDE-style XML files so it wouldn't be difficult to make both of those tools handle the new canonical encodings.

Sure, but I guess my point is that we don't "need" Unicode to be the standard for every use case. I always interpreted the approximations as being entirely for display purposes; ASCII token encodings, meanwhile, would remain the standard any time you need to type them in somewhere.

Having (de)tokenization run off Unicode by default makes things work much like TI-Connect CE, which has to have menus with copies of every symbol. SourceCoder and TokenIDE also have these menus, but they aren't strictly necessary once you know the encodings. Granted, you can also memorize the Unicode approximants, but ASCII is plainly easier to type.

The encodings used by SourceCoder and TokenIDE could nonetheless use some improvements. This has been discussed in very disparate places on Discord, and boils down to two main points: standardizing sigils for token categories (e.g. $ for stats vars) and how to differentiate RIGHT (the token) from RIGHT (the five tokens) in a user-friendly way. Said discussions should probably be ported into a separate issue.

But on that note, I hesitate to refer to the ASCII encodings as "legacy", as they simply don't need replacing.

tari commented 1 year ago

Having (de)tokenization run off Unicode by default makes things work much like TI-Connect CE, which has to have menus with copies of every symbol.

No, because a user is free to use any version of a token to write it; I'm only proposing that detokenization prefer to use the Unicode representation, but there's nothing stopping a user from using ASCII versions of the same tokens when typing code.


If the issue is mostly around things being easy to type rather than a strict limitation to ASCII, perhaps this is really a question of naming. How about an accessible attribute that can be applied to any child of a token (canonical or alternate), indicating "this token should be easy for humans to type" and providing a hint to applications that they can prefer that version of a token in contexts where they want to detokenize to easily-typed representations?

It also occurs to me that we probably want to make a source-code representation of a program fixed to a single language, which implies reorganizing the XML somewhat:

<token>
  <canonical lang="en" chars="&#x05;Dec" />
  <alternate accessible="true">&gt;Dec</alternate>
</token>

Moving the (required) language code onto the canonical element still allows the calculator representation for a given language to be known, but forces others to be language-agnostic (and tokenization would generally ignore non-English canonical representations). This is important because otherwise tokenizers would require a specified input language to select the correct language code to use. I don't believe anybody currently writes TI-BASIC source code in non-English languages, but I'm unaware of any other computer language that supports multilanguage syntax so it seems best to exclude that as an option here as well.

kg583 commented 1 year ago

No, because a user is free to use any version of a token to write it; I'm only proposing that detokenization prefer to use the Unicode representation, but there's nothing stopping a user from using ASCII versions of the same tokens when typing code.

Detokenization into Unicode could look cryptic and misleading if you're attempting to learn the ropes by example. Making Unicode the canonical choice would make detokenizing from ASCII encodings a non-identity function. Given that I know of very few serious programming languages that use non-ASCII for syntax, this feels like just a bad move. If there weren't good reasons to play favorites on our end (mostly just having there be a standard), I would genuinely want neither to be the truly "canonical" encoding, and instead simply offer both and let the application decide. But, push coming to shove, ASCII should be preferred.

If the issue is mostly around things being easy to type rather than strict limitation to ASCII, perhaps this is really a question of naming. How about an accessible attribute that can be applied to any child of a token (canonical or alternate) indicating "this token should be easy for humans to type," and provides a hint to applications that they can prefer that version of a token in contexts where they want to detokenize to easily-typed representations.

This feels extremely unnecessary and similarly cryptic. What, to us, is "accessible", if not ASCII or some other standard? It's just beating around the bush.

Moving the (required) language code onto the canonical element still allows the calculator representation for a given language to be known, but forces others to be language-agnostic (and tokenization would generally ignore non-English canonical representations).

I can get behind moving at least one (of each) encoding into a language agnostic section, though this could also be resolved by simply assuming English to be the default.

rpitasky commented 1 year ago

This issue feels like it has fallen off the rails a bit; I'll attempt to refocus it by clarifying some things about the token sheets here.

My initial intention for the tokens database was that it shouldn't have any particularly strong opinion about a "canonical" detokenization; that is entirely up to the application using the sheets. It merely provides a list of reasonable options, ideally allowing you to request an option that suits your needs, e.g., a printable-ASCII name, an as-close-as-you-can-get-with-Unicode-to-what-you-see-on-calc name, or (as was suggested here, if I'm understanding things correctly) a sequence of TI-Font bytes. Of course, projects evolve past their creator's intentions, but I still believe this better reflects TI-Toolkit's broader goal of improving documentation and tooling than the alternative.

Obviously the current solution to this end is lazy and particularly uninspired, but I think this leans closer to what @kg583 was suggesting in https://github.com/TI-Toolkit/tokens/issues/12#issuecomment-1548930276.

EDIT: wait, this was a bad take on my part, but I'm falling asleep now, I'll fix it right when I wake up Edit: I do not remember my better take >.>

adriweb commented 1 year ago

I don't believe anybody currently writes TI-BASIC source code in non-English languages

French is definitely a popular TI-BASIC language, since it's the default on all the French calcs (the 82A and 83PCE, for the recent models), which is why tivars_lib_cpp makes sure it can tok/detok in both English and French. It only cares about Unicode (with its own token file upstreamed here), but I'm following this issue as I may change this eventually...

adriweb commented 1 year ago

Also, I'd really like us to end up agreeing on something here, even if it takes a bit more time to discuss all this, because it does seem like a big improvement if "all" recent/modern community tooling ends up using a unique, centralized, and maintained tokens "database" covering all needs, rather than each tool using its own thing and users being confused as to why something works on one tool and not on another.

Personally, I believe Unicode should be used everywhere just because it makes reading so much easier. Being able to type things correctly then "just" (with big quotes) becomes an interface issue that needs to be solved with great UX, otherwise the user will get frustrated. For the tiplanet PB, I've been thinking of a mix of what current editors do: a catalog/categories pane where you can pick tokens to insert, plus the ability to type tokens via Unicode directly (why not) or via ASCII that gets automatically replaced by the correct Unicode match. This is still a thought-in-progress, though.

tari commented 1 year ago

French is definitely a popular ti-basic language used since it's the default on all the French calcs (82A and 83PCE, for the recent models)

Okay, so I guess we'd want to retain the existing lang->strings hierarchy in that case; doesn't seem like a big deal.

having the ability to type tokens via Unicode directly (why not) or ascii that gets automatically replaced by the correct Unicode match

I had a thought like this too; a fancy autocomplete system that lets you type easily and converts to tokens on the fly seems ideal. The goal of the data should be to enable that sort of thing while still supporting less fancy tools and allowing them all to interoperate. This means providing a representation of each given token that closely matches what a calculator displays (the canonical one expressed in terms of the calculator charset) and zero or more aliases that may be easier to type.
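The interoperability goal here amounts to: tokenization accepts any textual representation, while detokenization prefers the canonical one. A minimal longest-match sketch (the token names and byte values below are illustrative, not taken from the sheets):

```python
# Accept every representation when tokenizing; canonical and alias strings
# map to the same token bytes regardless of which variant was typed.
TABLE = {"►Dec": b"\x65", ">Dec": b"\x65", "Disp ": b"\xde"}

def tokenize(src: str) -> bytes:
    names = sorted(TABLE, key=len, reverse=True)  # longest match first
    out, i = bytearray(), 0
    while i < len(src):
        for name in names:
            if src.startswith(name, i):
                out += TABLE[name]
                i += len(name)
                break
        else:
            raise ValueError(f"no token matches at offset {i}")
    return bytes(out)
```

Since `tokenize("►Dec") == tokenize(">Dec")`, a plain-ASCII editor and a Unicode-capable one produce identical token streams, which is the interoperation property described above.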

For the benefit of applications that may want to prefer easily typed tokens, it seems reasonable to offer my proposed accessible version, but taking no particular position on what charset that actually is (which is why I'm against calling it ascii: this is about being accessible, not choosing a given character set). Although all accessible variants may end up being pure ASCII, that would require further discussion to arrive at.

rpitasky commented 1 year ago

Re: @kg583:

This feels extremely unnecessary and similarly cryptic. What, to us, is "accessible", if not ASCII or some other standard? It's just beating around the bush.

TI-ASCII is a term that exists and is similar enough in name and form to standard ASCII to be easily conflated. Instead of "accessible", I suggest the admittedly somewhat clumsy "typeable" or "typable" (the latter of which autocorrect and the OED dislike but Wiktionary and MW list as an alternate spelling), which even more precisely captures what we want from these entries.

Re: @adriweb, on the topic of only having one Unicode translation: A further goal that should be considered is having mappings for all of the existing/old token sheets that were in use before this token sheet (obviously, within reason- though no totally unreasonable mappings exist to my knowledge). I think this would save lots of potential headaches when it comes to programs saved in old forum posts, etc. Multiple Unicode representations exist for every token; while we can (and in my opinion, should) select a favored one, this token sheet is for more than just editors.


We should decide on something actionable sooner rather than later so we can actually, y'know, use this thing.

I think each lang must have a required canonical name for the TI-Font representation, a required unique typeable or accessible (we need to decide on the name of the tag still) name for the common keyboard-input representation, a unique required preferred name for an as-close-as-possible Unicode interpretation, and any number of optional variant alternative Unicode names. Each token should have at least en and fr language support and optionally other languages too (following ISO 639-1 for the names).

kg583 commented 1 year ago

I think each lang must have a required canonical name for the TI-Font representation, a required unique typeable or accessible (we need to decide on the name of the tag still) name for the common keyboard-input representation, a unique required preferred name for an as-close-as-possible Unicode interpretation, and any number of optional variant alternative Unicode names. Each token should have at least en and fr language support and optionally other languages too (following ISO 639-1 for the names).

I second this, though favor accessible over typeable for exactly the reasons you stated. I flip-flop about canonical as well, since it doesn't describe what it is very well (i.e. the list of bytes in the font map that the token uses). preferred is fine I suppose, with my reservations mostly the same as accessible in that it's an "opinionated" adjective.

kg583 commented 1 year ago

To try to sum up my position about the accessible tag at the moment:

My motivation for being so adamant about having ASCII-only names stems from typing TI-BASIC into token editors, something I've done for years. I would say that typeability is probably the biggest boon of ASCII representations, but this doesn't mean I support ASCII names solely for this reason, and the name of the tag should not reflect only that reason either.

ASCII is more typeable, more likely to be supported by a custom font, and more recognizable. I don't necessarily hold any of these reasons in higher regard than any other for the purposes of inclusion in this standard; I simply recognize that any of those are reasons, and thus we should be purely descriptive with the tag name.

accessible indicates that all we care about is some kind of accessibility; typeable is even more narrow. But the name ascii says what it is and nothing more. I don't care why you want ASCII. I don't care what you do with it. But I know some people would like to have it, so here it is. Calling it anything else masks the intention (or at least my own intention; there is obviously some divide on that).

adriweb commented 1 year ago

I like "canonical", but I guess that implies there's only 1 canonical one (and everything else is alternatives). However if we're not going with a unique ascii or Unicode equivalent, then maybe we can just consider that the list is ordered, by preferred alternatives, descending (most preferred, implicitly "canonical", first)

I'm also fine with "ascii".

rpitasky commented 1 year ago

There is only one canonical TI-Font representation for any given token in any given translation?

On Sat, Jun 10, 2023 at 12:36 PM Adrien Bertrand @.***> wrote:

I like "canonical", but I guess that implies there's only 1 canonical one (and everything else is alternatives). However if we're not going with a unique ascii or Unicode equivalent, then maybe we can just consider that the list is ordered, by preferred alternatives, descending (most preferred, implicitly "canonical", first)

I'm also fine with "ascii".


kg583 commented 1 year ago

Let me try to bring everything back together, because I think we are all converging on a reasonable solution.

There are four fields that should be provided per token, per language. These fields will remain unnamed in this proposition, as this is one of the main points of contention for the format. We seek to use these fields to define the standard representations of tokens for various applications.

There are plenty of other motivations for these fields, and I think everyone in the discussion so far can get behind some inclusion of all the data mentioned. In addition,

We still need to decide on the names of these fields. The current suggestions are

My personal choices would be, reflecting my desire to be as purely descriptive as possible,

Other miscellaneous points to be addressed:

This proposition is a hopeful summary of the discussion here and on Discord thus far that aims to reconcile the disagreements and misunderstandings earlier. If there is an important concern that still hasn't been addressed here, please say as much. A functional format that is unanimously tolerable is the top priority.

tari commented 1 year ago

I think I agree with that general structure. I think field B should be automatically generated from the font map and value of field A, simply because that's easy to do automatically and makes everything easier to maintain.
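Generating field B from field A is indeed mechanical. A minimal sketch, assuming a font-map table like the one earlier in this thread (only a few entries shown; plain printable ASCII passes through, except 0x5B, which the calculator repurposes as θ):

```python
# Partial font map from calculator charset bytes to Unicode display text,
# mirroring the mapping shown earlier in the thread (illustrative subset).
CHARSET = {
    0x05: "►",    # U+25BA black right-pointing pointer
    0x10: "√",    # U+221A square root
    0x11: "⁻¹",   # U+207B superscript minus, U+00B9 superscript one
    0x5B: "θ",    # the calculator repurposes ASCII '[' as theta
}

def display_name(ti_ascii: bytes) -> str:
    """Derive the display string (field B) from the TI-ASCII bytes (field A)."""
    out = []
    for b in ti_ascii:
        if b in CHARSET:
            out.append(CHARSET[b])
        elif 0x20 <= b < 0x7F:      # plain printable ASCII displays as itself
            out.append(chr(b))
        else:
            out.append("\ufffd")    # byte not in this partial map
    return "".join(out)
```

With the full font map filled in, field B never needs to be hand-maintained.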

Just to bikeshed the namings:

  1. I don't like value, but chars is okay even if I prefer canonical
  2. unicode seems like a bad name because it's too generic: every textual element is unicode so it seems wrong to pick out one field as a special kind of unicode. Alternate proposal: display to indicate it's how a calculator would display the token.
  3. As with unicode, ascii is a bad name because many tokens are ASCII even in their "unicode" (field B) form. I still think accessible is the best of these, but there might be an interesting question around policy for defining values of this field which could inform its name: do we assume a particular keyboard layout for a given language such that the English tokens for example only contain characters that are present on a en-US keyboard even though other languages would be able to use a different character (since a French keyboard for instance typically has an AltGr and dead keys that allow accented characters to be typed easily)?
  4. I like variant, but either seems okay.
kg583 commented 1 year ago

I think field B should be automatically generated from the font map and value of field A, simply because that's easy to do automatically and makes everything easier to maintain.

Yes, this is what I was implying with the second bullet point after the list of fields.

I don't like value, but chars is okay even if I prefer canonical

canonical just feels so non-descriptive of what is actually in the field, and to that end I similarly prefer chars over value. Hell, the term canonical proved confusing just a few comments up in this thread.

Alternate proposal: display to indicate it's how a calculator would display the token.

I'll second this (and @rpitasky third'd on Discord).

As with unicode, ascii is a bad name because many tokens are ASCII even in their "unicode" (field B) form. I still think accessible is the best of these, but there might be an interesting question around policy for defining values of this field which could inform its name: do we assume a particular keyboard layout for a given language such that the English tokens for example only contain characters that are present on a en-US keyboard even though other languages would be able to use a different character (since a French keyboard for instance typically has an AltGr and dead keys that allow accented characters to be typed easily)?

I think I came off too strongly about the typeability point, even if it's maybe the best motivation for the field. It's really just a matter of how ASCII representations pervade the token editors, and they in turn inform how code gets shared over text, so we should absolutely keep them. I suppose you could call these "accessibility" reasons, but I again think that this misses the point. The most important fact about the field is that it is ASCII; any other name implies we could deviate if we wanted to. I furthermore have no idea why some display names being purely ASCII is a concern; there's no if-and-only-if here and I don't think anybody is expecting one. If fields B and C happen to agree for some tokens, so be it.

rpitasky commented 1 year ago

Yeah, I give display my metaphorical rubber stamp, as well as autogenerating things. How about tifont for the tifont bytes?

Regardless, the conversations we have had here and in Discord and in Matrix are barely documentation; whatever we choose must be documented clearly in at least the README (I recognize this probably goes without saying, but it makes debates over whether something captures all of the requisite meaning a little less important).

commandblockguy commented 1 year ago

How about tifont for the tifont bytes?

I dislike the use of the term "font" to refer to a series of encoded bytes, since a font defines the display of glyphs that are already encoded in a particular way, and this is completely unrelated to that. We should name the field after the encoding scheme that tifont and the encoded token names use. Of course, this is just kicking the can down the road a bit, as it's very unlikely there's an official name for it (the TI-83 SDK docs mention "extended ASCII," but it's in the glossary and it seems likely that this is referring to the part where they describe the input format for the assembler, rather than the actual character set used on the calculator) and we would need to come up with our own. "TI-ASCII" seems to be relatively common, but I dislike that term too, since it's not a superset of ASCII. We could probably just call it something boring like ti-calc-encoding or ti83-encoding.

tari commented 1 year ago

If Commodore can have PETSCII, how about 8xSCII?

LogicalJoe commented 1 year ago

TISCII?

rpitasky commented 1 year ago

calc-encoding or calc-encoded?

rpitasky commented 1 year ago

cemetech, at least, has somewhat converged on ti-ascii; are we good with this here?

kg583 commented 1 year ago

I'm fine with ti-ascii, though I feel now there should be a separation between encoding(s) and names in the format, so that we can better communicate the "data type" of the fields in-place and not worry about bad name clashes like ti-ascii vs. ascii, which refer to completely different types.

Thus cometh a proposition where I, kg583, endorse the use of accessible:

<lang code="en">
  <encodings>
    <ti-ascii>Field A</ti-ascii>
    <!-- are there any other viable encodings?
         cause if not this does seem maybe a tad silly -->
  </encodings>
  <names>
    <display>Field B</display>
    <accessible>Field C</accessible>
    <variant>Field D</variant>
    <!-- more variants as need be -->
  </names>
</lang>

Also, it's worth noting that the "correct" way to specify the encoding, an XML string with explicit bytes where necessary, would actually motivate something like <ti-ascii value=""></ti-ascii>.
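For what it's worth, the nested layout parses straightforwardly with stdlib tooling. A quick sketch using Python's `xml.etree.ElementTree` (field contents are placeholders, not real sheet data):

```python
import xml.etree.ElementTree as ET

# Placeholder instance of the nested <encodings>/<names> proposal above.
sample = """
<lang code="en">
  <encodings>
    <ti-ascii>C2</ti-ascii>
  </encodings>
  <names>
    <display>sin(</display>
    <accessible>sin(</accessible>
    <variant>sine(</variant>
  </names>
</lang>
"""

lang = ET.fromstring(sample)
ti_ascii = lang.findtext("encodings/ti-ascii")
display = lang.findtext("names/display")
variants = [v.text for v in lang.findall("names/variant")]
```

Hyphenated tag names like `ti-ascii` are valid XML and work fine in path expressions.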

tari commented 1 year ago

The above proposal (https://github.com/TI-Toolkit/tokens/issues/12#issuecomment-1586514583) seems reasonable in general, but there's a little bit of complexity that I think can be omitted. I think if we're unable to identify any other relevant encodings, the value of Field A could just as easily be placed in an attribute of the lang element; additional encodings could always be added in a new schema version if a reason were found to include them.

<lang code="en" ti-ascii="Field A">
  <display>Field B</display>
  <accessible>Field C</accessible>
  <variant>Field D</variant>
  <!-- additional variants if applicable -->
</lang>

This doesn't change any of the names (though it does drop the word encoding) and reduces nesting which makes the XML a little easier to parse.

I suppose we could take the attribute-ification even further and convert all of the non-repeatable elements into attributes of lang which doesn't sacrifice any expressiveness and is even easier to parse.
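That fully attribute-ified variant might look like the following (hypothetical; only the repeatable <variant> stays an element), and it does make the parse nearly trivial:

```python
import xml.etree.ElementTree as ET

# Hypothetical fully-flattened form: all non-repeatable fields as attributes.
sample = """
<lang code="en" ti-ascii="C2" display="sin(" accessible="sin(">
  <variant>sine(</variant>
</lang>
"""

lang = ET.fromstring(sample)
fields = dict(lang.attrib)                       # code, ti-ascii, display, accessible
variants = [v.text for v in lang.findall("variant")]
```

The trade-off is that attribute values can't carry nested markup, which matters only if a field ever needs more structure than a string.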

LogicalJoe commented 1 year ago

Even if it's not entirely relevant, I wanted to mention that Field A has language-specific encodings(?), as some language apps change specific characters (tested on the CE). For example, in Swedish, 090h (â in English) is changed to å.

tari commented 1 year ago

Is that actually field A, or is it field B? I take your comment to mean that the character 90h is changed, which leaves field A unchanged but B becomes different.

LogicalJoe commented 1 year ago

Field B would be different from what you'd get by generating it purely from the English TI-ASCII encoding and Field A. My point was more that "Field A would not necessarily be 1:1 with the other fields", which I wanted to mention in case a 1:1 relationship was being assumed, given that the only real documentation exists in English.

tari commented 1 year ago

Field A is the calculator characters, though, and I don't think the normal character set has an å at all; it seems like you're saying the Swedish localization changes the character set so the glyph corresponding to character 0x90 is different, but the character data is unchanged. That would mean field A is unchanged in every case, but field B is different between Swedish and other languages.
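Concretely, this resolves as a per-language override layered on the base font map: the byte (field A) is identical everywhere, while the glyph it displays (and hence field B) diverges. A small sketch of the CE behaviour reported above:

```python
# Base (English) font map plus per-language glyph overrides. Only the byte
# reported above is shown; the structure is the point, not the coverage.
BASE_CHARSET = {0x90: "â"}
LANG_OVERRIDES = {"sv": {0x90: "å"}}  # Swedish app remaps 0x90, per the CE test

def glyph(byte: int, lang: str = "en") -> str:
    """Display glyph for a charset byte, honoring language overrides."""
    charset = {**BASE_CHARSET, **LANG_OVERRIDES.get(lang, {})}
    return charset[byte]
```

So any field-B autogeneration would need a (mostly shared) font map parameterized by language.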

rpitasky commented 1 year ago

I've written a small tool to extract translations from the app files; I'll probably give it a home in this org somewhere once we decide where, but here's French, Spanish, Portuguese: https://gist.github.com/rpitasky/5daa6eb4090fb4b1c9360e0eb2404ce6

A "None" means the token inherits from the default (English) string.

rpitasky commented 1 year ago

I'd like to get moving on this, I'm willing to adopt #12 (comment).

Does anyone have the TI font encodings for the English tokens handy? Are we sure the translation table above (and particularly the TokenIDE sheets) are faithful?

tari commented 1 year ago

It's pretty easy for me to generate the TI-ASCII encodings from the mapping I defined and an XML file; my Tokens.xml is intended to provide this function, but it hasn't been checked for any kind of accuracy and may be missing some newer (CE) tokens.

rpitasky commented 1 year ago

The general consensus on discord was that a fresh extraction would be the best; I've gone ahead and done this (this comes from OS 5.3.0.0037):

Dumped strings ``` 05444d53 ►DMS 05446563 ►Dec 0546726163 ►Frac 1c → c1 [ 5d ] 7b { 7d } 28 ( 29 ) 726f756e6428 round( 70786c2d5465737428 pxl-Test( 6175676d656e7428 augment( 726f775377617028 rowSwap( 726f772b28 row+( 2a726f7728 *row( 2a726f772b28 *row+( 6d617828 max( 6d696e28 min( 5205505b28 R►Pθ( 5205507228 R►Pr( 5005527828 P►Rx( 5005527928 P►Ry( 6d656469616e28 median( 72616e644d28 randM( 6d65616e28 mean( 736f6c766528 solve( 73657128 seq( 666e496e7428 fnInt( 6e446572697628 nDeriv( 20 22 " 2c , 21 ! 15 ʳ 14 ° 11 ⁻¹ 12 ² 16 ᵀ d5 . 30 0 31 1 32 2 33 3 34 4 35 5 36 6 37 7 38 8 39 9 2e . 1b ᴇ 206f7220 or 20786f7220 xor 3a : 20616e6420 and 41 A 42 B 43 C 44 D 45 E 46 F 47 G 48 H 49 I 4a J 4b K 4c L 4d M 4e N 4f O 50 P 51 Q 52 R 53 S 54 T 55 U 56 V 57 W 58 X 59 Y 5a Z 5b θ 7072676d prgm 52616469616e Radian 446567726565 Degree 4e6f726d616c Normal 536369 Sci 456e67 Eng 466c6f6174 Float 3d = 3c < 3e > 17 ≤ 19 ≥ 18 ≠ 2b + 2d - 46697820 Fix 486f72697a Horiz 46756c6c Full 46756e63 Func 506172616d Param 506f6c6172 Polar 536571 Seq 496e64706e744175746f IndpntAuto 496e64706e7441736b IndpntAsk 446570656e644175746f DependAuto 446570656e6441736b DependAsk 2a * 2f / 5472616365 Trace 436c7244726177 ClrDraw 5a5374616e64617264 ZStandard 5a54726967 ZTrig 5a6f6f6d20496e Zoom In 5a6f6f6d204f7574 Zoom Out 5a537175617265 ZSquare 5a496e7465676572 ZInteger 5a50726576696f7573 ZPrevious 5a446563696d616c ZDecimal 5a6f6f6d53746174 ZoomStat 5a6f6f6d52636c ZoomRcl 5a426f78 ZBox 5072696e7453637265656e PrintScreen 447261774620 DrawF 5465787428 Text( 206e507220 nPr 206e437220 nCr 466e4f6e20 FnOn 466e4f666620 FnOff 53746f726550696320 StorePic 526563616c6c50696320 RecallPic 53746f726547444220 StoreGDB 526563616c6c47444220 RecallGDB 4c696e6528 Line( 566572746963616c20 Vertical 50742d4f6e28 Pt-On( 50742d4f666628 Pt-Off( 50742d4368616e676528 Pt-Change( 50786c2d4f6e28 Pxl-On( 50786c2d4f666628 Pxl-Off( 50786c2d4368616e676528 Pxl-Change( 536861646528 Shade( 436972636c6528 Circle( 486f72697a6f6e74616c20 
Horizontal 54616e67656e7428 Tangent( 44726177496e7620 DrawInv 53656c65637428 Select( 72616e64 rand c4 π 6765744b6579 getKey 27 ' 3f ? 1a ⁻ 696e7428 int( 61627328 abs( 64657428 det( 6964656e7469747928 identity( 64696d28 dim( 73756d28 sum( 70726f6428 prod( 6e6f7428 not( 695061727428 iPart( 665061727428 fPart( 1028 √( 0e1028 ³√( 6c6e28 ln( db5e28 𝑒^( 6c6f6728 log( 1d5e28 ₁₀^( 73696e28 sin( 73696e1128 sin⁻¹( 636f7328 cos( 636f731128 cos⁻¹( 74616e28 tan( 74616e1128 tan⁻¹( 73696e6828 sinh( 73696e681128 sinh⁻¹( 636f736828 cosh( 636f73681128 cosh⁻¹( 74616e6828 tanh( 74616e681128 tanh⁻¹( 496620 If 5468656e Then 456c7365 Else 5768696c6520 While 52657065617420 Repeat 466f7228 For( 456e64 End 52657475726e Return 4c626c20 Lbl 476f746f20 Goto 506175736520 Pause 53746f70 Stop 49533e28 IS>( 44533c28 DS<( 496e70757420 Input 50726f6d707420 Prompt 4469737020 Disp 446973704772617068 DispGraph 4f757470757428 Output( 436c72486f6d65 ClrHome 46696c6c28 Fill( 536f72744128 SortA( 536f72744428 SortD( 4d656e7528 Menu( 446973705461626c65 DispTable 506c6f74734f6e20 PlotsOn 506c6f74734f666620 PlotsOff 5a6f6f6d53746f ZoomSto 5e ^ cd10 ˣ√ 312d56617220537461747320 1-Var Stats 4c696e52656728612b62782920 LinReg(a+bx) 4c696e5265672861782b622920 LinReg(ax+b) 45787052656720 ExpReg 4c6e52656720 LnReg 50777252656720 PwrReg 4d65642d4d656420 Med-Med 5175616452656720 QuadReg 436c724c69737420 ClrList 486973746f6772616d Histogram 78794c696e65 xyLine 53657175656e7469616c Sequential 53696d756c Simul 506f6c61724743 PolarGC 526563744743 RectGC 436f6f72644f6e CoordOn 436f6f72644f6666 CoordOff 546869636b Thick 446f742d546869636b Dot-Thick 417865734f6e20 AxesOn 417865734f6666 AxesOff 47726964446f7420 GridDot 477269644f6666 GridOff 4c6162656c4f6e LabelOn 4c6162656c4f6666 LabelOff 576562 Web 54696d65 Time c1415d [A] c1425d [B] c1435d [C] c1445d [D] c1455d [E] c1465d [F] c1475d [G] c1485d [H] c1495d [I] c14a5d [J] 4c81 L₁ 4c82 L₂ 4c83 L₃ 4c84 L₄ 4c85 L₅ 4c86 L₆ 5981 Y₁ 5982 Y₂ 5983 Y₃ 5984 Y₄ 5985 Y₅ 5986 Y₆ 5987 Y₇ 
5988 Y₈ 5989 Y₉ 5980 Y₀ 58810d X₁ᴛ 59810d Y₁ᴛ 58820d X₂ᴛ 59820d Y₂ᴛ 58830d X₃ᴛ 59830d Y₃ᴛ 58840d X₄ᴛ 59840d Y₄ᴛ 58850d X₅ᴛ 59850d Y₅ᴛ 58860d X₆ᴛ 59860d Y₆ᴛ 7281 r₁ 7282 r₂ 7283 r₃ 7284 r₄ 7285 r₅ 7286 r₆ 75 u 76 v 77 w 50696331 Pic1 50696332 Pic2 50696333 Pic3 50696334 Pic4 50696335 Pic5 50696336 Pic6 50696337 Pic7 50696338 Pic8 50696339 Pic9 50696330 Pic0 47444231 GDB1 47444232 GDB2 47444233 GDB3 47444234 GDB4 47444235 GDB5 47444236 GDB6 47444237 GDB7 47444238 GDB8 47444239 GDB9 47444230 GDB0 322d56617220537461747320 2-Var Stats 53636174746572 Scatter 416e73 Ans 5265674551 RegEQ 6e n cb x̄ c678 Σx c67812 Σx² c778 σx 5378 Sx 6d696e58 minX 6d617858 maxX 6d696e59 minY 6d617859 maxY cc ȳ c679 Σy c67912 Σy² c779 σy 5379 Sy c67879 Σxy 62 b 61 a 72 r 4d6564 Med 5181 Q₁ 5183 Q₃ 63 c 7881 x₁ 7882 x₂ 7883 x₃ 7981 y₁ 7982 y₂ 7983 y₃ 01 𝑛 586d696e Xmin 586d6178 Xmax 596d696e Ymin 596d6178 Ymax 5b6d696e θmin 5b6d6178 θmax 546d696e Tmin 546d6178 Tmax 5873636c Xscl 5973636c Yscl 5b73746570 θstep 5473746570 Tstep 506c6f745374617274 PlotStart 5846616374 XFact 5946616374 YFact 5a586d696e ZXmin 5a586d6178 ZXmax 5a596d696e ZYmin 5a596d6178 ZYmax 5a5b6d696e Zθmin 5a5b6d6178 Zθmax 5a546d696e ZTmin 5a546d6178 ZTmax 5a5b73746570 Zθstep 5a5473746570 ZTstep 5a506c6f745374617274 ZPlotStart 54626c5374617274 TblStart be54626c ΔTbl 7528014d696e29 u(𝑛Min) 7628014d696e29 v(𝑛Min) be58 ΔX be59 ΔY 014d6178 𝑛Max 5a014d6178 Z𝑛Max 5a7528014d696e29 Zu(𝑛Min) 5a7628014d696e29 Zv(𝑛Min) 5a5873636c ZXscl 5a5973636c ZYscl d6 . 
53656e6428 Send( 47657428 Get( 437562696352656720 CubicReg 517561727452656720 QuartReg 54626c496e707574 TblInput 64 d 65 e 506c6f743128 Plot1( 506c6f743228 Plot2( 506c6f743328 Plot3( 426f78706c6f74 Boxplot 0a □ 0b ﹢ 0c · 664d696e28 fMin( 664d617828 fMax( 436c725461626c65 ClrTable d7 𝑖 dc ʟ 53747231 Str1 53747232 Str2 53747233 Str3 53747234 Str4 53747235 Str5 53747236 Str6 53747237 Str7 53747238 Str8 53747239 Str9 53747230 Str0 dd 𝗡 4925 I% 5056 PV 504d54 PMT 4656 FV 502f59 P/Y 432f59 C/Y 7728014d696e29 w(𝑛Min) 5a7728014d696e29 Zw(𝑛Min) 757641786573 uvAxes 767741786573 vwAxes 757741786573 uwAxes 53696e52656720 SinReg 4c6f67697374696320 Logistic 53686164654e6f726d28 ShadeNorm( 53686164655f7428 Shade_t( 5368616465d91228 Shadeχ²( 5368616465da28 Shade𝙵( 5a2d5465737428 Z-Test( 542d5465737420 T-Test 322d53616d705a5465737428 2-SampZTest( 322d53616d70545465737420 2-SampTTest 312d50726f705a5465737428 1-PropZTest( 322d50726f705a5465737428 2-PropZTest( d9122d5465737428 χ²-Test( 322d53616d70da5465737420 2-Samp𝙵Test 322d53616d7054496e7420 2-SampTInt 5a496e74657276616c20 ZInterval 54496e74657276616c20 TInterval 322d53616d705a496e7428 2-SampZInt( 312d50726f705a496e7428 1-PropZInt( 322d50726f705a496e7428 2-PropZInt( 6e6f726d616c63646628 normalcdf( 696e764e6f726d28 invNorm( 7463646628 tcdf( d91263646628 χ²cdf( da63646628 𝙵cdf( 62696e6f6d70646628 binompdf( 62696e6f6d63646628 binomcdf( 706f6973736f6e70646628 poissonpdf( 706f6973736f6e63646628 poissoncdf( 67656f6d657470646628 geometpdf( 67656f6d657463646628 geometcdf( 74766d5f506d74 tvm_Pmt 74766d5f4925 tvm_I% 74766d5f5056 tvm_PV 74766d5fdd tvm_𝗡 74766d5f4656 tvm_FV 6e707628 npv( 69727228 irr( 62616c28 bal( c650726e28 ΣPrn( c6496e7428 ΣInt( 506d745f456e64 Pmt_End 506d745f42676e Pmt_Bgn 054e6f6d28 ►Nom( 0545666628 ►Eff( 64626428 dbd( 70 p 7a z 74 t d912 χ² da 𝙵 6466 df d8 p̂ d881 p̂₁ d882 p̂₂ cb81 x̄₁ cb82 x̄₂ 537881 Sx₁ 537882 Sx₂ 537870 Sxp 6e81 n₁ 6e82 n₂ 6c6f776572 lower 7570706572 upper 636f6e6a28 conj( 7265616c28 real( 
616e676c6528 angle( 696d616728 imag( 6c636d28 lcm( 67636428 gcd( 72616e64496e7428 randInt( 72616e644e6f726d28 randNorm( 014d696e 𝑛Min 5a014d696e Z𝑛Min 45717505537472696e6728 Equ►String( 537472696e670545717528 String►Equ( 6578707228 expr( 6c656e67746828 length( 696e537472696e6728 inString( 73756228 sub( 73746444657628 stdDev( 76617269616e636528 variance( 436c65617220456e7472696573 Clear Entries db 𝑒 436c72416c6c4c69737473 ClrAllLists 5265616c Real 72db5e5bd7 r𝑒^θ𝑖 612b62d7 a+b𝑖 4c697374056d61747228 List►matr( 4d617472056c69737428 Matr►list( 63756d53756d28 cumSum( 0552656374 ►Rect 05506f6c6172 ►Polar 47657443616c6328 GetCalc( 44656c56617220 DelVar 5365745570456469746f7220 SetUpEditor be4c69737428 ΔList( 506c6f7453746570 PlotStep 5a506c6f7453746570 ZPlotStep 73 s 58726573 Xres 4c696e526567545465737420 LinRegTTest 5a58726573 ZXres 457870724f6e ExprOn 457870724f6666 ExprOff 4d6f64426f78706c6f74 ModBoxplot 4e6f726d50726f62506c6f74 NormProbPlot 72656628 ref( 7272656628 rref( 6e6f726d616c70646628 normalpdf( 7470646628 tpdf( d91270646628 χ²pdf( da70646628 𝙵pdf( 414e4f564128 ANOVA( 72616e6442696e28 randBin( 47726170685374796c6528 GraphStyle( 7212 r² 5212 R² 472d54 G-T 5a6f6f6d466974 ZoomFit 5353 SS 4d53 MS 446961676e6f737469634f6e DiagnosticOn 446961676e6f737469634f6666 DiagnosticOff 55012d81 U𝑛-₁ 56012d81 V𝑛-₁ 4172636869766520 Archive 556e4172636869766520 UnArchive 41736d28 Asm( 41736d5072676d AsmPrgm 41736d436f6d7028 AsmComp( 3f ? 
8a Á 8b À 8c  8d Ä 8e á 8f à 90 â 91 ä 92 É 93 È 94 Ê 95 Ë 96 é 97 è 98 ê 99 ë 9a Í 9b Ì 9c Î 9d Ï 9e í 9f ì a0 î a1 ï a2 Ó a3 Ò a4 Ô a5 Ö a6 ó a7 ò a8 ô a9 ö aa Ú ab Ù ac Û ad Ü ae ú af ù b0 û b1 ü b2 Ç b3 ç b4 Ñ b5 ñ b6 ´ b7 ` b8 ¨ b9 ¿ ba ¡ bb α bc β bd γ be Δ bf δ c0 ε c2 λ c3 μ c4 π c5 ρ c6 Σ c7 σ c8 τ c9 Φ ca Ω d8 p̂ d9 χ da 𝙵 61 a 62 b 63 c 64 d 65 e 66 f 67 g 68 h 69 i 6a j 6b k 6c l 6d m 6e n 6f o 70 p 71 q 72 r 73 s 74 t 75 u 76 v 77 w 78 x 79 y 7a z 47617262616765436f6c6c656374 GarbageCollect 7e ~ 40 @ 23 # 24 ⁴ 26 & 60 ‛ 3b ; 5c \ 7c | 5f _ 25 % ce … 13 ∠ f4 ß cd ˣ 0d ᴛ 80 ₀ 81 ₁ 82 ₂ 83 ₃ 84 ₄ 85 ₅ 86 ₆ 87 ₇ 88 ₈ 89 ₉ 1d ₁₀ cf ◄ 05 ► 1e ↑ 1f ↓ 09 × 08 ∫ f3 🡁 07 🠿 10 √ 7f ⍯ 7365744461746528 setDate( 73657454696d6528 setTime( 636865636b546d7228 checkTmr( 7365744474466d7428 setDtFmt( 736574546d466d7428 setTmFmt( 74696d65436e7628 timeCnv( 6461794f66576b28 dayOfWk( 676574447453747228 getDtStr( 676574546d53747228 getTmStr( 67657444617465 getDate 67657454696d65 getTime 7374617274546d72 startTmr 6765744474466d74 getDtFmt 676574546d466d74 getTmFmt 6973436c6f636b4f6e isClockOn 436c6f636b4f6666 ClockOff 436c6f636b4f6e ClockOn 4f70656e4c696228 OpenLib( 457865634c696220 ExecLib 696e765428 invT( d912474f462d5465737428 χ²GOF-Test( 4c696e52656754496e7420 LinRegTInt 4d616e75616c2d46697420 Manual-Fit 5a5175616472616e7431 ZQuadrant1 5a4672616331f632 ZFrac1⁄2 5a4672616331f633 ZFrac1⁄3 5a4672616331f634 ZFrac1⁄4 5a4672616331f635 ZFrac1⁄5 5a4672616331f638 ZFrac1⁄8 5a4672616331f63130 ZFrac1⁄10 f6 ⁄ f5 󸏵 056ef664cf05556ef664 ►n⁄d◄►Un⁄d 0546cf0544 ►F◄►D 72656d61696e64657228 remainder( c628 Σ( 6c6f674241534528 logBASE( 72616e64496e744e6f52657028 randIntNoRep( f7 . 4d4154485052494e54 MATHPRINT 434c4153534943 CLASSIC 6ef664 n⁄d 556ef664 Un⁄d 4155544f AUTO 444543 DEC 465241432d415050524f58 FRAC-APPROX 3f ? 
5354415457495a415244204f4e STATWIZARD ON 5354415457495a415244204f4646 STATWIZARD OFF 524544 RED 424c5545 BLUE 424c41434b BLACK 4d4147454e5441 MAGENTA 475245454e GREEN 4f52414e4745 ORANGE 42524f574e BROWN 4e415659 NAVY 4c54424c5545 LTBLUE 59454c4c4f57 YELLOW 5748495445 WHITE 4c5447524159 LTGRAY 4d454447524159 MEDGRAY 47524159 GRAY 4441524b47524159 DARKGRAY 496d61676531 Image1 496d61676532 Image2 496d61676533 Image3 496d61676534 Image4 496d61676535 Image5 496d61676536 Image6 496d61676537 Image7 496d61676538 Image8 496d61676539 Image9 496d61676530 Image0 477269644c696e6520 GridLine 4261636b67726f756e644f6e20 BackgroundOn 4261636b67726f756e644f6666 BackgroundOff 4772617068436f6c6f7228 GraphColor( 517569636b506c6f74264669742d4551 QuickPlot&Fit-EQ 54657874436f6c6f7228 TextColor( 547261636553746570 TraceStep 41736d3834435072676d Asm84CPrgm 4465746563744173796d4f6e DetectAsymOn 4465746563744173796d4f6666 DetectAsymOff 426f72646572436f6c6f7220 BorderColor f9 . 5468696e Thin 446f742d5468696e Dot-Thin 506c79536d6c7432 PlySmlt2 41736d383443455072676d Asm84CEPrgm 5175617274696c65732053657474696e67ce Quartiles Setting… 7528012d3229 u(𝑛-2) 7628012d3229 v(𝑛-2) 7728012d3229 w(𝑛-2) 7528012d3129 u(𝑛-1) 7628012d3129 v(𝑛-1) 7728012d3129 w(𝑛-1) 75280129 u(𝑛) 76280129 v(𝑛) 77280129 w(𝑛) 7528012b3129 u(𝑛+1) 7628012b3129 v(𝑛+1) 7728012b3129 w(𝑛+1) 70696563657769736528 piecewise( 534551280129 SEQ(𝑛) 53455128012b3129 SEQ(𝑛+1) 53455128012b3229 SEQ(𝑛+2) 4c454654 LEFT 43454e544552 CENTER 5249474854 RIGHT 696e7642696e6f6d28 invBinom( 5761697420 Wait 746f537472696e6728 toString( 6576616c28 eval( 457865637574652050726f6772616d Execute Program 556e646f20436c656172 Undo Clear 496e73657274204c696e652041626f7665 Insert Line Above 437574204c696e65 Cut Line 436f7079204c696e65 Copy Line 5061737465204c696e652042656c6f77 Paste Line Below 496e7365727420436f6d6d656e742041626f7665 Insert Comment Above 5175697420456469746f72 Quit Editor ```

Beware that they need to be reordered before being inserted into the tokens sheet; they are only roughly in a reasonable order right now.

Amper5ands commented 1 year ago

https://gist.github.com/Amper5ands/fd6328ddff1e56313c6e590187512340 tokens (?)