Timendus / ticalc-usb

A library to communicate with TI graphing calculators through WebUSB
https://ticalc.link
GNU General Public License v3.0

Variables with non-UTF8 characters in their name get the wrong name after transmission #32

Open Timendus opened 3 years ago

Timendus commented 3 years ago

Things like θ, small L and probably Pic / Str get messed up. The same may be true the other way around for the result from getDirectory.

As described by @debrouxl in Timendus/ticalc.link#8:

dealing with file name encoding / tokenization matters for both display and transfer functionality, for instance (and probably not limited to):

  • sending to an 84+, through DUSB, data from an 83+ / 84+ program stored in an 83+ format with which the 84+ is compatible. For instance, the names of Pic0-9 (reasonably frequently used in games or school programs), GDB0-9 and Str0-9 are tokenized.
  • sending or receiving FlashApps performing language localization: at least the Spanish language localization FlashApp's name contains a special international character, n tilde;
  • sending or receiving variables containing Greek character names other than θ, some of which are definitely valid in file names at least on the TI-68k series. I've just created two variables named (greek beta) and (greek gamma) in TIEmu, then used TILP to perform a dirlist through the virtual cable and DBUS protocol: TILP displayed the expected appropriate greek letters.

The files of libticonv most relevant to you are charset.cc and filename.cc: https://github.com/debrouxl/tilibs/tree/experimental2/libticonv/trunk/src .

To do: write bi-directional mapping functions between TI encoding and UTF-8.
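A minimal sketch of what such a pair of mapping functions could look like, using a JavaScript string as the Unicode side (a `TextEncoder` can turn that into UTF-8 bytes if needed). The names `TI_TO_UNICODE`, `tiBytesToString` and `stringToTiBytes` are hypothetical and not part of ticalc-usb. The only table entry is the θ value reported further down in this thread; treating every other byte as plain ASCII is an assumption that a real table would have to replace.

```javascript
// Hypothetical partial table: TI charset byte -> Unicode code point.
// Only the θ entry (91 / 0x5b) comes from this thread.
const TI_TO_UNICODE = new Map([
  [0x5b, 0x03b8], // θ
]);

// Derive the reverse table so both directions stay in sync.
const UNICODE_TO_TI = new Map(
  [...TI_TO_UNICODE].map(([tiByte, codePoint]) => [codePoint, tiByte])
);

// TI bytes -> JavaScript string. Unmapped bytes are assumed to be ASCII.
function tiBytesToString(bytes) {
  return Array.from(bytes, (b) =>
    String.fromCodePoint(TI_TO_UNICODE.get(b) ?? b)
  ).join('');
}

// JavaScript string -> TI bytes. Unmapped ASCII passes through (assumption);
// anything else is rejected instead of being silently mangled.
function stringToTiBytes(str) {
  return Uint8Array.from(
    Array.from(str, (ch) => {
      const codePoint = ch.codePointAt(0);
      const mapped = UNICODE_TO_TI.get(codePoint);
      if (mapped !== undefined) return mapped;
      if (codePoint > 0x7f) {
        throw new RangeError(`No TI byte for U+${codePoint.toString(16)}`);
      }
      return codePoint;
    })
  );
}
```

With this partial table an ASCII `[` would also encode to 0x5b and collide with θ, so a complete table needs its own entry for every character whose TI byte differs from ASCII.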

debrouxl commented 3 years ago

Between multiple TI encodings and UTF-8, even :) From day 1, and even without implementing support for the TI-68k series or the older TI-Z80 models, the design needs to allow for multiple charsets, due to the classic DBUS encodings and the newer DUSB / CARS encodings, both relevant to the newer TI-Z80 & TI-eZ80 models.

Eventually, for ticalc-usb+ticalc.link to become a high-quality alternative, you'll have to implement nearly all of the layers provided by libti* anyway ;)

Timendus commented 2 years ago

I've been spending some time on this issue today, without making much progress. After porting over some of charset.cc and trying a couple of things, I can now parse theta properly as θ instead of [ on the PC side of things. That's nice, but now I'm not sure what to send to the calculator to get it to show the right thing too.

I'm assuming I have to write the name back to the byte stream exactly as found in the file, and not touch it.* But that should be exactly what I'm already doing, so that doesn't seem to be the answer. Or there's a bug in my logic somewhere that is transforming the data where it shouldn't. Can someone confirm that the format in which variable names are stored in (for example) the `.8xg` file is indeed the TI format that the calculator expects, and not UTF-8 or some other weirdness?

*) unless we're opening a TI-83+ file and sending it to a TI-84+, if I understand your remark above correctly. In that case I assume the correct way would be: convert from TI-83+ format to UTF-16 as an intermediary and then from UTF-16 to TI-84+ format.
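The two-step conversion described in that footnote could be sketched roughly like this. `makeTranscoder` is a hypothetical helper, and both tables are passed in by the caller because the real TI-83+ and TI-84+ tables are not established in this thread. Unmapped bytes are assumed to be plain ASCII.

```javascript
// Build a function that converts name bytes from one charset to another,
// using Unicode code points as the intermediary.
// Both arguments are Maps of charset byte -> Unicode code point.
function makeTranscoder(sourceToUnicode, targetToUnicode) {
  const unicodeToTarget = new Map(
    [...targetToUnicode].map(([byte, codePoint]) => [codePoint, byte])
  );
  return function transcode(bytes) {
    return Array.from(bytes, (b) => {
      // Step 1: source byte -> Unicode (assume ASCII when unmapped).
      const codePoint = sourceToUnicode.get(b) ?? b;
      // Step 2: Unicode -> target byte.
      const mapped = unicodeToTarget.get(codePoint);
      if (mapped !== undefined) return mapped;
      if (codePoint > 0x7f) {
        throw new RangeError(`No target byte for U+${codePoint.toString(16)}`);
      }
      return codePoint;
    });
  };
}

// Usage with PLACEHOLDER tables. The 0x80 target value is made up purely
// for illustration and is not real TI data.
const sourceTable = new Map([[0x5b, 0x03b8]]);
const placeholderTargetTable = new Map([[0x80, 0x03b8]]);
const transcode = makeTranscoder(sourceTable, placeholderTargetTable);
const result = transcode([0x41, 0x5b]);
```

Going through Unicode keeps each charset table independent, so adding another calculator family means adding one table rather than one converter per pair.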

Timendus commented 2 years ago

Adding insult to injury, θ is represented as 91 in the file, which is also what it says both here:

https://github.com/debrouxl/tilibs/blob/a4a638df4494aa8d80819e485c4e3316a158f1ef/libticonv/trunk/src/charset.cc#L689

(0x3b8 being the 91st element, zero indexed) and here:

https://github.com/debrouxl/tilibs/blob/a4a638df4494aa8d80819e485c4e3316a158f1ef/libticonv/trunk/src/charset.cc#L771

Yet still, sending 91 to my test TI-84+ renders not θ but [ in the PRGM menu.
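A quick check of the numbers involved, as plain JavaScript with nothing specific to ticalc-usb. It only shows that 91 is the ASCII code for `[`, which would be consistent with the receiving side reading the value as ASCII or Unicode rather than as a TI charset byte. That reading is a guess, not something confirmed here.

```javascript
const fileByte = 91;

// 91 in hexadecimal.
const fileByteHex = fileByte.toString(16);
console.log(fileByteHex); // "5b"

// The same value read as an ASCII / Unicode character.
const asAscii = String.fromCharCode(fileByte);
console.log(asAscii); // "["

// The Unicode code point of θ, for comparison with the charset.cc table.
const thetaHex = 'θ'.codePointAt(0).toString(16);
console.log(thetaHex); // "3b8"
```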

Timendus commented 2 years ago

Sending 0x3b8 renders θ though... 😂 What the hell. Does this mean that these conversions are just from "insane TI file format" to "normal space, including on the calculator"? I thought they were a mapping from calculator charset to UTF16, as it says on the tin..?

Timendus commented 2 years ago

Wait... the conversion is just for the "nonusb" part..? So for theta, 91 is the value to send through the link port, but through the USB port it expects 0x03b8 in UTF-16..?
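If that hypothesis holds (it is not confirmed in this thread), choosing the name encoding per transport might look something like the sketch below. `nameUnitsFor` and the transport labels `'usb'` and `'linkport'` are hypothetical. The function returns numeric values only and leaves out byte order and packet framing, which are not established here.

```javascript
// Hypothetical reverse table: Unicode code point -> TI charset byte.
// Only the θ entry comes from this thread.
const UNICODE_TO_TI_BYTE = new Map([[0x03b8, 91]]);

// Return the numeric values that would represent `name` on each transport,
// under the hypothesis that USB takes Unicode values and the link port
// takes TI charset bytes.
function nameUnitsFor(transport, name) {
  const codePoints = Array.from(name, (ch) => ch.codePointAt(0));
  switch (transport) {
    case 'usb':
      return codePoints;
    case 'linkport':
      // Unmapped characters are assumed to be plain ASCII.
      return codePoints.map((cp) => UNICODE_TO_TI_BYTE.get(cp) ?? cp);
    default:
      throw new Error(`Unknown transport: ${transport}`);
  }
}

const overUsb = nameUnitsFor('usb', 'θ');           // [0x03b8]
const overLinkPort = nameUnitsFor('linkport', 'θ'); // [91]
```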

debrouxl commented 2 years ago

For checking the sequences of bytes flowing through the cable, the output of libti*'s packet logging code, in ~/.ticables after closing TILP, is better than capturing raw USB packets with the likes of usbmon, USBPcap or other similar software and viewing the packets in Wireshark or similar, although it's not quite perfect ( https://github.com/debrouxl/tilibs/issues/12 ). The raw capture approach does not work for the virtual cables supported by TILP and TilEm/TIEmu, anyway.