freemint / tos.hyp

The tos.hyp tries to document all functions from TOS. It also has information about MagiC, N.AES, MyAES, Geneva, XaAES, oAESis and some emulators.
https://freemint.github.io/tos.hyp
GNU General Public License v2.0

Should tos.hyp move to UTF-8? #95

Open mikrosk opened 5 years ago

mikrosk commented 5 years ago

When one tries to edit (or fork and PR) a tos.hyp file, the following message can be observed:

We’ve detected the file encoding as windows-1252. When you commit changes we will transcode it to UTF-8.

GitHub wants all web-based edits in UTF-8; that also applies to websites (I had to re-encode the whole of https://github.com/mikrosk/ct60tos/tree/gh-pages that way).

If we did that, we would need to re-encode the files back into Atari encoding before compiling the hypertext.
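Since Python (like most common tools) ships no Atari ST codec, such a re-encoding step would need a hand-written mapping table. Below is a minimal sketch with a deliberately tiny table; a real table must cover the full upper half of the character set, and the codepoints shown are examples I believe to be correct but should be verified against a complete Atari ST mapping:

```python
# Sketch: re-encode a UTF-8 source file back into the Atari ST character set
# before compiling the hypertext. The mapping table here is deliberately
# incomplete (illustration only); a real one covers all non-ASCII slots.

UNICODE_TO_ATARI = {
    "\u00e4": 0x84,  # ä (shared with cp437)
    "\u00f6": 0x94,  # ö (shared with cp437)
    "\u00fc": 0x81,  # ü (shared with cp437)
    "\u00df": 0x9E,  # ß (Atari-specific position)
}

def utf8_to_atari(text: str) -> bytes:
    """Convert a decoded (Unicode) string to Atari ST bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)                    # plain ASCII passes through
        else:
            out.append(UNICODE_TO_ATARI[ch])  # KeyError flags unmapped chars
    return bytes(out)
```

Such a step could run as a pre-build hook, so the repository stays UTF-8 for GitHub while the compiler still sees Atari-encoded input.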

th-otto commented 5 years ago

Theoretically, there are several possibilities:

Actually, I would prefer to leave them in the current encoding. It's GitHub's fault if it cannot cope with different encodings, and it is only an issue if you try to edit it through the web interface. You can always clone the repo and edit it locally instead.

mikrosk commented 5 years ago

> Actually, I would prefer to leave them in the current encoding. It's GitHub's fault if it cannot cope with different encodings, and it is only an issue if you try to edit it through the web interface. You can always clone the repo and edit it locally instead.

True. However, I'm asking on behalf of other people who can read and spot errors but don't feel up to setting up Git, branches, etc. GitHub lets you easily edit a file and create a PR from it without any hassle. So that's my main motivation here.

th-otto commented 5 years ago

PS: the files are encoded in the Atari character set, not cp1252.

PPS: if you want, you can try one option or the other on single files. The input encoding can be changed at any time. Just make sure you set it back at the end of the file, or before including any other file.
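If I remember UDO's directives correctly, such a per-file switch might look like the sketch below (treat the exact command name and the `utf8` value as assumptions to be checked against the UDO manual):

```
# at the top of a file that the web editor re-saved as UTF-8
!code_source [utf8]

# ... file content ...

# restore the default before the file ends, or before any include
!code_source [tos]
```

This keeps the encoding change local to the files actually edited through the web interface, so the rest of the hypertext is unaffected.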

PPPS: unconditionally transcoding the files is a no-go. That will break any source where strings are encoded in the local platform's encoding. Besides that, as mentioned above, they are not encoded in cp1252.

mikrosk commented 5 years ago

I guess GitHub's encoding detection doesn't know about Atari encodings. ;)

> unconditionally transcoding the files is a no-go. That will break any source where strings are encoded in the local platform's encoding. Besides that, as mentioned above, they are not encoded in cp1252.

I guess it's their policy: if you want web content hosted by them, it must be in UTF-8, period. I don't blame them, but it certainly doesn't make our Atari life easier.

th-otto commented 5 years ago

Edit: there is another option: use UDO macros for all non-ASCII characters, as is done here: https://github.com/freemint/tos.hyp/blob/82c4e0e81218e032de4bab4c324e5ce3388ba2b0/config.u#L114. That way, only a single file would contain non-ASCII characters.
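For illustration, such a UDO macro could be defined and used as in the sketch below (the macro name is made up and not necessarily the one used in config.u):

```
# defined once, centrally, in the only file that contains non-ASCII bytes:
!macro Auml Ä

# every other source file can then stay pure ASCII:
The word (!Auml)rger no longer needs a raw non-ASCII byte.
```

The central definition file then carries the only encoding-sensitive bytes, so GitHub's web editor can safely touch everything else.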

th-otto commented 5 years ago

It's not only a matter of Atari. Any Windows program that does not use wide Unicode character strings is also affected.

th-otto commented 5 years ago

I've just changed a bunch of files to use macros for the non-ASCII characters. But I encountered another problem: those characters also appear in verbatim environments (listings, examples, etc.) and are not replaced there. So after conversion you have to carefully check the new output, which is quite a lot of work.
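Part of that checking could be mechanized. A small sketch that flags every remaining non-ASCII byte in a converted file, so characters hiding in verbatim environments can at least be located quickly (the file name in the usage comment is illustrative):

```python
# Sketch: report every non-ASCII byte left in a source file after the
# macro conversion, so leftovers in verbatim blocks can be reviewed by hand.

def find_non_ascii(data: bytes):
    """Return (offset, byte) pairs for every byte outside 7-bit ASCII."""
    return [(i, b) for i, b in enumerate(data) if b > 0x7F]

# Usage: scan a converted file and print offending offsets.
# with open("some_file.u", "rb") as f:      # file name is illustrative
#     for offset, byte in find_non_ascii(f.read()):
#         print(f"offset {offset}: 0x{byte:02X}")
```

This only locates the bytes; deciding whether each one belongs in a listing or should become a macro still needs a human eye.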