SWI-Prolog / swipl-devel

SWI-Prolog Main development repository
http://www.swi-prolog.org

Feature request UTF-32 in SWI and XPCE #811

Open ghost opened 3 years ago

ghost commented 3 years ago

After seeing some test cases by Paulo Moura, I have been thinking about future UTF-32 support. I did a quick check with his mythology file, and the text is rendered with extra letter spacing in the SWI-Prolog navigator:

[screenshot]

On the other hand, the IntelliJ IDE can do it:

[screenshot]

Some way to make this work in SWI and XPCE?

The test case is from here: https://github.com/LogtalkDotOrg/logtalk3/tree/master/examples/encodings

christianhujer commented 8 months ago

I was, just for fun, implementing recode in various programming languages. I came across this limitation when implementing recode in Prolog. I basically found the following limitations:

JanWielemaker commented 8 months ago

  • Prolog cannot produce a list of supported encodings

The set of encodings for SWI-Prolog is fixed to a set of widely used encodings and the default locale encoding. Of course we could add a predicate to return this set, but it won't add much.

  • SWI-Prolog does not support UTF-16 with automatic BOM detection.

? By default, streams opened in text mode do BOM detection, setting up UTF-8, UTF-16BE or UTF-16LE when detected.
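
The BOM detection described above can be checked from Prolog with a minimal sketch like the following. The predicate name `detected_encoding/3` is hypothetical, and the encoding atom reported for UTF-16 may depend on the SWI-Prolog version.

```prolog
% detected_encoding(+File, -Encoding, -HasBOM)
% Hypothetical helper: open File in text mode and report which encoding
% the stream ended up with, and whether a BOM was seen.
detected_encoding(File, Encoding, HasBOM) :-
    setup_call_cleanup(
        open(File, read, In, [bom(true)]),      % bom(true) is the default in read mode
        (   stream_property(In, encoding(Encoding)),
            (   stream_property(In, bom(true))  % tolerate the property being absent
            ->  HasBOM = true
            ;   HasBOM = false
            )
        ),
        close(In)).
```

For a file without a BOM, `Encoding` is whatever default applies to the stream, so the result is only informative when `HasBOM` is `true`.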

  • SWI-Prolog does not support UTF-32.

True. I see little reason to add it, as it seems very rare as a wire protocol. SWI-Prolog does support wchar for memory-based streams that are represented as a native wchar_t[]. If there is a need for UTF-32, it can of course be added. It is not a big deal :smile:
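
Until such support exists, UTF-32 input can be decoded in user code by reading the file as bytes. This is a rough sketch, not part of SWI-Prolog: the predicate names are hypothetical, and it assumes big-endian UTF-32 without a BOM and a file small enough to hold as a byte list.

```prolog
% utf32be_codes(+Bytes, -Codes)
% Combine each group of four bytes (big-endian) into one code point.
% Fails if the number of bytes is not a multiple of four.
utf32be_codes([], []).
utf32be_codes([B0,B1,B2,B3|Bytes], [Code|Codes]) :-
    Code is (B0 << 24) \/ (B1 << 16) \/ (B2 << 8) \/ B3,
    utf32be_codes(Bytes, Codes).

% read_utf32be_file(+File, -String)
% Read File as raw bytes and decode it into a Prolog string.
read_utf32be_file(File, String) :-
    setup_call_cleanup(
        open(File, read, In, [type(binary)]),
        read_bytes(In, Bytes),
        close(In)),
    utf32be_codes(Bytes, Codes),
    string_codes(String, Codes).

% read_bytes(+In, -Bytes): collect all bytes up to end of file.
read_bytes(In, Bytes) :-
    get_byte(In, B),
    (   B == -1                     % get_byte/2 yields -1 at end of file
    ->  Bytes = []
    ;   Bytes = [B|Rest],
        read_bytes(In, Rest)
    ).
```

A little-endian variant would only change the order in which the four bytes are combined.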

  • SWI-Prolog seems to not have a mechanism for a user program to "register" additional encodings.

True. If you want this, you can do it, as SWI-Prolog supports a notion of filter streams. These are also used to implement e.g. compression, TLS sockets, HTTP chunked encoding, etc. They can even be implemented in Prolog, but typically you want to implement them as C plugins. That would allow linking to the GNU recode library and using any encoding it supports. I think that is the correct approach, as I am not keen on adding a dependency on such big libraries to the core. It would certainly be nice to see an extension pack that implements this.

To the user, this means opening the source or destination as a binary stream and then creating a recode stream from the binary stream. The recode stream can be programmed to close the original stream when closed, so the whole thing is transparent to the user.
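
No recode filter is named in this thread, so the sketch below uses an existing filter stream, gzip decompression from library(zlib), to illustrate the same layering: open a binary stream, wrap it, and let the wrapper close the parent. The predicate name `read_gz_text/2` is hypothetical, and the file is assumed to contain gzip-compressed UTF-8 text.

```prolog
:- use_module(library(zlib)).

% read_gz_text(+File, -String)
% Read gzip-compressed UTF-8 text through a filter stream.
read_gz_text(File, String) :-
    open(File, read, Raw, [type(binary)]),              % underlying byte stream
    zopen(Raw, In, [format(gzip), close_parent(true)]), % filter stream on top of Raw
    set_stream(In, type(text)),                         % treat the filtered data as text
    set_stream(In, encoding(utf8)),
    call_cleanup(
        read_string(In, _, String),
        close(In)).                                     % also closes Raw (close_parent)
```

A recode filter would presumably be used the same way, with the wrapper converting between encodings instead of decompressing.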