ghost opened 3 years ago
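I was, just for fun, implementing `recode` in various programming languages. I came across these limitations when implementing `recode` in Prolog. Basically, I found the following:
- Prolog cannot produce a list of supported encodings (compare Java's `java.nio.charset.Charset.availableCharsets()`).
- SWI-Prolog does not support `UTF-16` with automatic BOM detection. One has to explicitly mention whether it should be `UTF-16BE` or `UTF-16LE`.
- SWI-Prolog does not support `UTF-32`.
- SWI-Prolog seems to not have a mechanism for a user program to "register" additional encodings (compare Java's `java.nio.charset.CharsetProvider`).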
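For comparison with the Java APIs named in the list, Python's standard `codecs` module covers three of these four points: a generic `utf-16` codec that reads the BOM, a built-in UTF-32 codec, and `codecs.register` for user-supplied encodings. This is only an illustration in another language, not a proposal for an SWI-Prolog API; the encoding name `mylatin1` and the helper names are hypothetical.

```python
import codecs


# UTF-16 with automatic BOM detection: the generic "utf-16" codec reads the
# BOM and picks the byte order itself, so the caller need not say BE or LE.
def decode_utf16_auto(data: bytes) -> str:
    return data.decode("utf-16")


# UTF-32 is a built-in codec (here little-endian, written without a BOM).
def encode_utf32le(text: str) -> bytes:
    return text.encode("utf-32-le")


# Registering an extra encoding: a search function receives the lower-cased
# encoding name and returns a CodecInfo, or None if it does not know the name.
# "mylatin1" is a hypothetical name that simply reuses the Latin-1 codec.
def find_mylatin1(name: str):
    if name == "mylatin1":
        return codecs.lookup("latin-1")
    return None


codecs.register(find_mylatin1)

if __name__ == "__main__":
    print(decode_utf16_auto(codecs.BOM_UTF16_BE + "hé".encode("utf-16-be")))
    print(encode_utf32le("hi"))
    print("café".encode("mylatin1"))
```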
- Prolog cannot produce a list of supported encodings
The set of encodings for SWI-Prolog is fixed to a set of widely used encodings and the default locale encoding. Of course we could add a predicate to return this set, but it won't add much.
- SWI-Prolog does not support `UTF-16` with automatic BOM detection.
? By default, streams opened in `text` mode do BOM detection, setting up UTF-8, UTF-16BE or UTF-16LE when detected.
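The BOM detection described in this reply amounts to checking the first bytes of the stream against the known byte order marks. Below is a minimal Python sketch of that idea, not SWI-Prolog's implementation; `detect_bom` and `decode_with_bom` are hypothetical names, and the `default` parameter is an assumption standing in for whatever encoding the stream would use when no BOM is present.

```python
import codecs

# Known BOMs and the encoding each one selects (the three cases quoted above).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def detect_bom(data: bytes, default: str = "utf-8"):
    """Return (encoding, bom_length) for the first bytes of a stream."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    # No BOM: fall back to the (assumed) default encoding, skip nothing.
    return default, 0


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    encoding, skip = detect_bom(data, default)
    # Skip the BOM so it does not show up as a U+FEFF character.
    return data[skip:].decode(encoding)
```

Returning the BOM length separately lets the caller drop the mark before decoding, which is what makes the detection invisible to the reader of the text.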
- SWI-Prolog does not support `UTF-32`.
True. I see little reason to add it, as it seems very rare as a wire protocol. It supports `wchar` for memory-based streams that are represented as native `wchar_t []`. If there is a need for UTF-32, it can of course be added. It is not a big deal :smile:
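One reason UTF-32 is "not a big deal" is that every code point occupies exactly one 32-bit unit, so decoding is a fixed-width unpack plus a validity check. The Python sketch below illustrates this for BOM-less input; `decode_utf32` is a hypothetical helper, not an SWI-Prolog predicate.

```python
import struct


def decode_utf32(data: bytes, big_endian: bool = False) -> str:
    """Decode BOM-less UTF-32 by hand: one 4-byte unit per code point."""
    if len(data) % 4:
        raise ValueError("UTF-32 data length must be a multiple of 4")
    # "<" or ">" selects the byte order; "I" is an unsigned 32-bit integer.
    fmt = (">" if big_endian else "<") + "%dI" % (len(data) // 4)
    chars = []
    for cp in struct.unpack(fmt, data):
        # Reject values beyond Unicode and the UTF-16 surrogate range.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("invalid code point U+%X" % cp)
        chars.append(chr(cp))
    return "".join(chars)
```

Unlike UTF-8 or UTF-16 there is no multi-unit sequence to reassemble; the only encoding-specific decisions are byte order and rejecting surrogates.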
- SWI-Prolog seems to not have a mechanism for a user program to "register" additional encodings.
True. If you want, you can, as SWI-Prolog supports a notion of filter streams. These are also used to implement, e.g., compression, TLS sockets, HTTP chunked encoding, etc. They can even be implemented in Prolog, but typically you want to implement them as C plugins. That would allow linking to the GNU recode library and using any encoding it supports. I think that is the correct approach, as I am not keen on adding a dependency on these big libraries to the core. It would certainly be nice to see an extension pack that implements this.
To the user, this means opening the source or destination as a binary stream and then creating a recode stream from that binary stream. The recode stream can be programmed to close the original stream when it is closed, so the whole thing is transparent to the user.
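The same layering can be sketched in Python, where `io.TextIOWrapper` plays the role of the recode stream: it wraps a binary stream, decodes or encodes on the fly, and closes the wrapped stream when it is itself closed. This is an analogy only, not SWI-Prolog's filter-stream API; `recode_file` is a hypothetical helper.

```python
import io
import os
import tempfile


def recode_file(src_path: str, dst_path: str, from_enc: str, to_enc: str,
                chunk_size: int = 8192) -> None:
    """Copy src_path to dst_path, converting from from_enc to to_enc."""
    # Open both ends as plain binary streams, then layer a decoding (reader)
    # or encoding (writer) text stream on top of each one.
    # newline="" disables newline translation, so line endings pass through.
    # Closing a TextIOWrapper also closes the binary stream beneath it.
    with io.TextIOWrapper(open(src_path, "rb"), encoding=from_enc,
                          newline="") as reader:
        with io.TextIOWrapper(open(dst_path, "wb"), encoding=to_enc,
                              newline="") as writer:
            while True:
                chunk = reader.read(chunk_size)
                if not chunk:
                    break
                writer.write(chunk)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        dst = os.path.join(tmp, "out.txt")
        with open(src, "wb") as f:
            f.write("café\r\nnaïve".encode("latin-1"))
        recode_file(src, dst, "latin-1", "utf-8")
        with open(dst, "rb") as f:
            print(f.read())
```

The caller never touches the binary streams directly, which mirrors the transparency described in the reply.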
After seeing some test cases by Paulo Moura, I am considering adding UTF-32 support in the future. I just ran a quick check with his mythology file, and I get letter spacing in the SWI-Prolog navigator.
On the other hand, the IntelliJ IDE can display it correctly.
Is there some way to make this work in SWI-Prolog and XPCE?
The test case is from here: https://github.com/LogtalkDotOrg/logtalk3/tree/master/examples/encodings
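The thread does not establish why the navigator shows gaps between letters. One plausible explanation, offered here only as an assumption, is that the UTF-32 file is decoded with a narrower encoding, so the zero bytes that pad each ASCII letter to 32 bits come out as NUL characters, which a text widget may draw as blank space. The Python sketch below shows two such misreadings of the hypothetical sample string `"Zeus"`; it says nothing about what XPCE actually does.

```python
import codecs

# Hypothetical sample text; the real test file content is not reproduced here.
text = "Zeus"

# UTF-32LE with a BOM: FF FE 00 00, then four bytes per character.
utf32_bytes = codecs.BOM_UTF32_LE + text.encode("utf-32-le")

# Misread 1: the leading FF FE looks like a UTF-16LE BOM, so the remaining
# bytes are decoded as 16-bit units and every other unit is a NUL character.
as_utf16 = utf32_bytes.decode("utf-16")

# Misread 2: without a BOM, reading UTF-32LE as UTF-8 leaves three NULs
# after each ASCII letter.
as_utf8 = text.encode("utf-32-le").decode("utf-8")

if __name__ == "__main__":
    print(repr(as_utf16))
    print(repr(as_utf8))
```

Because the UTF-32LE BOM (`FF FE 00 00`) starts with the UTF-16LE BOM (`FF FE`), a BOM sniffer that only knows UTF-8 and UTF-16 will take such a file for UTF-16LE; recognising UTF-32 means checking the four-byte marks before the two-byte ones.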