QB64-Phoenix-Edition / QB64pe

The QB64 Phoenix Edition Repository
https://qb64phoenix.com

Add Unicode option to `_CLIPBOARD$` #323

Closed RhoSigma-QB64 closed 9 months ago

RhoSigma-QB64 commented 1 year ago

Add an optional parameter to `_CLIPBOARD$` (sub and function) that selects the text format used for clipboard operations: CF_TEXT or CF_UNICODETEXT.

For example:

a$ = _CLIPBOARD$ 'default, use CF_TEXT (current behavior), returning ANSI text
a$ = _CLIPBOARD$(UNICODE) 'use CF_UNICODETEXT, returning UTF-16 text
a$ = _CLIPBOARD$(ANSI) 'same as default

Related discussion: https://qb64phoenix.com/forum/showthread.php?tid=1572

mkilgore commented 1 year ago

Perhaps instead of a separate setting we could always use CF_UNICODETEXT and convert it into UTF-8. I suspect that would match what the other platforms do already, and if the text is regular ASCII then the result is the same as right now (since ASCII is valid UTF-8).
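The claim that plain ASCII text is unaffected can be checked with a small sketch. This is Python rather than QB64 code, and the helper name `utf8_matches_ascii` is made up for illustration; it only shows that the UTF-8 encoding of ASCII-only text is byte-for-byte the same as its ASCII encoding.

```python
def utf8_matches_ascii(text: str) -> bool:
    """Return True if the UTF-8 bytes of `text` equal its ASCII bytes.

    Hypothetical helper for illustration; not part of QB64.
    """
    utf8_bytes = text.encode("utf-8")
    try:
        return utf8_bytes == text.encode("ascii")
    except UnicodeEncodeError:
        # The text contains at least one non-ASCII character,
        # so it has no ASCII encoding to compare against.
        return False


if __name__ == "__main__":
    print(utf8_matches_ascii('PRINT "Hello"'))  # ASCII-only source line
    print(utf8_matches_ascii("čeština"))        # contains non-ASCII letters
```

So a program that only ever copies ASCII text would see no difference; the open question is what happens to everything else.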

RhoSigma-QB64 commented 1 year ago

In general that could work, but it would pass the buck to us. What happens if there's a Unicode character on the clipboard that is not available in the currently set IDE codepage? We would need to handle that somehow to avoid further complaints.

If we make it an explicit parameter, then we can always say: "You requested the clipboard to operate with Unicode, although QB64 isn't capable of handling that, so you're responsible for handling your data as needed in your application."

mkilgore commented 1 year ago

> In general that could work, but it would pass the buck to us. What happens if there's a Unicode character on the clipboard that is not available in the currently set IDE codepage? We would need to handle that somehow to avoid further complaints.

We wouldn't be doing any conversion, we would just leave the UTF-8 characters in the string as-is. That's already what happens on Linux (and very likely Mac OS) since they use UTF-8 for everything, so it's really not new behavior. In the Wiki we can just clarify that it's UTF-8 text that is given back (which is valid ASCII as long as no non-ASCII characters are present).

RhoSigma-QB64 commented 1 year ago

Sounds like a way to go, but wouldn't we break the IDE's internal copy'n'paste behavior then? I mean the language code itself is pure English and many coders will write their programs in English, even if it's not their native language.

I ran some tests and see that we cannot even use extended ASCII chars in variable/function/type names, but we can use them in literal strings and comments. So what would happen if, e.g., Petr wants to copy'n'paste some of his Czech-worded code?

So to me it looks like we have to do conversion in any case. It's not a problem in the above scenario, as the copied chars are obviously available in the current codepage, but simply leaving them as UTF-8 would screw up Petr's text as soon as he pastes it, even if it's in the same program.
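A rough Python sketch of that failure mode, for illustration only. It assumes the IDE is set to codepage 852 (a DOS codepage that contains the Czech letters); the actual codepage depends on the user's settings, and both function names are hypothetical.

```python
# Assumption: the editor works in DOS codepage 852 (Central European).
CODEPAGE = "cp852"


def roundtrip_codepage(text: str) -> str:
    """Copy and paste entirely within the codepage: nothing changes."""
    return text.encode(CODEPAGE).decode(CODEPAGE)


def paste_utf8_as_codepage(text: str) -> str:
    """Put UTF-8 bytes on the clipboard, then read them as codepage bytes.

    Every byte of a multi-byte UTF-8 sequence is shown as a separate
    codepage character, so non-ASCII letters turn into garbage.
    """
    utf8_bytes = text.encode("utf-8")
    return utf8_bytes.decode(CODEPAGE, errors="replace")


if __name__ == "__main__":
    sample = "čeština"
    print(roundtrip_codepage(sample))      # unchanged
    print(paste_utf8_as_codepage(sample))  # mangled, longer than the input
```

The round trip inside one codepage is lossless, while the UTF-8 detour changes both the characters and the length of the string.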

That's why I tend to leave the clipboard operations as-is and rather use an optional flag/parameter for unicode, if somebody really needs it and is prepared to deal with the unicode stuff himself.

mkilgore commented 1 year ago

> I ran some tests and see that we cannot even use extended ASCII chars in variable/function/type names, but we can use them in literal strings and comments. So what would happen if, e.g., Petr wants to copy'n'paste some of his Czech-worded code?

Yeah, it's a good point, that's quite annoying :-/ I'll have to think about it for a bit.

If we do add separate ansi/unicode options for _CLIPBOARD$() then we'll have to consider how it all works cross-platform. Likely, the default will be different depending on the platform (Windows defaults to ANSI, Linux and Mac OS default to Unicode). For ANSI support for Linux and Mac OS we'll also need to consider if we're going to implement actual conversion of the Unicode characters (if a corresponding character exists), or maybe just strip them out.
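The two options for ANSI mode on Linux and Mac OS (convert where a matching character exists, otherwise strip or substitute) could look roughly like this. It is a Python sketch only; `to_ansi` is a hypothetical name, and Windows-1252 is assumed as the target ANSI codepage purely for the example.

```python
def to_ansi(utf8_bytes: bytes, codepage: str = "cp1252", strip: bool = True) -> bytes:
    """Convert UTF-8 clipboard bytes to a single-byte ANSI codepage.

    Characters that exist in the codepage are converted. Characters that
    do not are dropped when strip=True, or replaced with '?' otherwise.
    Hypothetical helper; cp1252 is an assumed default, not QB64 behavior.
    """
    text = utf8_bytes.decode("utf-8")
    error_mode = "ignore" if strip else "replace"
    return text.encode(codepage, errors=error_mode)


if __name__ == "__main__":
    clip = "a→b ï".encode("utf-8")   # the arrow has no cp1252 equivalent
    print(to_ansi(clip, strip=True))   # arrow removed, "ï" converted
    print(to_ansi(clip, strip=False))  # arrow becomes "?", "ï" converted
```

Either way the result is lossy for characters outside the codepage, which is the trade-off an ANSI mode would have to document.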

Beyond that I think it's worth having the _CLIPBOARD$(UNICODE) option return UTF-8 on all platforms to make it easier to work with. Linux and Mac OS already return that, on Windows it's a simple conversion we could do when copying the data into the string (Windows already has functions to do it).
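As a sketch of that conversion step: CF_UNICODETEXT data is NUL-terminated UTF-16LE, so the work amounts to decoding it, cutting at the terminator, and re-encoding as UTF-8. The Python below stands in for whatever the runtime would actually do (on Windows, `WideCharToMultiByte` with `CP_UTF8` is the usual API for this); the function name is hypothetical.

```python
def clipboard_utf16_to_utf8(raw: bytes) -> bytes:
    """Convert NUL-terminated UTF-16LE clipboard data to UTF-8 bytes.

    Hypothetical helper for illustration; not QB64's implementation.
    """
    text = raw.decode("utf-16-le")
    # Drop the terminating NUL and anything that follows it.
    text = text.split("\x00", 1)[0]
    return text.encode("utf-8")


if __name__ == "__main__":
    # Simulated clipboard contents, including the trailing terminator.
    simulated = "Ahoj, světe\x00".encode("utf-16-le")
    print(clipboard_utf16_to_utf8(simulated))
```

With this in place, a Unicode-mode read would hand back the same UTF-8 bytes on Windows that Linux and Mac OS already deliver.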

RhoSigma-QB64 commented 1 year ago

> Beyond that I think it's worth having the _CLIPBOARD$(UNICODE) option return UTF-8 on all platforms to make it easier to work with. Linux and Mac OS already return that, on Windows it's a simple conversion we could do when copying the data into the string (Windows already has functions to do it).

No objections. My only concern was about changing the behavior to Unicode/UTF-8 across the board; we should definitely keep the current (ANSI) function for the IDE and other existing programs and make Unicode optional. How the new Unicode handling works in detail can still be decided when the implementation is actually being worked on.

RhoSigma-QB64 commented 9 months ago

Obviously too many uncertainties here, hence closed as "Not planned".