free-audio / clap-wrapper

Wrappers for using CLAP in other plugin environments
MIT License

Windows+VST3 shows garbled parameter names and values with special characters. #305

Open lewloiwc opened 2 weeks ago

lewloiwc commented 2 weeks ago

The "°" character in "180°" and similar expressions becomes garbled. However, this might be my own issue and may not be relevant to others. I'm not sure where exactly the problem lies.

I know how to solve it in my environment, but I lack knowledge about character encoding related to differences in OS and compilers. There's probably an issue somewhere, and I'm not sure if this is the correct approach, but this resolved the character garbling for me. I confirmed that the garbled characters were fixed on Windows 10 & (Ableton Live 12 | Bitwig Studio 5.2.3 | Cubase 13 | FL Studio 2024 | REAPER 7.22 | Studio One 6).

These are using the rewritten version → https://chatgpt.com/share/cb8268e7-cb34-48f4-9783-76e8b6e6d89a

Fixing garbled parameter names

https://github.com/free-audio/clap-wrapper/blob/48861ecf472fe2cbcd67478e3f5f0dabe9e5785c/src/detail/vst3/parameter.cpp#L95 Change this to:

{
  const char *utf8_text = fullname.c_str();
  uint32_t out_buffer_capacity = 128;
  uint32_t out_index = 0;
  for (uint32_t i = 0;out_index < out_buffer_capacity - 1 && utf8_text[i] != 0;) {
    uint32_t codepoint = 0;
    uint8_t byte = static_cast<uint8_t>(utf8_text[i]);

    if ((byte & 0b10000000) == 0b00000000) {
      codepoint = byte;
      i += 1;
    } else if ((byte & 0b11100000) == 0b11000000) {
      codepoint = byte & 0b00011111;
      codepoint = (codepoint << 6) | (static_cast<uint8_t>(utf8_text[i + 1]) & 0b00111111);
      i += 2;
    } else if ((byte & 0b11110000) == 0b11100000) {
      codepoint = byte & 0b00001111;
      codepoint = (codepoint << 6) | (static_cast<uint8_t>(utf8_text[i + 1]) & 0b00111111);
      codepoint = (codepoint << 6) | (static_cast<uint8_t>(utf8_text[i + 2]) & 0b00111111);
      i += 3;
    } else if ((byte & 0b11111000) == 0b11110000) {
      codepoint = byte & 0b00000111;
      codepoint = (codepoint << 6) | (static_cast<uint8_t>(utf8_text[i + 1]) & 0b00111111);
      codepoint = (codepoint << 6) | (static_cast<uint8_t>(utf8_text[i + 2]) & 0b00111111);
      codepoint = (codepoint << 6) | (static_cast<uint8_t>(utf8_text[i + 3]) & 0b00111111);
      i += 4;
    }

    v.title[out_index] = codepoint;
    out_index++;
  }
  v.title[out_index] = 0;
}

Fixing garbled parameter values

https://github.com/free-audio/clap-wrapper/blob/48861ecf472fe2cbcd67478e3f5f0dabe9e5785c/src/wrapasvst3.cpp#L322-L324 Change this to:

{
  uint32_t out_buffer_capacity = 128;
  uint32_t out_index = 0;
  for (uint32_t i = 0;out_index < out_buffer_capacity - 1 && outbuf[i] != 0;) {
    uint32_t codepoint = 0;
    uint8_t byte = static_cast<uint8_t>(outbuf[i]);

    if ((byte & 0b10000000) == 0b00000000) {
      codepoint = byte;
      i += 1;
    } else if ((byte & 0b11100000) == 0b11000000) {
      codepoint = byte & 0b00011111;
      codepoint = (codepoint << 6) | (static_cast<uint8_t>(outbuf[i + 1]) & 0b00111111);
      i += 2;
    } else if ((byte & 0b11110000) == 0b11100000) {
      codepoint = byte & 0b00001111;
      codepoint = (codepoint << 6) | (static_cast<uint8_t>(outbuf[i + 1]) & 0b00111111);
      codepoint = (codepoint << 6) | (static_cast<uint8_t>(outbuf[i + 2]) & 0b00111111);
      i += 3;
    } else if ((byte & 0b11111000) == 0b11110000) {
      codepoint = byte & 0b00000111;
      codepoint = (codepoint << 6) | (static_cast<uint8_t>(outbuf[i + 1]) & 0b00111111);
      codepoint = (codepoint << 6) | (static_cast<uint8_t>(outbuf[i + 2]) & 0b00111111);
      codepoint = (codepoint << 6) | (static_cast<uint8_t>(outbuf[i + 3]) & 0b00111111);
      i += 4;
    }

    string[out_index] = codepoint;
    out_index++;
  }
  string[out_index] = 0;
}

For verification, I used a modified CLAP based on clap-c99-distortion. https://github.com/baconpaul/clap-c99-distortion/blob/c0f7decafb651b521c75b6e25d7dbb6fa2470cd6/src/clap-c99-distortion.c#L128

        strncpy(param_info->name, "0°", CLAP_NAME_SIZE);

https://github.com/baconpaul/clap-c99-distortion/blob/c0f7decafb651b521c75b6e25d7dbb6fa2470cd6/src/clap-c99-distortion.c#L181-L189

        case 0:
            snprintf(display, size, "90°");
            break;
        case 1:
            snprintf(display, size, "180°");
            break;
        case 2:
            snprintf(display, size, "270°");
            break;

defiantnerd commented 2 weeks ago

Hi, thanks for bringing up this issue. But this "fix" is in the wrong place. The wrapper relies on the VST3 SDK function str8tostr16, which is incorrect. We have to come up with a proper utf8tostr16 that converts UTF-8 to UCS-2.

We also have to decide what to do about characters that cannot be encoded in UCS-2.
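To illustrate why the degree sign breaks, here is a minimal sketch of what happens when UTF-8 bytes are widened one at a time into 16-bit units. It assumes the 8-to-16 helper behaves like a plain per-byte copy; `widenBytes` is a hypothetical stand-in written for this illustration, not the SDK function itself.

```cpp
#include <string>

// Hypothetical stand-in for a naive 8-bit to 16-bit copy: every byte becomes
// its own UTF-16 code unit, so a multi-byte UTF-8 sequence is split apart.
inline std::u16string widenBytes(const std::string& utf8)
{
  std::u16string out;
  out.reserve(utf8.size());
  for (unsigned char byte : utf8)
  {
    out.push_back(static_cast<char16_t>(byte));
  }
  return out;
}
```

With this copy, `widenBytes("180\xC2\xB0")` produces five units ending in U+00C2 U+00B0 (displayed as "Â°") instead of four units ending in U+00B0.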

lewloiwc commented 2 weeks ago

Thank you for your reply!

As I expected, character encoding issues seem complex and difficult...

As a temporary measure, would it be okay for me to personally use the current code, limiting it to Windows and VST3? For example, is it possible that the code only works by chance in my environment, and could potentially lead to severe character corruption for other users, making things worse?

Or is it safer to keep using str8tostr16, given that the only garbled character I actually care about is "°" and it doesn't bother me that much?

Here are the test results from my environment: Characters:

AÁÄÑあ汉가БيकΔ§±≠Æ™❹◒⣔⦸⌬⛄

[Screenshots: test characters; CLAP; VST3; VST3 with temporary fix]

baconpaul commented 2 weeks ago

Why UCS-2, Timo? I thought VST3 was all UTF-16, which seems to be what the spliced-in function does.

baconpaul commented 2 weeks ago

But yeah, I agree @defiantnerd, it looks like the VST3 built-in doesn't do the right thing. This probably also breaks on Mac and Linux, where VST3 has to present UTF-16 no matter what.

Is the modified version of c99dist around somewhere for us to try, @lewloiwc?

We might be able to do the same with std::codecvt, by the way.

defiantnerd commented 2 weeks ago

You're right, @baconpaul, it is UTF-16. I was under the impression they limit it to UCS-2 because all their code does that (or even less).

I am not sure every host implements this correctly, and I would suggest not naming a parameter "⛄️".

Character-code conversion in C++ is a drama of its own. 😜

I will add conversion functions for utf8/utf16.
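As a rough sketch of the UCS-2 versus UTF-16 distinction: a code point above U+FFFF does not fit in one 16-bit unit and has to be written as a surrogate pair. The helper name `appendUtf16` and the choice to substitute U+FFFD for unencodable input are assumptions made for this illustration, not something decided in this thread.

```cpp
#include <cstdint>
#include <string>

// Appends one Unicode code point to a UTF-16 string.
// Surrogate values and anything above U+10FFFF are replaced with U+FFFD
// (an assumed policy for this sketch).
inline void appendUtf16(std::u16string& out, uint32_t codepoint)
{
  const bool isSurrogate = codepoint >= 0xD800 && codepoint <= 0xDFFF;
  if (isSurrogate || codepoint > 0x10FFFF)
  {
    out.push_back(static_cast<char16_t>(0xFFFD));
    return;
  }
  if (codepoint < 0x10000)
  {
    // Fits in a single unit; this is the only case UCS-2 can represent.
    out.push_back(static_cast<char16_t>(codepoint));
    return;
  }
  const uint32_t offset = codepoint - 0x10000;  // 20 significant bits remain
  out.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));    // high surrogate
  out.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF)));  // low surrogate
}
```

For example, U+00B0 ("°") stays a single unit, while U+1F480 ("💀") becomes the pair 0xD83D 0xDC80.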

lewloiwc commented 2 weeks ago

@baconpaul I have uploaded CLAP and VST3 to https://drive.google.com/drive/folders/1xpBE4lTPBW1y45sajz_3CQhlCaQiJUFN?usp=sharing (This VST3 is the 'thin' one.)

In the msvc folder, the third parameter name is "0°" and the items are "0°,90°,180°,270°". In the gcc folder, in addition to the changes in the msvc version, the second parameter name is "AÁÄÑあ汉가БيकΔ§±≠Æ™❹◒⣔⦸⌬⛄". The reason there are two versions is that the "⛄" version caused a C4819 error and couldn't be compiled, so I used gcc to compile it.
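One way to sidestep source-encoding diagnostics such as C4819 is to keep the test source pure ASCII and spell out the UTF-8 bytes as escapes. This is only a sketch; the constant names and the "25°C" string are hypothetical examples.

```cpp
#include <cstring>

// "°" is U+00B0, which is the two bytes 0xC2 0xB0 in UTF-8.
// Writing the bytes as escapes keeps the source file pure ASCII.
static const char kDegree0[] = "0\xC2\xB0";
static const char kDegree180[] = "180\xC2\xB0";

// A hex escape absorbs every hex digit that follows it, so the literal is
// closed after the escape when the next character is a hex digit such as 'C'.
// Adjacent string literals are concatenated by the compiler.
static const char kDegreeC[] = "25\xC2\xB0" "C";
```

MSVC's `/utf-8` compiler option is another route if the sources are saved as UTF-8.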

By the way, the std::codecvt-based conversion facilities (the <codecvt> header and std::wstring_convert) were deprecated in C++17 and are slated for removal in C++26, so it's better not to use them. This page states: "There is no alternative Unicode character encoding conversion functionality in the standard library, so use other specialized character encoding conversion libraries." https://cpprefjp.github.io/reference/codecvt.html

@defiantnerd I used emojis for testing because I knew their character encoding was special, but as you said, it seems better not to use them, as most hosts probably can't display them. Incidentally, REAPER was able to display it. 💀 [screenshot]

baconpaul commented 2 weeks ago

Thanks. I was really hoping for the modified source, though. Do you have that? If not, now that I think of it, I can just add some Unicode to the c99 list myself. Or actually to Shortcircuit!

baconpaul commented 2 weeks ago

OK so the same happens on mac.

This is all assuming #289 is applied (@defiantnerd - I really think we should merge that even though it is only a partial fix).

Shortcircuit is a JUCE 8 app which lets you rename macros. That means I can name macros with full unicode text, including emojis. Those macro names show up as CLAP/VST3 param names.

[Screenshot 2024-09-07 at 8 36 59 PM]

Here's the CLAP. The emoji comes through as a single un-renderable code point and the degree sign works.

[Screenshot 2024-09-07 at 8 40 29 PM]

Here's the wrapped VST (with #289 applied). The emoji comes through as a sequence of un-renderable characters and the degree sign is mangled.

My conclusion is:

  1. Steinberg didn't give us a correct 8-to-16 charset converter,
  2. the code point converter above looks right to me (in fact I used almost identical code for a UTF32-to-UTF8 thing in SFML I was doing just this week), and
  3. we should replace all our U.assign calls with that converter and test in Bitwig.

I think that will fix the problem.

baconpaul commented 2 weeks ago

[Screenshot 2024-09-07 at 8 46 50 PM]

Oh, and in REAPER with the CLAP we actually get this, which means REAPER + Shortcircuit should be a pretty good test for the VST3 wrapper too.

defiantnerd commented 2 weeks ago

I'd like to put that into proper util functions and add the caveats for illegal codes and overflow; then we can use them in all the other places where we transform u8 to u16.

I think the code is sufficient for now, although it uses the ISO definition of UTF-16, which is probably okay.
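A utility along those lines might look roughly like the sketch below. The name `utf8ToUtf16`, its signature, and the policy of replacing malformed input with U+FFFD are assumptions for illustration only; this is not the wrapper's actual implementation. It handles the two caveats mentioned above: illegal sequences (stray or missing continuation bytes, overlong forms, surrogates) and overflow of a fixed-size destination such as a 128-unit title buffer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: converts NUL-terminated UTF-8 into a fixed-size UTF-16
// buffer. Returns the number of units written, not counting the terminator.
inline size_t utf8ToUtf16(const char* src, char16_t* dst, size_t capacity)
{
  if (capacity == 0) return 0;
  const auto* s = reinterpret_cast<const unsigned char*>(src);
  size_t out = 0;
  while (*s != 0)
  {
    uint32_t cp = *s;
    size_t extra = 0;      // continuation bytes expected after the lead byte
    uint32_t minimum = 0;  // smallest code point allowed for this length
    if (cp < 0x80) { /* ASCII, nothing to decode */ }
    else if ((cp & 0xE0) == 0xC0) { cp &= 0x1F; extra = 1; minimum = 0x80; }
    else if ((cp & 0xF0) == 0xE0) { cp &= 0x0F; extra = 2; minimum = 0x800; }
    else if ((cp & 0xF8) == 0xF0) { cp &= 0x07; extra = 3; minimum = 0x10000; }
    else { cp = 0xFFFD; }  // stray continuation byte or invalid lead byte
    ++s;

    // Consume continuation bytes. The check also fails on the terminating
    // NUL, so a truncated sequence never reads past the end of the input.
    bool ok = true;
    for (size_t k = 0; k < extra; ++k)
    {
      if ((*s & 0xC0) != 0x80) { ok = false; break; }
      cp = (cp << 6) | (*s & 0x3F);
      ++s;
    }
    if (!ok || cp < minimum || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    {
      cp = 0xFFFD;  // truncated, overlong, out of range, or a surrogate
    }

    // Stop before overflowing; never write half of a surrogate pair.
    const size_t needed = (cp >= 0x10000) ? 2 : 1;
    if (out + needed > capacity - 1) break;
    if (needed == 1)
    {
      dst[out++] = static_cast<char16_t>(cp);
    }
    else
    {
      cp -= 0x10000;
      dst[out++] = static_cast<char16_t>(0xD800 + (cp >> 10));
      dst[out++] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
    }
  }
  dst[out] = 0;
  return out;
}
```

Compared with the inline loop posted earlier in the thread, this sketch always advances past bad bytes and emits surrogate pairs instead of truncating code points above U+FFFF to 16 bits.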

baconpaul commented 2 weeks ago

Yeah of course we don’t want to inline that code everywhere. Excellent!

I’ll try the AU with scxt too