dworkin / dgd

Dworkin's Game Driver, an object-oriented database management system originally used to run MUDs.
https://www.dworkin.nl/dgd/
GNU Affero General Public License v3.0

The entire text from non-ascii characters is half as long #12

Closed Muderru closed 6 years ago

Muderru commented 6 years ago

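[image: dgd] https://user-images.githubusercontent.com/7998756/44956078-36768380-aecf-11e8-96b3-ef26e393d645.jpg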

shentino commented 6 years ago

Interesting!

I think this is the first time I've seen a DGD mud, or any mud in general, using non-ASCII characters. Neat!

Anyway, my best guess is that if you want the client to word wrap properly, you need to insert newlines at the proper places. For ASCII this is as simple as counting characters, but if you're working with international symbols and the like, one character is no longer one byte, and you may need to implement something along the lines of wcswidth (http://man7.org/linux/man-pages/man3/wcswidth.3.html) to properly calibrate word wrapping.

On Sun, Sep 2, 2018 at 5:43 AM Игорь notifications@github.com wrote:

[image: dgd] https://user-images.githubusercontent.com/7998756/44956078-36768380-aecf-11e8-96b3-ef26e393d645.jpg
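To make the "insert newlines at the proper places" idea concrete, here is a minimal LPC sketch of a greedy word wrap that counts UTF-8 code points instead of bytes. The function names `utf8_strlen` and `utf8_wrap` are hypothetical and not part of DGD or any mudlib. The sketch assumes the input is valid UTF-8, that every character occupies one terminal column (true for Cyrillic, but not for wide CJK characters, which is where something like wcswidth comes in), and that color codes have already been stripped.

```lpc
/*
 * Count UTF-8 code points in a DGD byte string: every byte that is not
 * a continuation byte (10xxxxxx) starts a new character.
 * Assumes str is valid UTF-8.
 */
static int utf8_strlen(string str)
{
    int i, len, count;

    len = strlen(str);
    for (i = 0, count = 0; i < len; i++) {
        if ((str[i] & 0xc0) != 0x80) {
            count++;
        }
    }
    return count;
}

/*
 * Greedy word wrap of a single line (no "\n" inside) at `width`
 * characters. Words longer than `width` are left unbroken, and
 * leading spaces may be dropped.
 */
static string utf8_wrap(string line, int width)
{
    string *words, result;
    int i, sz, col, wlen;

    words = explode(line, " ");
    result = "";
    col = 0;
    for (i = 0, sz = sizeof(words); i < sz; i++) {
        wlen = utf8_strlen(words[i]);
        if (col != 0 && col + 1 + wlen > width) {
            /* word does not fit: start a new line */
            result += "\n";
            col = 0;
        } else if (col != 0) {
            /* word fits: separate it from the previous one */
            result += " ";
            col++;
        }
        result += words[i];
        col += wlen;
    }
    return result;
}
```

If the source file is saved as UTF-8, `strlen("привет")` is 12 while `utf8_strlen("привет")` is 6, which matches the "half as long" symptom in the issue title.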

bodrich commented 6 years ago

Does the string type have a length attribute?

quixadhal commented 6 years ago

Most MUDs don’t have any issues with accepting or displaying arbitrary character sets. I even have an I3 channel for URL processing so people can see when I post fun/annoying Japanese idol music videos. 😊

But there are often issues with processing non-ASCII text. In this case, I’d first check that the apparent spaces are actually the ASCII space character (32), and not some extended version of it. Unicode has multiple characters that can be used as separators, but odds are good that the mudlib is only checking for a few when trying to find a wrap point.

If the word wrapping is done via regexp, the whitespace token itself may or may not be character set aware, so the whitespace token (\s) might not see extended versions as whitespace. The word boundary token (\b) has the same problem: it is defined in terms of word characters (\w), which may not include non-ASCII letters either.

So, it’s not just about the number of bytes, it’s about what those bytes represent. If you’re trying to break on words, you need to be able to see the word boundaries.
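As an illustration of the separator check, here is a small LPC sketch that rewrites one common "extended" space, U+00A0 NO-BREAK SPACE (UTF-8 bytes 0xC2 0xA0), to the ASCII space so that `explode(str, " ")` sees it. The function name `nbsp_to_space` is hypothetical, and the sketch assumes valid UTF-8 input. Other separators, such as U+2003 EM SPACE, would need their own byte sequences handled the same way.

```lpc
/*
 * Replace every U+00A0 (bytes 0xC2 0xA0) with an ASCII space (32).
 * In valid UTF-8, 0xC2 is always a lead byte, so the pair cannot
 * occur in the middle of another character.
 */
static string nbsp_to_space(string str)
{
    string result;
    int i, len, start;

    result = "";
    len = strlen(str);
    start = 0;
    for (i = 0; i + 1 < len; i++) {
        if (str[i] == 0xc2 && str[i + 1] == 0xa0) {
            if (i > start) {
                /* copy the bytes before the no-break space */
                result += str[start .. i - 1];
            }
            result += " ";
            i++;            /* skip the second byte of the pair */
            start = i + 1;
        }
    }
    if (start < len) {
        /* copy whatever follows the last replacement */
        result += str[start .. len - 1];
    }
    return result;
}
```

Note that a no-break space is meant to prevent a line break, so turning it into a wrap point is a design choice rather than a strictly correct rendering.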

dworkin commented 6 years ago

This is a mudlib issue.

DGD strings are binary strings, or if you like, arrays of 8-bit bytes. DGD doesn't know or care which character set is used for input or output. In the past, Russian DGD muds used KOI8-R, an 8-bit encoding. Obviously UTF-8 is a better solution, but you do need mudlib support, which your mudlib does not seem to have.

The right solution is to create a new LPC type String which wraps the low-level LPC string. For an example of how to do that, see String.c in the Cloud lib. The Cloud lib is not for muds, but it's in the public domain so any part of it can be used for other projects without restriction.

The String type in the Cloud lib is not yet complete. It doesn't yet handle proper UTF-8 capitalisation and comparison, but it's good enough to get you started.
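To show what such a wrapper has to do on top of a byte string, here is a minimal LPC sketch of character-indexed access. It is not taken from the Cloud lib's String.c. The names `utf8_offset` and `utf8_substr` are hypothetical, and the code assumes valid UTF-8 input and non-negative indices.

```lpc
/*
 * Byte offset of the n-th code point (0-based) in a UTF-8 byte string.
 * Returns strlen(str) if the string has n or fewer code points.
 */
static int utf8_offset(string str, int n)
{
    int i, len;

    len = strlen(str);
    for (i = 0; i < len; i++) {
        /* any byte that is not 10xxxxxx starts a new character */
        if ((str[i] & 0xc0) != 0x80) {
            if (n == 0) {
                return i;
            }
            n--;
        }
    }
    return len;
}

/*
 * Substring by character index, inclusive on both ends, mirroring
 * what str[from .. to] does for bytes.
 */
static string utf8_substr(string str, int from, int to)
{
    int start, end;

    start = utf8_offset(str, from);
    end = utf8_offset(str, to + 1);
    if (start >= end) {
        return "";
    }
    return str[start .. end - 1];
}
```

With a UTF-8 source file, `utf8_substr("привет", 0, 2)` returns "при", whereas the plain byte range `"привет"[0 .. 2]` would cut the second letter in half. A real String type would likely cache offsets instead of rescanning on every call.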

bodrich commented 6 years ago

Monkey patch from prool, in file lib/sys/obj/user.c. It turns word wrapping off entirely by replacing both length checks with 0:

void wrap_message(string str, varargs int chat_flag) {
    string msg, *words, *lines;
    int width, i, j, sz;

    if (!str || str == "") {
        return;
    }

    width = -1;
    /* Get the width from the player */
    if (player) {
        catch (width = player->query_width());
    }

    rlimits(MAX_DEPTH; MAX_TICKS) {
        /* Split the string into lines */
        lines = explode(str, "\n");

        /* Parse each line */
        for (j = 0; j < sizeof(lines); j++) {
            str = lines[j];
            msg = str;
            /* wrapping disabled: strlen() counts bytes, not characters */
            if (0/*strlen(ansid->strip_colors(str)) > width*/) { /* prool fool */
                int adding;
                string word_todo;

                sz = 0;
                words = explode(str, " ");
                msg = "";

                for (i = 0; i < sizeof(words); i++) {
                    word_todo = nil;
                    if (strlen(words[i]) > 4 && (strstr(words[i], "%^") != -1)) {
                        word_todo = ansid->strip_colors(words[i]);
                    }
                    /* word_todo is the word stripped from ansi codes */
                    if (!word_todo) {
                        word_todo = words[i];
                    }

                    /* line-break check disabled: strlen() counts bytes, not characters */
                    if (0/*sz + strlen(word_todo) + adding > width*/) {/* prool fool */
                        msg += "\n";

                        if (chat_flag) {
                            msg += " ";
                        }

                        /* add length of word without ansi codes */
                        sz = strlen(word_todo) + 2;

                        /* add word with ansi codes */
                        msg += words[i];
                    }
                    else {
                        if (adding) {
                            msg += " " + words[i];
                        }
                        else {
                            msg += words[i];
                        }
                        sz += strlen(word_todo) + adding;
                    }
                    /* determine how many spaces will be added next run */
                    if (sz == 0) {
                        adding = 0;
                    }
                    else {
                        adding = 1;
                    }
                }
            }
            if (query_player()->query_ansi()) {
                msg = ansid->parse_colors(msg);
            }
            else {
                msg = ansid->strip_colors(msg);
            }

            send_message(msg + "\n");
        }
    }
}