damus-io / damus

iOS nostr client
GNU General Public License v3.0

Names for Japanese users w/ Japanese characters not rendering #1668

Closed alltheseas closed 10 months ago

alltheseas commented 10 months ago

Maybe this is a UTF-8 issue.

from @jb55 feedback

It looks like there is no username displaying.

cc @danieldaquino

alltheseas commented 10 months ago

This fix is to follow the Japanese relay ticket.

alltheseas commented 10 months ago

This precedes push notifications @danieldaquino

danieldaquino commented 10 months ago

Starting to work on this now 🚀

danieldaquino commented 10 months ago

@jb55, unfortunately I did not have a chance to fully fix this today, but I narrowed down the problem. I believe the issue is in either the ingester thread or the writer thread in NostrDB. Here's why:

  1. When I change the profile name to "ひらがな" in settings, the JSON that gets sent over is created correctly (the hiragana shows up).
  2. The raw JSON profile event received by the ingester thread also contains the hiragana characters.
  3. The ingester thread prepares a message for the writer thread. It was hard to check with the debugger whether or not the name got lost at that point.
  4. At the point where the writer thread calls ndb_write_profile_search_indices, the display_name is already NULL.

alltheseas commented 10 months ago

Example profile

npub1avgeydxyv7kf6tl75kmjsne6wj7sg2r6zt8atz3z6xtzvs6vmheqssyucl

Damus

image

In nostr.band

image

danieldaquino commented 10 months ago

Still investigating this, and learning more about NostrDB in the process.

The issue seems to be happening on the ingester side (unless I am doing something wrong).

By the time the function ndb_ingester_process_event calls ndb_ingester_process_note, the note content (which should contain profile data including the display name) seems to have lost info.

(lldb) print note.content
(ndb_packed_str) {
  packed = (str = "", flag = '\0')
  offset = 0
  bytes = ""
}

The outer event JSON seems to be parsing fine at ndb_json_parser_parse under ndb_client_event_from_json. I will look into what ndb_parse_json_note is doing.

danieldaquino commented 10 months ago

Ok, the above comment might not be true. I tried a counter-example with a western display name "abc" and got similar results. note.content seems to be this empty ndb_packed_str after ndb_parse_json_note.

It seems that the content of the parsed note does not get stored into note.content, but instead inside its builder.

When the program pointer is at line 2772:

            else if (jsoneq(json, tok, tok_len, "content")) {
                // content
                tok = &parser->toks[i+1];
                union ndb_packed_str pstr;
                tok_len = toksize(tok);
                int written, pack_ids = 0;
                if (!ndb_builder_make_json_str(&parser->builder,
                            json + tok->start,
                            tok_len, &pstr,
                            &written, pack_ids)) {
                    ndb_debug("ndb_builder_make_json_str failed\n");
                    return 0;
                }
                parser->builder.note->content_length = written;
------->        parser->builder.note->content = pstr;
                parsed |= NDB_PARSED_CONTENT;
            }

It seems that the content portion of the note gets written to parser->builder.strings.start

(lldb) print parser->builder.strings.start
(unsigned char *) 0x000000010902d9e0 "{\"website\":\"\",\"about\":\"\",\"name\":\"\",\"display_name\":\"abc\",\"lud06\":\"\"}"

I get the same results if I try applying a UTF-8 name:

(lldb) print parser->builder.strings
(cursor)  (start = "{\"display_name\":\"ひらがな\",\"about\":\"\",\"website\":\"\",\"lud06\":\"\",\"name\":\"\"}", p = "", end = "")

I will have to look further down the process again.
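
A note on reading that lldb output: one interpretation consistent with it is that note.content is a small union holding either a few inline bytes or an offset into the builder's strings buffer. Under that reading, an all-zero value would mean "offset 0", that is, the string at parser->builder.strings.start, rather than "no content". The sketch below mirrors only the field names visible in the lldb output (packed.str, packed.flag, offset, bytes). The type name sketch_packed_str, the SKETCH_INLINE flag value, and sketch_resolve are hypothetical and are not NostrDB's actual definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the fields lldb printed for note.content.
 * This is NOT NostrDB's real definition, only an illustration. */
union sketch_packed_str {
    struct {
        char str[3];        /* up to 2 inline chars plus a NUL */
        unsigned char flag; /* SKETCH_INLINE: string is stored inline (assumed) */
    } packed;
    uint32_t offset;        /* otherwise: offset into a strings buffer */
    unsigned char bytes[4];
};

#define SKETCH_INLINE 1 /* hypothetical flag value */

/* Return the string a packed value refers to: the inline bytes when the
 * flag is set, otherwise strings_start + offset. */
static const char *sketch_resolve(const union sketch_packed_str *p,
                                  const char *strings_start)
{
    assert(p != NULL && strings_start != NULL);
    if (p->packed.flag == SKETCH_INLINE)
        return p->packed.str;
    return strings_start + p->offset;
}
```

With this layout, resolving an all-zero value against a strings buffer that begins with the profile JSON returns the JSON itself, which would match what the two lldb prints show together. Whether NostrDB resolves note.content this way is an assumption to verify against its source.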

danieldaquino commented 10 months ago

I believe the issue is in either the ingester thread or the writer thread in NostrDB. Here's why: (...)

  4. At the point where the writer thread calls ndb_write_profile_search_indices, the display_name is already NULL.

I am still confident about this, though. I tested a counter-example where the display name is "abc", and in the same situation as item (4) above, the display_name gets set to "abc" as expected (instead of NULL, as seen in the "ひらがな" example).

danieldaquino commented 10 months ago

Ok, the above comment might not be true.

I confirm that that was definitely not true.

The content seems to be parsed. I was just looking at note.content, when the JSON content I was looking for was actually being written to a buffer.

    note_size =
        ev->client ? 
        ndb_client_event_from_json(ev->json, ev->len, &fce, buf, bufsize, &cb) :
        ndb_ws_event_from_json(ev->json, ev->len, &tce, buf, bufsize, &cb);

--> if (note_size == -42) {
        // we already have this!

(lldb) x -s 1 -f c -c 4096 buf --force
(...)
0x109bbf3c0: \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
0x109bbf3e0: {"about":"","website":"","displa
0x109bbf400: y_name":"\xe3\x81\xb2\xe3\x82\x89\xe3\x81\x8c\xe3\x81\xaa","name":""
0x109bbf420: ,"lud06":""}\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
(...)

That parsing function seems to be fine. I will zoom back out and look further down the pipeline.
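
As a cross-check on the dump above, the twelve escaped bytes between the quotes are the UTF-8 encoding of "ひらがな" (three bytes per character), so the name appears intact at this point in the pipeline. A minimal sketch of decoding them into code points follows. utf8_decode3 and utf8_decode3_all are hypothetical helpers that handle only three-byte sequences, which is all this example needs; they are not a full UTF-8 validator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one 3-byte UTF-8 sequence
 * (1110xxxx 10xxxxxx 10xxxxxx) into a code point. Returns 0 if the bytes
 * do not match that pattern. Other sequence lengths are not handled. */
static uint32_t utf8_decode3(const unsigned char *s)
{
    if ((s[0] & 0xF0) != 0xE0 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
        return 0;
    return ((uint32_t)(s[0] & 0x0F) << 12) |
           ((uint32_t)(s[1] & 0x3F) << 6) |
           (uint32_t)(s[2] & 0x3F);
}

/* The display_name bytes copied from the memory dump. */
static const unsigned char dumped_name[12] = {
    0xe3, 0x81, 0xb2, 0xe3, 0x82, 0x89,
    0xe3, 0x81, 0x8c, 0xe3, 0x81, 0xaa
};

/* Decode `len` bytes (a multiple of 3) into `out`. Returns the number of
 * code points written, or 0 on the first malformed sequence. */
static size_t utf8_decode3_all(const unsigned char *s, size_t len, uint32_t *out)
{
    size_t n = 0;
    assert(len % 3 == 0);
    for (size_t i = 0; i < len; i += 3) {
        uint32_t cp = utf8_decode3(s + i);
        if (cp == 0)
            return 0;
        out[n++] = cp;
    }
    return n;
}
```

Decoding dumped_name yields four code points: U+3072, U+3089, U+304C, U+306A (ひ, ら, が, な).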

danieldaquino commented 10 months ago

The next step I am working on is to check ndb_process_profile_note and ndbprofile_parse_json (which run somewhere between the two known debugging points), and compare the behavior between the Hiragana example and the English example.

static int ndbprofile_parse_json(flatcc_builder_t *B,
        const char *buf, size_t bufsiz, int flags, NdbProfile_ref_t *profile)
{
    flatcc_json_parser_t parser, *ctx = &parser;
    flatcc_json_parser_init(ctx, B, buf, buf + bufsiz, flags);

    if (flatcc_builder_start_buffer(B, 0, 0, 0))
        return 0;

    NdbProfile_parse_json_table(ctx, buf, buf + bufsiz, profile);
    if (ctx->error)
        return 0;

    if (!flatcc_builder_end_buffer(B, *profile))
        return 0;

    ctx->end_loc = buf;

--->return 1;
}

I am having a hard time finding which buffer or variable to check. B->buffers[N] (where N can be 0, 1, 2, etc.) is my best guess; they seem to point to on-disk addresses. Looking into that.
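
One debugger-independent way to answer that question might be to search the finished buffer for the raw UTF-8 bytes of the name. FlatBuffers stores strings as UTF-8 bytes, so if the name was written it should appear verbatim somewhere in the buffer. The sketch below is a plain byte search. find_bytes is a hypothetical helper, and how to obtain the finished buffer pointer and size from the flatcc builder is an assumption left outside this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the offset of the first occurrence of
 * `needle` in `buf`, or -1 if it is absent. Works on arbitrary bytes,
 * including embedded NULs, unlike strstr. */
static long find_bytes(const unsigned char *buf, size_t buf_len,
                       const unsigned char *needle, size_t needle_len)
{
    assert(buf != NULL && needle != NULL);
    if (needle_len == 0 || needle_len > buf_len)
        return -1;
    for (size_t i = 0; i + needle_len <= buf_len; i++) {
        if (memcmp(buf + i, needle, needle_len) == 0)
            return (long)i;
    }
    return -1;
}
```

Running the same search for "abc" in the English case and for the twelve "ひらがな" bytes in the Hiragana case would show whether both names reach the buffer, or only the ASCII one.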

danieldaquino commented 10 months ago

@jb55, do you have any thoughts on which variables or addresses to look at to check this point in the process (ndb_process_profile_note)? I am trying to find out if the display name is being properly written to the profile flatbuffer in both the "abc" and "ひらがな" display name examples, or only in the "abc" example.

Or any other thoughts overall on this ticket?