FieldList::addUtf8() does not work.

miczuc commented 3 months ago

Hello, I wonder what the correct way is to transfer UTF-8 strings from an "interactive provider" to a consumer using EMA C++. For this, I built an "interactive provider" from your example in the directory Cpp-C/Ema/Examples/Training/IProvider/100_Series/100_MP_Streaming. I changed the main() function in IProvider.cpp slightly by adding two fields, with IDs 3 and 28, as follows:

    int main()
    {
        try
        {
            AppClient appClient;

            OmmProvider provider( OmmIProviderConfig().port( "14002" ), appClient );

            // Plain UTF-8 bytes, without the RMTES escape sequence
            const char * test_utf8_str = "UTF-8-string: ´|§|ä|ö|ü|ß|Ä|Ö|Ü|€|¤|£|ø|`";
            int slen = strlen(test_utf8_str);

            EmaBuffer utf8Buf1(test_utf8_str, slen);

            // The same bytes prefixed with ESC % 0 (0x1B 0x25 0x30),
            // the RMTES escape sequence that marks the value as UTF-8
            char utf8_chars[128] = { 0x1B, 0x25, 0x30 };

            strcpy(utf8_chars + 3, test_utf8_str);

            EmaBuffer utf8Buf2(utf8_chars, slen + 3);

            // Wait until the consumer has requested the item (itemHandle is set in AppClient)
            while ( itemHandle == 0 ) sleep(1000);

            for ( Int32 i = 0; i < 60; i++ )
            {
                provider.submit( UpdateMsg().payload( FieldList().
                        addReal( 22, 3391 + i, OmmReal::ExponentNeg2Enum ).
                        addReal( 30, 10 + i, OmmReal::Exponent0Enum ).
                        addUtf8(3, utf8Buf1).
                        addRmtes(28, utf8Buf2).
                        complete() ), itemHandle );

                sleep( 1000 );
            }
        }
        catch ( const OmmException& excp )
        {
            cout << excp << endl;
        }

        return 0;
    }

To connect to the interactive provider and display the received messages, I built the example consumer from your directory Cpp-C/Ema/Examples/Training/Consumer/100_Series/100_MP_Streaming.

This is the output for a received update message in the consumer:

    UpdateMsg
        streamId="5"
        domain="MarketPrice Domain"
        updateTypeNum="0"
        name="IBM.N"
        serviceId="1"
        serviceName="DIRECT_FEED"
        Payload dataType="FieldList"
            FieldList
                FieldEntry fid="22" name="BID" dataType="Real" value="33.91"
                FieldEntry fid="30" name="BIDSIZE" dataType="Real" value="10"
                FieldEntry fid="3" name="DSPLY_NAME" dataType="Rmtes" value="UTF-8-string: Â_t|Â§|Ã_r|Ã¶|Ã¼|Ã�|Ã�|Ã�|Ã�|â�⅛|Â_r|Â£|Ã␇|"
                FieldEntry fid="28" name="NEWS" dataType="Rmtes" value="UTF-8-string: ´|§|ä|ö|ü|ß|Ä|Ö|Ü|€|¤|£|ø|"
            FieldListEnd
        PayloadEnd
    UpdateMsgEnd

If in the provider code the instruction "addUtf8(3, utf8Buf1)" is replaced with "addUtf8(3, utf8Buf2)", the output for the received update message becomes:

    UpdateMsg
        streamId="5"
        domain="MarketPrice Domain"
        updateTypeNum="0"
        name="IBM.N"
        serviceId="1"
        serviceName="DIRECT_FEED"
        Payload dataType="FieldList"
            FieldList
                FieldEntry fid="22" name="BID" dataType="Real" value="33.91"
                FieldEntry fid="30" name="BIDSIZE" dataType="Real" value="10"
                FieldEntry fid="3" name="DSPLY_NAME" dataType="Rmtes" value="UTF-8-string: ´|§|ä|ö|ü|ß|Ä|Ö|Ü|€|¤|£|ø|"
                FieldEntry fid="28" name="NEWS" dataType="Rmtes" value="UTF-8-string: ´|§|ä|ö|ü|ß|Ä|Ö|Ü|€|¤|£|ø|"
            FieldListEnd
        PayloadEnd
    UpdateMsgEnd

So what is the difference between FieldList::addUtf8() and FieldList::addRmtes()? Doesn't FieldList::addUtf8() free the programmer from prepending the marker bytes 0x1B, 0x25, 0x30?

MitchellKato commented 3 months ago

OMM has a defined UTF-8 string type, as well as RMTES. On the wire, both are encoded as Buffer data, so the API treats them as Buffers and encodes the data as-is.
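Because the bytes go on the wire unchanged, the consumer appears to decode fid 3 according to its dictionary type (the log reports dataType="Rmtes" for DSPLY_NAME even though addUtf8() was called), so raw UTF-8 without the escape sequence is not read as UTF-8. The sketch below is a rough model of that effect, assuming for illustration that the unprefixed bytes are read as the single-byte Latin-1 character set. The real RMTES default character set is not Latin-1, which is why characters such as ´ and € come out differently in the log than this model predicts. The function name misreadAsLatin1 is hypothetical.

```cpp
#include <string>

// Illustration only (hypothetical helper): reinterpret each byte of a UTF-8
// string as a Latin-1 (ISO 8859-1) code point and re-encode it as UTF-8.
// This mimics a decoder that does not know the bytes are UTF-8. The real
// RMTES default character set differs from Latin-1, so this only
// approximates the garbled fid 3 value in the first log.
std::string misreadAsLatin1(const std::string& utf8Bytes)
{
    std::string out;
    for (unsigned char byte : utf8Bytes)
    {
        if (byte < 0x80)
        {
            // ASCII bytes map to themselves.
            out += static_cast<char>(byte);
        }
        else
        {
            // Latin-1 code points 0x80-0xFF need two bytes in UTF-8.
            out += static_cast<char>(0xC0 | (byte >> 6));
            out += static_cast<char>(0x80 | (byte & 0x3F));
        }
    }
    return out;
}
```

For example, "§" (bytes C2 A7) becomes "Â§" and "ö" (bytes C3 B6) becomes "Ã¶", matching those characters in the fid 3 output of the first log. This is also consistent with passing utf8Buf2 to addUtf8() fixing fid 3: the method name does not change the bytes that are sent.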

The API cannot automatically prepend the 0x1B, 0x25, 0x30 sequence for RMTES FIDs encoded as UTF-8, because a large number of RMTES FIDs in the RDMFieldDictionary have very small RWF cache lengths (under 5 bytes); the three extra bytes could cause the value to be truncated in cache.
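To make the truncation concern concrete, here is a hypothetical model of a cache that keeps at most cacheLen bytes of a field value. The names storeInCache and textBytesKept are invented for this sketch, and the cache length of 4 in the usage note is only an illustrative value in the "under 5 bytes" range mentioned above, not a figure taken from the dictionary.

```cpp
#include <cstddef>
#include <string>

// RMTES escape sequence ESC % 0 (bytes 0x1B 0x25 0x30).
const std::string kRmtesUtf8Escape = "\x1B\x25\x30";

// Hypothetical cache model: only the first cacheLen bytes of a value are kept.
std::string storeInCache(const std::string& value, std::size_t cacheLen)
{
    return value.substr(0, cacheLen);
}

// How many bytes of the actual text would survive if the escape sequence
// were prepended automatically before the value reached the cache.
std::size_t textBytesKept(const std::string& text, std::size_t cacheLen)
{
    const std::string cached = storeInCache(kRmtesUtf8Escape + text, cacheLen);
    return cached.size() > kRmtesUtf8Escape.size()
               ? cached.size() - kRmtesUtf8Escape.size()
               : 0;
}
```

With cacheLen = 4, storeInCache("IBM", 4) keeps all of "IBM", but textBytesKept("IBM", 4) is 1: once the three escape bytes are added, only "I" survives.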

miczuc commented 3 months ago

My question is: what is the difference between FieldList::addRmtes() and FieldList::addUtf8(), when should each method be used, and how must the values passed to them be built? Your comment does not answer any of this.

umernalla commented 3 months ago

Hi @miczuc, I can't answer the question of when to use addUtf8(), as I have never used it, nor has any client that I have helped. However, the following article may help regarding when to use addRmtes(): https://developers.lseg.com/en/article-catalog/article/encoding-and-decoding-non-ascii-text-using-ema-and-rfa-cnet

miczuc commented 3 months ago

So what is FieldList::addUtf8() good for? What is the purpose of this method? Why must the programmer deal with the bytes 0x1B, 0x25, 0x30? That is an internal oddity of your protocol and should be hidden in the API. I don't understand that.

MitchellKato commented 3 months ago

There is a UTF-8 type defined in OMM. The addUtf8 method is intended for that type, and the expectation is that a UTF-8 value is just a plain UTF-8 string. Unfortunately, the UTF-8 type is not used in the standard RDM Field Dictionary.

The 0x1B, 0x25, 0x30 bytes are an escape sequence used only with the RMTES type, where they indicate that the string is encoded as UTF-8.
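At the byte level, a reader can recognise the UTF-8 case by checking for this prefix. The sketch below is a simplified illustration of that check using only the standard library; it is not EMA's RMTES decoder (which, judging by the fid 28 output, already handles the escape), and extractRmtesUtf8 is a hypothetical name. It ignores other RMTES character sets and partial-update sequences.

```cpp
#include <string>

// RMTES escape sequence ESC % 0 (bytes 0x1B 0x25 0x30), which marks the
// remainder of an RMTES value as UTF-8.
const std::string kRmtesUtf8Escape = "\x1B\x25\x30";

// Simplified, hypothetical reader: if the raw RMTES bytes start with ESC % 0,
// copy the UTF-8 text that follows into utf8Out and return true. Values
// without the escape are left to a full RMTES decoder and return false.
bool extractRmtesUtf8(const std::string& rmtesBytes, std::string& utf8Out)
{
    if (rmtesBytes.compare(0, kRmtesUtf8Escape.size(), kRmtesUtf8Escape) != 0)
    {
        return false;
    }
    utf8Out = rmtesBytes.substr(kRmtesUtf8Escape.size());
    return true;
}
```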

It should also be noted that ETA and EMA are intentionally data-agnostic: the API does not make overriding decisions about how data is encoded and simply passes the data on as-is. It is up to the data sources and providers to handle this. If there are any issues with the data contents themselves, you will need to reach out to LSEG support.
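Since the provider owns this decision, one way to answer "which method, and which bytes" is to key it on the field's dictionary type. This is a minimal sketch under the assumption that the provider already knows each FID's type (DSPLY_NAME and NEWS are reported as Rmtes in the logs above); DictStringType and encodeTextForField are hypothetical names, not EMA API.

```cpp
#include <string>

// Hypothetical: the string type a provider looks up for a FID in its dictionary.
enum class DictStringType { Utf8, Rmtes };

// Returns the bytes a provider would wrap in an EmaBuffer for this field.
// The API encodes buffers as-is, so the escape must be added here for RMTES.
std::string encodeTextForField(DictStringType type, const std::string& utf8Text)
{
    if (type == DictStringType::Rmtes)
    {
        // ESC % 0 (0x1B 0x25 0x30) tells RMTES decoders that UTF-8 follows.
        return std::string("\x1B\x25\x30") + utf8Text;
    }
    // Fields of the OMM UTF-8 type carry plain UTF-8 (the addUtf8() case).
    return utf8Text;
}
```

The returned string could then be wrapped in an EmaBuffer and passed to addRmtes() for RMTES fields, or to addUtf8() for fields of the OMM UTF-8 type, in the same way the original example builds utf8Buf2.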

Refinitiv / Real-Time-SDK

FieldList::addUtf8() does not work. #283