epics-base / jca

Java Channel Access client API
https://www.javadoc.io/doc/org.epics/jca/latest/index.html

handle UTF-strings as pvnames - fixed UDP-datagram buffer sizes #55

Closed TBirkeHZB closed 4 years ago

TBirkeHZB commented 4 years ago

This fixes a bug in handling UTF-8/16 PVnames.

When a PV name containing German umlauts was queried (due to a typo), all IOCs issued error messages like

unterminated PV name in UDP search request?

and older ca-gateways even hung, consuming 100% CPU and no longer doing any I/O.

Test with pv-name "Führungsgröße":

The name has 13 characters, and 13 bytes would have fit in the default UDP-datagram buffer for name-resolution requests (size = 16 bytes). Encoded as UTF-8, however, the name occupies 16 bytes (ü, ö and ß take two bytes each), which left no space for the terminating NULL-byte and caused all kinds of errors.
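The character and byte counts quoted above can be checked with a few lines of Java. This is only a sketch: the class and method names (`PvNameLength`, `charCount`, `utf8ByteCount`) are made up for illustration and are not part of JCA, and UTF-8 is assumed as the wire encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not JCA code: compares character count and encoded size.
final class PvNameLength {

    /** Number of UTF-16 code units, which is what String.length() reports. */
    static int charCount(String name) {
        return name.length();
    }

    /** Number of bytes the name occupies when encoded as UTF-8 (assumed encoding). */
    static int utf8ByteCount(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "Führungsgröße", written with escapes so the source encoding does not matter.
        String name = "F\u00fchrungsgr\u00f6\u00dfe";
        System.out.println("chars: " + charCount(name));     // chars: 13
        System.out.println("bytes: " + utf8ByteCount(name)); // bytes: 16
    }
}
```

A buffer sized from `charCount` alone would be three bytes too optimistic for this name, before even accounting for the terminator.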

With the proposed fix, JCA neither crashes any gateways nor produces any error messages on IOCs/Gateways. An existing PV named "Führungsgröße" now connects properly and returns values.

ralphlange commented 4 years ago

The alternative would be rejecting invalid PV names... What does the C library of Channel Access do in such cases?

TBirkeHZB commented 4 years ago

At least it doesn't miscalculate buffer sizes and hence crash software ;-)

The C-version perfectly handles umlauts, since it doesn't know about UTF-whatever. It's just an array of bytes. Both creating PVs named "Käsebrötchen" and querying them works just fine.

Are UTF-8 characters forbidden? (can't find the proper source-code that quickly)

ralphlange commented 4 years ago

Tolerated by source code, not in the set of allowed characters by definition/documentation.

ralphlange commented 4 years ago

But your answer is fine. I would always argue in favor of making the two implementations act the same.

TBirkeHZB commented 4 years ago

Where is the definition of allowed characters in CA-PVnames? I just found the forbidden characters for EPICS-database-record-names.

ralphlange commented 4 years ago

Right, there is none. I rest my case.

kasemir commented 4 years ago

Looks like the issue only arises for longer names. This one, where the string length of the PV name is 5 and the byte length is 7, runs without problems.

record(bi, "Größe")
{
  field(ZNAM, "mäßig")
  field(ONAM, "elefantös")
}

Works fine with 'caget' for both the basic PV name as well as enum labels:

$ caget -d CTRL_ENUM Größe
Größe
    Native data type: DBF_ENUM
    Request type:     DBR_CTRL_ENUM
    Element count:    1
    Value:            mäßig
..
    Enums:            ( 2)
                      [ 0] mäßig
                      [ 1] elefantös

.. as well as jca:

[Image: Screen Shot 2020-10-21 at 8 24 52 AM]

TBirkeHZB commented 4 years ago

It seems only to be a problem when

name.length() < N * 8

and

name.getBytes().length >= N * 8

(Assumption: UDP-datagram buffers are allocated in sizes that are multiples of 8 bytes.) In this case, the NULL-byte gets lost.
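The condition above can be written as a small predicate. The following is a sketch under the stated assumption (buffers padded to multiples of 8 bytes) and additionally assumes UTF-8 encoding; `PvNamePadding` and its methods are hypothetical names for illustration, not the actual JCA code or the actual fix.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the size arithmetic, not JCA code.
final class PvNamePadding {

    /** Assumed alignment of the name buffer, taken from the comment above. */
    static final int ALIGN = 8;

    /** Smallest multiple of ALIGN that is greater than or equal to n. */
    static int alignUp(int n) {
        return (n + ALIGN - 1) / ALIGN * ALIGN;
    }

    /** Buffer size derived from the character count plus one terminator byte. */
    static int sizeFromChars(String name) {
        return alignUp(name.length() + 1);
    }

    /** Buffer size derived from the encoded byte count plus one terminator byte. */
    static int sizeFromBytes(String name) {
        return alignUp(name.getBytes(StandardCharsets.UTF_8).length + 1);
    }

    /** True when a character-based buffer cannot hold the encoded name and its terminator. */
    static boolean losesTerminator(String name) {
        int needed = name.getBytes(StandardCharsets.UTF_8).length + 1;
        return sizeFromChars(name) < needed;
    }

    public static void main(String[] args) {
        String longName = "F\u00fchrungsgr\u00f6\u00dfe"; // "Führungsgröße"
        String shortName = "Gr\u00f6\u00dfe";             // "Größe"
        System.out.println(losesTerminator(longName));  // true
        System.out.println(losesTerminator(shortName)); // false
        System.out.println(sizeFromBytes(longName));    // 24
    }
}
```

Under these assumptions, "Größe" (5 characters, 7 bytes) still fits into its 8-byte buffer together with the terminator, while "Führungsgröße" (13 characters, 16 bytes) needs 24 bytes instead of 16, which matches the two cases reported in this thread.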

I could have lived with error messages, but the gateways (still V2.1.0.0 here, to be updated soon) becoming unresponsive is a real showstopper...

kasemir commented 4 years ago

Fixing the same issue in the PVA client