epics-base / pvDataCPP

pvDataCPP is an EPICS V4 C++ module
https://epics-base.github.io/pvDataCPP/
Other
6 stars 16 forks source link

Invalid conversion to JSON for strings that contain double quotes #78

Closed sveseli closed 2 years ago

sveseli commented 2 years ago

The printJSON() method results in invalid JSON if strings contain double quote characters (see https://github.com/epics-base/pvaPy/issues/74).

mdavidsaver commented 2 years ago

Could you try changing

https://github.com/epics-base/pvDataCPP/blob/d3b4976ea2b0d78075511f14d7f7bf9620dd4d14/src/json/print.cpp#L80

to

A.strm<<'\"'<<pvd::escape(scalar->getAs<std::string>())<<'\"'; 

It may be necessary to include pv/typeCast.h.

anjohnson commented 2 years ago

Converting this code to use the yajl_gen API would probably be the simplest way to solve this, it is supposed to properly handle UTF-8 escapes.

sveseli commented 2 years ago

If I try pvd::escape(), here is what happens when I run tests that I added (see https://github.com/sveseli/pvDataCPP/commit/37d18cdcc548973c95e3712c01129e00373ad7d2):

bluegill2> ./testApp/O.linux-x86_64/testjson | tail -16
ok 29 - round2 ({"scalar": 42,"ivec": [1,2,3],"svec": ["one","two"],"sub": {"x": {"y": 43}},"extra": 0,"sarr": [{"a": 5,"b": 6},{"a": 7,"b": 8},{"a": 9,"b": 10}],"any": "4.2","almost": "hello"}) == "{\"scalar\": 42," "\"ivec\": [1,2,3]," "\"svec\": [\"one\",\"two\"]," "\"sub\": {" "\"x\": {" "\"y\": 43" "}}," "\"extra\": 0," "\"sarr\": [{\"a\": 5,\"b\": 6}," "{\"a\": 7,\"b\": 8}," "{\"a\": 9,\"b\": 10}]," "\"any\": \"4.2\"," "\"almost\": \"hello\"" "}" ({"scalar": 42,"ivec": [1,2,3],"svec": ["one","two"],"sub": {"x": {"y": 43}},"extra": 0,"sarr": [{"a": 5,"b": 6},{"a": 7,"b": 8},{"a": 9,"b": 10}],"any": "4.2","almost": "hello"})
testPrintJson: result={"foo": "bar"}; expected={"foo": "bar"}; comparison=1
ok 30 - compare == true
testPrintJson: result={"foo": "\"bar\""}; expected={"foo": "\"bar\""}; comparison=1
ok 31 - compare == true
testPrintJson: result={"foo": "\"long string with several \" characters\""}; expected={"foo": "\"long string with several \" characters\""}; comparison=1
ok 32 - compare == true
testPrintJson: result={"foo": "\"long string with several \" and \' characters\""}; expected={"foo": "\"long string with several \" and ' characters\""}; comparison=0
not ok 33 - compare == true

    Results
    =======
       Tests: 33 
      Passed:  32 = 96.97%
      Failed:   1 =  3.03%
     Skipped:   2 =  6.06%

If I try loading to json the string that fails the test, I get this:

>>> pv = PvObject({'foo': STRING}, {'foo': '"long string with several " and \' characters"'})
>>> s = pv.toJSON(True)
>>> s
'{\n "foo": "\\"long string with several \\" and \\\' characters\\""\n}'
>>> d = json.loads(s)
Traceback (most recent call last):
  File "", line 1, in 
  File "/local/sveseli/PVAPY/TEST3/CONDA-3.9/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/local/sveseli/PVAPY/TEST3/CONDA-3.9/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/local/sveseli/PVAPY/TEST3/CONDA-3.9/lib/python3.9/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid \escape: line 2 column 44 (char 45)
>>> 

I wrote a simple function that just escapes a double quote (see https://github.com/sveseli/pvDataCPP/commit/b7e8833ee477a3f5a4ce59cae801b37f39144d9a), and which seems to work okay. If you'd like, I can make a pull request from my branch.

mdavidsaver commented 2 years ago

Converting this code to use the yajl_gen API ...

It doesn't look like this was/is present on 3.15, which has so far been supported.

I wrote a simple function that just escapes a double quote ...

Please test #79 which attempts to be a more complete solution.

anjohnson commented 2 years ago

It doesn't look like this was/is present on 3.15

yajl_gen.h has been present in Base since before R3.15.0.1. If the header doesn't #define EPICS_YAJL_VERSION though you're getting the older YAJL 1.0.12 API, which needs slightly different code to generate a yajl_gen handle, and didn't support long long int values.

On inspection your PR #79 looks like it should handle 7-bit ASCII characters properly (which is an improvement), but may generate mojibake for other characters depending on what isprint() returns (locale-dependent). Assuming the tests eventually pass (GHA seems to be having issues with this job) I'd be okay with our merging this for now.

mdavidsaver commented 2 years ago

yajl_gen.h has been present in Base since before R3.15.0.1. ...

My mistake. I replaced #79 with a quick attempt to use this, which will likely fail the 3.15 build. It also causes testjson to throw up a valgrind warning. I can't immediately see what is going on, but assume I mustn't be using the yajl api correctly...

Conditional jump or move depends on uninitialised value(s)
   at 0x483BC85: strlen (vg_replace_strmem.c:459)
   by 0x4A35D77: yajl_gen_string (yajl_gen.c:272)
   by 0x496596C: (anonymous namespace)::yg_string(yajl_gen_t*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (print.cpp:75)
   by 0x4965A7B: (anonymous namespace)::show_struct((anonymous namespace)::args&, epics::pvData::PVStructure const*, epics::pvData::BitSet const*) (print.cpp:93)
   by 0x4965FFA: (anonymous namespace)::show_field((anonymous namespace)::args&, epics::pvData::PVField const*, epics::pvData::BitSet const*) (print.cpp:163)
   by 0x49667FE: epics::pvData::printJSON(std::ostream&, epics::pvData::PVField const&, epics::pvData::JSONPrintOptions const&) (print.cpp:270)
   by 0x10EA73: printJSON (json.h:75)
   by 0x10EA73: (anonymous namespace)::testroundtrip() (testjson.cpp:230)
   by 0x10EEF6: main (testjson.cpp:285)

... but may generate mojibake ...

There is no "may". My first take would do so.

... I'd be okay with our merging this for now.

I'll fall back to the previous version if the yajl_gen usage issues take too long to sort out.

mdavidsaver commented 2 years ago

... I mustn't be using the yajl api correctly...

yajl_gen_indent_string doesn't copy the provided char*.

mdavidsaver commented 2 years ago

@sveseli Please re-test with #79

sveseli commented 2 years ago

79 seems to work okay for me.

mdavidsaver commented 2 years ago

79 is now merged, including another fix for Base 3.16. Also some GHA updates.

In retrospect, using yajl_gen.h wasn't worthwhile. If I had looked closer I would have noticed that it only emits single character escapes for values < 32 (aka space). I think this makes sense as the output is allowed to be utf-8, so there is no reason to escape anything other than ascii control characters. This would have been only a one-line change to my original (somewhat smaller) PR. Oh, well. I'm not motivated enough to revisit this now. I guess this serves as a reminder that I should always look more closely behind a new (to me) API before using it.

https://github.com/epics-base/epics-base/blob/d508962211b0fe2a838e3d492e43e405cd965b8e/modules/libcom/src/yajl/yajl_encode.c#L80