beacon-biosignals / Ray.jl

Julia API for Ray
Other
9 stars 1 forks source link

Improve robustness of `ObjectRef` ownership info serialization #203

Closed omus closed 11 months ago

omus commented 11 months ago

While investigating #177 it was becoming evident that the flaw had something do with our worker addresses once again. As I so far have been unable to reproduce this problem locally I decided to look into audit our object reference ownership code to see if anything popped out. In doing so I recalled that in #140 we utilized JSON serialization for the owner address as we were having problems with CxxWrap StdString which would interpret the null-character as the end of the string instead of looking at the length of the string. The JSON approach doesn't actually fix anything except that it works around the issue. I decided to try to switch back to using SerializeAsString and apply our new CxxWrap knowledge to see if we could really fix the issue. In doing so I discovered that CxxWrap doesn't handle non-ASCII characters well:

julia> str = "¿\0ÿ"
"¿\0ÿ"

julia> collect(StdString(str, ncodeunits(str)))
5-element Vector{Char}:
 'Â': Unicode U+00C2 (category Lu: Letter, uppercase)
 '¿': Unicode U+00BF (category Po: Punctuation, other)
 '\0': ASCII/Unicode U+0000 (category Cc: Other, control)
 'Ã': Unicode U+00C3 (category Lu: Letter, uppercase)
 '¿': Unicode U+00BF (category Po: Punctuation, other)

julia> std_str = StdString(str, ncodeunits(str))
"¿"

julia> collect(String(std_str))
1-element Vector{Char}:
 '¿': Unicode U+00BF (category Po: Punctuation, other)

julia> collect(String(collect(std_str)))
5-element Vector{Char}:
 'Â': Unicode U+00C2 (category Lu: Letter, uppercase)
 '¿': Unicode U+00BF (category Po: Punctuation, other)
 '\0': ASCII/Unicode U+0000 (category Cc: Other, control)
 'Ã': Unicode U+00C3 (category Lu: Letter, uppercase)
 '¿': Unicode U+00BF (category Po: Punctuation, other)

julia> collect(String(Vector{UInt8}(collect(std_str))))
3-element Vector{Char}:
 '¿': Unicode U+00BF (category Po: Punctuation, other)
 '\0': ASCII/Unicode U+0000 (category Cc: Other, control)
 'ÿ': Unicode U+00FF (category Ll: Letter, lowercase)

The flaw in CxxWrap is that the iterate/isvalid implementation doesn't skip invalid code points (thanks @iamed2). I'll file an issue for that later today.

This PR implements a safe_convert function to ensure that C++ string data can be round-tripped as a Julia String and back. Doing this allows us to use SerializeAsString successfully. In iterating on this design I found that implementing serialization support for Address allowed us to clean up the interface for this resulting some cleaner code.

Fixes #177.

codecov[bot] commented 11 months ago

Codecov Report

Merging #203 (cb0539e) into main (204dff1) will increase coverage by 0.50%. Report is 2 commits behind head on main. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #203      +/-   ##
==========================================
+ Coverage   95.82%   96.33%   +0.50%     
==========================================
  Files          12       13       +1     
  Lines         623      627       +4     
==========================================
+ Hits          597      604       +7     
+ Misses         26       23       -3     
Flag Coverage Δ
Ray.jl 96.33% <100.00%> (+0.50%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
src/Ray.jl 100.00% <ø> (ø)
src/object_ref.jl 100.00% <100.00%> (+3.19%) :arrow_up:
src/ray_julia_jll/common.jl 90.99% <100.00%> (+1.51%) :arrow_up:
src/ray_julia_jll/ray_julia_jll.jl 100.00% <ø> (ø)
src/ray_julia_jll/upstream_fixes.jl 100.00% <100.00%> (ø)

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more