StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0

[BugFix] Bump simdjson to 3.9.4 and Fix struct field columns inconsistent when loading from bad json #47775

Closed: wyb closed this 17 hours ago

wyb commented 2 days ago

Why I'm doing:

  1. Struct field columns may become inconsistent when parsing fails for some of the fields.
  2. find_field_unordered crashes in the current simdjson version when loading from bad JSON:
    #0  std::__uniq_ptr_impl<simdjson::internal::dom_parser_implementation, std::default_delete<simdjson::internal::dom_parser_implementation> >::_M_ptr (this=0x8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:173
    #1  std::unique_ptr<simdjson::internal::dom_parser_implementation, std::default_delete<simdjson::internal::dom_parser_implementation> >::get (this=0x8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:422
    #2  std::unique_ptr<simdjson::internal::dom_parser_implementation, std::default_delete<simdjson::internal::dom_parser_implementation> >::operator-> (this=0x8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:416
    #3  simdjson::fallback::ondemand::json_iterator::end_position (this=0x7fb36e049540) at /home/disk3/sr-deps/thirdparty/installed/include/simdjson/generic/ondemand/json_iterator-inl.h:193
    #4  simdjson::fallback::ondemand::json_iterator::skip_child (this=0x7fb36e049540, parent_depth=<optimized out>) at /home/disk3/sr-deps/thirdparty/installed/include/simdjson/generic/ondemand/json_iterator-inl.h:126
    #5  simdjson::fallback::ondemand::value_iterator::skip_child (this=<optimized out>) at /home/disk3/sr-deps/thirdparty/installed/include/simdjson/generic/ondemand/value_iterator-inl.h:693
    #6  simdjson::fallback::ondemand::value_iterator::find_field_unordered_raw (key=<error reading variable: Cannot create a lazy string with address 0x0, and a non-zero length.>, this=<optimized out>) at /home/disk3/sr-deps/thirdparty/installed/include/simdjson/generic/ondemand/value_iterator-inl.h:306
    #7  simdjson::fallback::ondemand::object::find_field_unordered(std::basic_string_view<char, std::char_traits<char> >) & (key=<error reading variable: Cannot create a lazy string with address 0x0, and a non-zero length.>, this=<optimized out>) at /home/disk3/sr-deps/thirdparty/installed/include/simdjson/generic/ondemand/object-inl.h:7
    #8  starrocks::add_struct_column (column=0x7fb376e46330, type_desc=..., name="k2", value=value@entry=0x7fb34e6f7090) at be/src/formats/json/struct_column.cpp:42
    #9  0x00000000075bdb0f in starrocks::add_adaptive_nullable_struct_column (column=0x7fb376e46380, type_desc=..., name="k2", value=0x7fb34e6f7090) at be/src/formats/json/nullable_column.cpp:255
    #10 starrocks::add_adpative_nullable_column (column=0x7fb376e46380, type_desc=..., name="k2", value=...) at be/src/formats/json/nullable_column.cpp:404
    #11 starrocks::add_adaptive_nullable_column (column=0x7fb376e46380, type_desc=..., name="k2", value=value@entry=0x7fb34e6f7090, invalid_as_null=true) at be/src/formats/json/nullable_column.cpp:456
    #12 0x00000000075812fa in starrocks::JsonReader::_construct_column (this=0x7fb376e5d000, value=..., column=0x7fb376e31b30, column@entry=0x7fb376e5d000, type_desc=..., col_name=<error reading variable: Cannot access memory at address 0x8>) at be/src/exec/json_scanner.cpp:812
    #13 starrocks::JsonReader::_construct_row_without_jsonpath (this=this@entry=0x7fb376e5d000, row=row@entry=0x7fb34e6f71c0, chunk=chunk@entry=0x7fb376e4d010) at be/src/exec/json_scanner.cpp:563
    #14 0x00000000075865d3 in starrocks::JsonReader::_construct_row (this=0x7fb376e5d000, row=0x7fb34e6f71c0, chunk=0x7fb376e4d010) at be/src/exec/json_scanner.cpp:656
    #15 starrocks::JsonReader::_read_rows<starrocks::JsonDocumentStreamParser> (this=this@entry=0x7fb376e5d000, chunk=chunk@entry=0x7fb376e4d010, rows_to_read=rows_to_read@entry=4096, rows_read=rows_read@entry=0x7fb34e6f72c4) at be/src/exec/json_scanner.cpp:459
    #16 0x000000000757f062 in starrocks::JsonReader::read_chunk (this=0x7fb376e5d000, chunk=0x7fb376e4d010, rows_to_read=4096) at be/src/exec/json_scanner.cpp:426
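
To make item 1 above concrete, here is a minimal sketch of how per-field columns can drift apart when one field of a row fails to parse, and how writing a null for the bad field keeps them aligned. It is not the StarRocks implementation: `StructColumnSketch`, `parse_field`, `append_row_naive`, and `append_row_fill_null` are hypothetical names, and each field is assumed to be a nullable integer for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a struct column: one nullable column per field.
struct StructColumnSketch {
    std::vector<std::vector<std::optional<long long>>> fields;

    explicit StructColumnSketch(std::size_t num_fields) : fields(num_fields) {}

    // True when every field column holds the same number of rows.
    bool consistent() const {
        for (const auto& f : fields) {
            if (f.size() != fields.front().size()) return false;
        }
        return true;
    }
};

// Hypothetical per-field parser: nullopt for anything that is not an integer.
std::optional<long long> parse_field(const std::string& text) {
    try {
        std::size_t pos = 0;
        long long v = std::stoll(text, &pos);
        if (pos != text.size()) return std::nullopt;  // trailing garbage
        return v;
    } catch (const std::exception&) {
        return std::nullopt;  // not a number, or out of range
    }
}

// Naive append: returns on the first bad field, so earlier field columns
// have already grown by one row while later ones have not.
bool append_row_naive(StructColumnSketch& col, const std::vector<std::string>& row) {
    for (std::size_t i = 0; i < col.fields.size(); ++i) {
        auto v = i < row.size() ? parse_field(row[i]) : std::nullopt;
        if (!v) return false;
        col.fields[i].push_back(v);
    }
    return true;
}

// Null-filling append: a bad or missing field becomes a null, so every
// field column grows by exactly one row.
void append_row_fill_null(StructColumnSketch& col, const std::vector<std::string>& row) {
    for (std::size_t i = 0; i < col.fields.size(); ++i) {
        col.fields[i].push_back(i < row.size() ? parse_field(row[i]) : std::nullopt);
    }
    assert(col.consistent());
}
```

With a two-field column and the row `{"1", "oops"}`, the naive version leaves the first field column one row longer than the second, while the null-filling version leaves both at one row with a null in the second.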

What I'm doing:

  1. Bump simdjson to 3.9.4 to fix the find_field_unordered crash.
  2. Fill in null when a field fails to parse, to avoid inconsistent struct field columns.
  3. Support big integers (values below -9223372036854775808 or above 18446744073709551615).
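
The bounds in item 3 are the minimum of a signed 64-bit integer and the maximum of an unsigned 64-bit integer. As a rough illustration of where that boundary falls, the sketch below classifies a decimal integer literal by comparing digit strings. `IntKind` and `classify_integer` are hypothetical names used only for this example; it does not show how simdjson or StarRocks detect or store such values.

```cpp
#include <cctype>
#include <string_view>

// Hypothetical classification of a decimal integer literal.
enum class IntKind { Int64, UInt64, BigInteger, Invalid };

IntKind classify_integer(std::string_view text) {
    const bool negative = !text.empty() && text.front() == '-';
    if (negative) text.remove_prefix(1);
    if (text.empty()) return IntKind::Invalid;
    for (char c : text) {
        if (!std::isdigit(static_cast<unsigned char>(c))) return IntKind::Invalid;
    }
    // Drop leading zeros so that length comparison is meaningful.
    while (text.size() > 1 && text.front() == '0') text.remove_prefix(1);

    // For digit strings without leading zeros, a shorter string is smaller,
    // and equal-length strings compare numerically in lexicographic order.
    auto fits = [&](std::string_view limit) {
        return text.size() < limit.size() ||
               (text.size() == limit.size() && text <= limit);
    };

    if (negative) {
        // Magnitude of the signed 64-bit minimum.
        return fits("9223372036854775808") ? IntKind::Int64 : IntKind::BigInteger;
    }
    if (fits("9223372036854775807")) return IntKind::Int64;    // signed 64-bit maximum
    if (fits("18446744073709551615")) return IntKind::UInt64;  // unsigned 64-bit maximum
    return IntKind::BigInteger;
}
```

Under this sketch, `-9223372036854775809` and `18446744073709551616` are the first values on either side that no longer fit a 64-bit type and fall into the big-integer case.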

Fixes https://github.com/StarRocks/StarRocksTest/issues/7982

45406

github-actions[bot] commented 18 hours ago

[FE Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] commented 17 hours ago

[BE Incremental Coverage Report]

:white_check_mark: pass : 11 / 13 (84.62%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| :large_blue_circle: be/src/formats/json/numeric_column.cpp | 9 | 11 | 81.82% | [95, 96] |
| :large_blue_circle: be/src/formats/json/struct_column.cpp | 2 | 2 | 100.00% | [] |

github-actions[bot] commented 17 hours ago

@Mergifyio backport branch-3.3

github-actions[bot] commented 17 hours ago

@Mergifyio backport branch-3.2

mergify[bot] commented 17 hours ago

backport branch-3.3

✅ Backports have been created

* [#47894 [BugFix] Bump simdjson to 3.9.4 and Fix struct field columns inconsistent when loading from bad json (backport #47775)](https://github.com/StarRocks/starrocks/pull/47894) has been created for branch `branch-3.3`

mergify[bot] commented 17 hours ago

backport branch-3.2

✅ Backports have been created

* [#47895 [BugFix] Bump simdjson to 3.9.4 and Fix struct field columns inconsistent when loading from bad json (backport #47775)](https://github.com/StarRocks/starrocks/pull/47895) has been created for branch `branch-3.2`