KonradHoeffner / hdt

Library for the Header Dictionary Triples (HDT) compression file format for RDF data.
https://crates.io/crates/hdt
MIT License
19 stars, 4 forks

DictSectPFC bug with handling UTF8 #22

Closed by KonradHoeffner 1 year ago

KonradHoeffner commented 1 year ago

DictSectPFC truncates strings at byte indexes, but slicing a Rust string at a byte index that is not a character boundary panics. This can happen because UTF-8 may encode a single character with multiple bytes, as with 'Ö' in the panic message below.

[...]
'byte index 29 is not a char boundary; it is inside 'Ö' (bytes 28..30) of `http://dbpedia.org/resource/Östergötland_County`', /home/konrad/projekte/rust/hdt/src/dict_sect_pfc.rs:175:49
stack backtrace:
   0: rust_begin_unwind
             at /rustc/b7bc90fea3b441234a84b49fdafeb75815eebbab/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/b7bc90fea3b441234a84b49fdafeb75815eebbab/library/core/src/panicking.rs:65:14
   2: core::str::slice_error_fail_rt
   3: core::str::slice_error_fail
             at /rustc/b7bc90fea3b441234a84b49fdafeb75815eebbab/library/core/src/str/mod.rs:86:9
   4: core::str::traits::<impl core::slice::index::SliceIndex<str> for core::ops::range::RangeFrom<usize>>::index
             at /rustc/b7bc90fea3b441234a84b49fdafeb75815eebbab/library/core/src/str/traits.rs:370:21
   5: core::str::traits::<impl core::ops::index::Index<I> for str>::index
             at /rustc/b7bc90fea3b441234a84b49fdafeb75815eebbab/library/core/src/str/traits.rs:65:9
   6: <alloc::string::String as core::ops::index::Index<core::ops::range::RangeFrom<usize>>>::index
             at /rustc/b7bc90fea3b441234a84b49fdafeb75815eebbab/library/alloc/src/string.rs:2380:10
   7: hdt::dict_sect_pfc::DictSectPFC::locate_in_block
             at /home/konrad/projekte/rust/hdt/src/dict_sect_pfc.rs:175:49
   8: hdt::dict_sect_pfc::DictSectPFC::string_to_id
             at /home/konrad/projekte/rust/hdt/src/dict_sect_pfc.rs:113:23
   9: hdt::four_sect_dict::FourSectDict::string_to_id
             at /home/konrad/projekte/rust/hdt/src/four_sect_dict.rs:100:26
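
The panic can be reproduced outside the crate. The following is a minimal sketch, not the code from `dict_sect_pfc.rs` and not necessarily the fix that was applied: it assumes the offending index comes from a shared prefix length counted in bytes, and the helpers `common_prefix_len` and `suffix_bytes` are hypothetical names introduced for illustration. It shows that two strings can share the first byte of a multi-byte character, that `str::get` returns `None` at such an index where `&s[i..]` would panic, and that slicing the underlying bytes sidesteps the problem.

```rust
/// Length in bytes of the longest common prefix of two byte slices.
/// Hypothetical helper for illustration, not taken from the hdt crate.
fn common_prefix_len(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b).take_while(|(x, y)| x == y).count()
}

/// Bytes of `s` starting at `byte_index`, or an empty slice if the index is
/// past the end. Operates on bytes, so char boundaries do not matter.
/// Hypothetical helper for illustration.
fn suffix_bytes(s: &str, byte_index: usize) -> &[u8] {
    s.as_bytes().get(byte_index..).unwrap_or(&[])
}

fn main() {
    // \u{D6} is 'Ö' (UTF-8 bytes C3 96), \u{F6} is 'ö', \u{C3} is 'Ã' (UTF-8 bytes C3 83).
    let a = "http://dbpedia.org/resource/\u{D6}sterg\u{F6}tland_County";
    let b = "http://dbpedia.org/resource/\u{C3}sterg\u{F6}tland";

    // The 28 ASCII bytes match, and so does the first byte (C3) of 'Ö' and 'Ã'.
    let shared = common_prefix_len(a.as_bytes(), b.as_bytes());
    assert_eq!(shared, 29);

    // Byte 29 lies inside 'Ö' (bytes 28..30), so it is not a char boundary.
    assert!(!a.is_char_boundary(shared));

    // `&a[shared..]` would panic here; the checked variant returns None instead.
    assert!(a.get(shared..).is_none());

    // Byte slices can be cut at any index.
    let rest = suffix_bytes(a, shared);
    assert_eq!(rest.len(), a.len() - shared);
    println!("shared prefix: {} bytes, remaining suffix: {} bytes", shared, rest.len());
}
```

Comparing on `&[u8]` keeps the ordering identical to comparing the `str` values, since Rust orders strings by their UTF-8 bytes. Whether that matches how the dictionary section stores and compares its entries is an assumption of this sketch.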