fefit / visdom

A library with a jQuery-like API for HTML parsing, node selection, and node mutation, suitable for web scraping and HTML confusion.
MIT License

doc.find("p:contains('好用')") panics, but only when Chinese characters appear in :contains() #9

Closed simon639 closed 2 years ago

simon639 commented 2 years ago
use visdom::Vis;

fn main() -> anyhow::Result<()> {
    let html = r#"
    <div>
    <p>Visdom is awesome</p>
    <p>Visdom 很好用</p>
    </div>
    "#;
    let doc = Vis::load(html).unwrap();
    let elements = doc.find("p:contains('Visdom')");  // works
    println!("elements : {}", elements.length());

    let elements = doc.find("p:contains('好用')");  // panics
    println!("elements : {}", elements.length());
    println!("done");
    Ok(())
}

elements : 2
thread 'main' panicked at 'range end index 8 out of range for slice of length 5', library/core/src/slice/index.rs:73:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:142:14
   2: core::slice::index::slice_end_index_len_fail_rt
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/slice/index.rs:73:5
   3: core::ops::function::FnOnce::call_once
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/ops/function.rs:248:5
   4: core::intrinsics::const_eval_select
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/intrinsics.rs:2372:5
   5: core::slice::index::slice_end_index_len_fail
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/slice/index.rs:67:9
   6: <core::ops::range::Range<usize> as core::slice::index::SliceIndex<[T]>>::index
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/slice/index.rs:304:13
   7: <core::ops::range::RangeTo<usize> as core::slice::index::SliceIndex<[T]>>::index
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/slice/index.rs:351:9
   8: core::slice::index::<impl core::ops::index::Index<I> for [T]>::index
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/slice/index.rs:18:9
   9: <visdom::mesdoc::selector::pattern::RegExp as visdom::mesdoc::selector::pattern::Pattern>::matched
             at /home/simon/.cargo/registry/src/github.com-1ecc6299db9ec823/visdom-0.5.2/src/mesdoc/selector/pattern.rs:357:17
  10: visdom::mesdoc::selector::pattern::exec
             at /home/simon/.cargo/registry/src/github.com-1ecc6299db9ec823/visdom-0.5.2/src/mesdoc/selector/pattern.rs:445:26
  11: visdom::mesdoc::selector::rule::Rule::exec_queues
             at /home/simon/.cargo/registry/src/github.com-1ecc6299db9ec823/visdom-0.5.2/src/mesdoc/selector/rule.rs:283:54
  12: visdom::mesdoc::selector::rule::Rule::exec
             at /home/simon/.cargo/registry/src/github.com-1ecc6299db9ec823/visdom-0.5.2/src/mesdoc/selector/rule.rs:276:3
  13: visdom::mesdoc::selector::Selector::from_str
             at /home/simon/.cargo/registry/src/github.com-1ecc6299db9ec823/visdom-0.5.2/src/mesdoc/selector/mod.rs:157:51
  14: visdom::mesdoc::interface::elements::Elements::find
             at /home/simon/.cargo/registry/src/github.com-1ecc6299db9ec823/visdom-0.5.2/src/mesdoc/interface/elements.rs:1098:11
  15: repl::main
             at ./src/bin/repl.rs:17:20
  16: core::ops::function::FnOnce::call_once
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fefit commented 2 years ago

@simon639 Thanks for the issue. I mixed up two lengths here: the selector parameter is measured in chars, while the regex match result is measured by its str length (bytes). As a result, any non-ASCII character (for example, a Chinese character) used in the :contains selector causes an error. I will fix it in the new version, 0.5.3. Thanks again for the feedback!
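The mismatch described above can be reproduced in isolation. The following is only a minimal sketch, not visdom's actual source: the helper `byte_len_to_char_count` and the variables `rest` and `matched` are hypothetical names chosen for illustration, and the sample text is an assumption. It shows that the quoted argument `'好用'` is 4 chars but 8 bytes in UTF-8, so a byte length used as an index into a 5-element char slice goes out of range, which would be consistent with the numbers in the panic message.

```rust
// Hypothetical sketch of a char-count vs. byte-length mismatch.
// None of these names come from visdom; they are assumptions for illustration.

/// Converts a byte length (as returned by `str::len`) into the number of
/// `char`s covered by that many leading bytes of `s`.
/// Panics if `byte_len` does not fall on a UTF-8 character boundary.
fn byte_len_to_char_count(s: &str, byte_len: usize) -> usize {
    s[..byte_len].chars().count()
}

fn main() {
    // Assumed remaining selector text, held as a slice of chars.
    let rest = "'好用')";
    let chars: Vec<char> = rest.chars().collect();

    // Assumed regex match result, held as a &str.
    let matched = "'好用'";

    assert_eq!(chars.len(), 5); // 5 chars in the slice
    assert_eq!(matched.len(), 8); // 8 bytes: 1 + 3 + 3 + 1
    assert_eq!(matched.chars().count(), 4); // but only 4 chars

    // Buggy pattern: `&chars[..matched.len()]` would index 8 into a slice of
    // length 5 and panic. `get` shows the same range is invalid without panicking.
    assert!(chars.get(..matched.len()).is_none());

    // One possible fix: convert the byte length to a char count before slicing.
    let n = byte_len_to_char_count(rest, matched.len());
    let consumed: String = chars[..n].iter().collect();
    assert_eq!(consumed, matched);

    println!("consumed {} chars: {}", n, consumed);
}
```

With ASCII-only input such as `'Visdom'`, the byte length and the char count are equal, which would explain why the first query in the report succeeds while the second panics.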

simon639 commented 2 years ago

Thank you, you fixed it so quickly.