RazrFalcon / rustybuzz

A complete harfbuzz's shaping algorithm port to Rust
MIT License
544 stars 34 forks source link

Panic with out of bounds in USE #65

Closed laurmaedje closed 1 year ago

laurmaedje commented 1 year ago

When shaping the text below with Noto Sans Meetei Mayek Regular, rustybuzz panics with an out of bounds error in the universal shaping engine in line 370. It's possible to get rid of the panic by changing the comparison in line 478 from != to <. The problem is that p skips directly beyond pe so the break doesn't trigger. I tried to look what Harfbuzz does there, but it's quite hard to follow, so not sure whether this would be the right fix.

The code below reproduces the problem:

const BYTES: &[u8] = include_bytes!("../NotoSansMeeteiMayek-Regular.ttf");
const TEXT: &str = "ꫦꫧꫨꫩꫪꫫ\u{aaec}\u{aaed}ꫮꫯ꫰꫱ꫲꫳꫴꫵ\u{aaf6}";

fn main() {
    let face = rustybuzz::Face::from_slice(BYTES, 0).unwrap();
    let mut buffer = rustybuzz::UnicodeBuffer::new();
    buffer.push_str(TEXT);
    rustybuzz::shape(&face, &[], buffer);
}

(The text simply consists of all codepoints in U+AAE6 ..= U+AAF6.)

RazrFalcon commented 1 year ago

This code is autogenerated in harfbuzz, aka a state machine. Will see what we can do.

thynson commented 1 year ago

If the Rust codes were replicated exactly from the C codes, I guess that out-of-bound read also happens in harfbuzz 2.7.1, in https://github.com/harfbuzz/harfbuzz/blob/2.7.1/src/hb-ot-shape-complex-use-machine.hh#L432, info[p].use_category() is expanded to info[p].var2.u8[2](use_category is a macro), without any pointer being dereferenced, so it may not cause any serious bug yet and nobody noticed it. Note it was rewritten by PR https://github.com/harfbuzz/harfbuzz/pull/1695 in harfbuzz 7.0.1 and backported to 2.7.3.

behdad commented 1 year ago

When shaping the text below with Noto Sans Meetei Mayek Regular, rustybuzz panics with an out of bounds error in the universal shaping engine in line 370. It's possible to get rid of the panic by changing the comparison in line 478 from != to <. The problem is that p skips directly beyond pe so the break doesn't trigger. I tried to look what Harfbuzz does there, but it's quite hard to follow, so not sure whether this would be the right fix.

This sounds like https://blog.cloudflare.com/incident-report-on-memory-leak-caused-by-cloudflare-parser-bug/ :)

RazrFalcon commented 1 year ago

@behdad *_machine code is the worst. I had to port autogenerated C code with tons of goto's to Rust. No surprises here. Luckily, hb test suite is extensive enough and there were nearly zero bugreports in the past two years.

RazrFalcon commented 1 year ago

@thynson Thanks. That's interesting. There are no pointer manipulations in rustybuzz, so any hidden hb/C bugs/UB will become visible.

And rustybuzz is still at 2.7.1 I have no time lately to sync it up.

Enivex commented 1 year ago

A simpler example here is anything involving the Balinese character ᭄ (U+1B44). Rustybuzz panics in the same line when shaping with Noto Serif Balinese.

notgull commented 1 year ago

This may be fixed by #77