RustAudio / ogg

Ogg container decoder and encoder written in pure Rust
Other
113 stars 21 forks source link

UntilPageHeaderReader may capture a position where it's not actually a beginning of a page #15

Open TonalidadeHidrica opened 3 years ago

TonalidadeHidrica commented 3 years ago

Although extremely rare, the OggS magic pattern may occur in the places where a page does not start. Current UntilPageHeaderReader does not check for the integrity it captured, so it may capture a wrong page span and then return false HashMismatch error.

I have an idea of fixing this issue and currently working on it. It will require a large refactoring, and possibly cause breaking changes.

est31 commented 3 years ago

Do you have any real world data that contains OggS? Any real world use case?

Generally the point of a magic is that it's, well, magic and the sequence that allows you to identify the start of a stream.

est31 commented 3 years ago

Is your proposal to silently ignore hash mismatches and continue the search? What if there is a legitimate hash mismatch then? I think it'd be better to error in that instance than to silently omit a packet and mess up higher layers :).

TonalidadeHidrica commented 3 years ago

One of the most simple example is when the comment header contains a text "OggS". I agree that finding the pattern in audio packets is rare, probably almost never happen tho.

TonalidadeHidrica commented 3 years ago

My proposal in my brain is to silently skip it. If the skipped packet was within a valid packet position but the page simply contains an error, then we can detect it for the next time we captured a valid packet, since we will see a jump in "page sequence no".

TonalidadeHidrica commented 3 years ago

My concern is about seeking. When parsing pages sequentially, maybe we don't have to care about it so much. But when capturing a page right after seeking, because of the misdetection we may lose multiple pages covered by the first "false page". This may cause a problem when binary searching, and that's why I am worrying about it. Do you have any idea to handle them both? Or am I thinking too much?

est31 commented 3 years ago

One of the most simple example is when the comment header contains a text "OggS".

Hmm that's a good point. During normal operation it's not an issue I guess, mainly during seeking.

Do you have any idea to handle them both? Or am I thinking too much?

Nono, I think you are onto a good solution here :). It would be great if for the search of a page header you could specify whether you want pages with hash mismatches to be discarded or not, so that seeking ignores invalid hashes and normal operation when pages strictly follow each other would yield errors at a hash mismatch. I think such a change would be an improvement. Not sure if checking the sequence number is needed. PRs welcome!