dhowden / tag

ID3, MP4 and OGG/FLAC metadata parsing in Go
BSD 2-Clause "Simplified" License
568 stars 77 forks source link

Ogg Vorbis files where comments span multiple pages cause issues #56

Closed pgalbavy closed 4 years ago

pgalbavy commented 4 years ago

I have Ogg files which include encoded pictures. These are large (>64k) and the code misreads the lengths of the next part and end up falling off the EOF. Looking at building a simple patch, but not yet had time.

The cmd/tag/main.so reports , for example:

$ go run main.go ~/tmp/09\ -\ The\ Boxer.ogg
error reading file: unexpected EOF
dhowden commented 4 years ago

Thanks for the report.

Can you post the file somewhere (or at least the part containing the metadata), then we can take a quick look. Might be a simple fix.

pgalbavy commented 4 years ago

I actually just used dbpoweramp to covert a FLAC with a cover image in it. If you can't create one easily that fails I can upload a copy somewhere but don't want to post Copyrighted material publicly.

-- Peter

On Tue, 19 Nov 2019, 21:15 David Howden, notifications@github.com wrote:

Thanks for the report.

Can you post the file somewhere (or at least the part containing the metadata), then we can take a quick look. Might be a simple fix.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/dhowden/tag/issues/56?email_source=notifications&email_token=AAOAVZWR4FDW3HB4GPM4CWDQURJORA5CNFSM4JPJOX2KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEPZMKI#issuecomment-555718185, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOAVZWLIOZM5X33IMFPJWTQURJORANCNFSM4JPJOX2A .

dhowden commented 4 years ago

If you can just send the first 150kb of the file, should be enough.

$ head -c 150K file.ogg > sample.data

(see https://linux.die.net/man/1/head for more details).

pgalbavy commented 4 years ago

See attached (ZIP-ed beacuse of github).

09 - The Boxer-trimmed.zip

I will be looking at this later, but not yet good enough to golang to create an elegant fix to do paged reads.

w1ck3dg0ph3r commented 4 years ago

It seems the file / tag encoder is to blame, not this library: METADATA_BLOCK_PICTURE comment has specified length 125003, but the actual length of the comment is 125284. Fixing comment's length in the file fixes the problem.

w1ck3dg0ph3r commented 4 years ago

Nevertheless, there could be a problem with reading comments length as signed int32, as the docs says it should be unsigned.

https://github.com/dhowden/tag/blob/db0c67e351b1bfbdfc4f99c911e8afd0ca67de98/vorbis.go#L49 https://github.com/dhowden/tag/blob/db0c67e351b1bfbdfc4f99c911e8afd0ca67de98/vorbis.go#L43

pgalbavy commented 4 years ago

If you dump the base64 that is the picture you will see an "OggS" block and 0xff in the sequence, at about 64k in. The difference in sizes is probably the page header. I can re-insert the debug code I hacked into vorbis.so to read the base64 (and error out) if it helps.

pgalbavy commented 4 years ago

I would note that the same file is decoded and tags displayed perfectly ok in both mediainfo on the command line and dbPoweramp from a Windows desktop.

dhowden commented 4 years ago

Nevertheless, there could be a problem with reading comments length as signed int32, as the docs says it should be unsigned.

https://github.com/dhowden/tag/blob/db0c67e351b1bfbdfc4f99c911e8afd0ca67de98/vorbis.go#L49

https://github.com/dhowden/tag/blob/db0c67e351b1bfbdfc4f99c911e8afd0ca67de98/vorbis.go#L43

Good spot! The decoding being called is actually LittleEndian.Uint32 (and then casting to signed), but this is not at all clear in the code. Will tidy this up. Don't expect it to fix this issue though.

w1ck3dg0ph3r commented 4 years ago

and then casting to signed

Well, that was my concern: wouldn't casting uint32 that is greater than 2^31 to int32 (and int on 32-bit platform) produce negative length and skip the loop entirely / panic trying to allocate negative-length slice in readBytes?

w1ck3dg0ph3r commented 4 years ago

@pgalbavy Yep, you are right, I jumped in too soon after only glancing over the spec. Fixed the multipage comment packets, will do a PR later after some cleanup, if @dhowden doesn't mind.

dhowden commented 4 years ago

and then casting to signed

Well, that was my concern: wouldn't casting uint32 that is greater than 2^31 to int32 (and int on 32-bit platform) produce negative length and skip the loop entirely / panic trying to allocate negative-length slice in readBytes?

Yes, absolutely correct. The patch I created was largely to avoid more confusion reading the code as it flips between uint32 <-> int32 unnecessarily. We still have the issue on 32-bit machines due to the int param in readBytes (which is used everywhere, along with the other readX helper functions, and so I did not want to update all those and have a much bigger change with a lot more casting).

I argued with myself a bit and decided it was OK for now (in the short term at least - it was past midnight at the time!). In practice having a valid file with > 2G metadata, or > 2 billion comments is very unlikely (not to mention unwieldy). Incorrectly encoded files would likely error/panic elsewhere.

@pgalbavy Yep, you are right, I jumped in too soon after only glancing over the spec. Fixed the multipage comment packets, will do a PR later after some cleanup, if @dhowden doesn't mind.

This looks great, thanks!

pgalbavy commented 4 years ago

On the uint / int subject - there are a lot of uses of int in places that I think can only be unsigned, for example all the readBytes/readString lengths. While Seek can be negative, it makes little sense to allow signed values for sizes. Happy to work through them if it's useful, but there are a lot of trees of calls that would need changing.

w1ck3dg0ph3r commented 4 years ago

In practice having a valid file with > 2G metadata, or > 2 billion comments is very unlikely (not to mention unwieldy).

Indeed, valid file with over 2G comments extremely unlikely. However we should protect against malicious intent too - it's very easy to craft small file that would crash somebody's service.

I'm completely OK with signed values for size for convinience, especially in a language with no automatic integral promotion/conversion. But there should be check and graceful error when dealing with user supplied data, not panic.

dhowden commented 4 years ago

Posted https://github.com/dhowden/tag/pull/59 which is starting point for switching to unsigned integers. Feel free to add to it if you find more quick changes to uint.

I will come back to this at the weekend and merge it if there are no issues (works fine on my collection, so it won't break my music player, so I'm happy!)