drewnoakes / metadata-extractor-dotnet

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files

Reading XMP data from MP3 files? #345

Open Numpsy opened 1 year ago

Numpsy commented 1 year ago

Hi,

Is there presently any means of reading XMP data out of MP3 files with metadata-extractor?

I believe that the XMP data is stored inside the ID3 data, and I see the comment at

https://github.com/drewnoakes/metadata-extractor-dotnet/blob/dbc0a56761c1897b4f4ce2aee97b75f0fb75148d/MetadataExtractor/Formats/Mpeg/Mp3Reader.cs#L18

about reading ID3 data, so maybe it's not presently possible.

drewnoakes commented 12 months ago

I've never heard of XMP within ID3. Do you have a reference or sample?

Numpsy commented 12 months ago

It's documented in the Adobe XMP SDK specs at https://github.com/adobe/XMP-Toolkit-SDK/blob/main/docs/XMPSpecificationPart3.pdf, section 1.2.5 (the Adobe C++ SDK has the ability to read and write it)

I've been having a go at writing a minimal set of read code for my purposes (where I could do with some managed reading code for cross-platform use rather than juggling C++ code). I may or may not have time to try adding it over here at some point.

drewnoakes commented 12 months ago

Thanks, that's really helpful.

1.2.5 MP3

MPEG-1 Audio Layer 3, more commonly referred to as MP3, is a popular audio encoding format. MPEG stands for Moving Picture Experts Group. The formal standard is ISO/IEC IS 11172-3, but this only covers the raw audio aspects. The metadata in MP3 files uses the ID3v1 or ID3v2 format. When used with XMP, this must be ID3v1, ID3v2, ID3v2.3 or ID3v2.4. The ID3v2.3 and ID3v2.4 formats are almost identical. The most notable difference is that ID3v2.4 allows text values to be UTF-8, in addition to ISO 8859-1 (Latin-1) or UTF-16. The entire ID3 portion of the MP3 file is called the ID3 "tag" (rather confusingly, given other media file and metadata terminology). The individual metadata items are called ID3 "frames".

1.2.5.1 Placement of XMP

The XMP is placed within the ID3 as a "PRIV" frame with an Owner identifier of "XMP". The content of the XMP PRIV frame is the XMP packet, encoded as UTF-8. MP3 files can contain native metadata; see detail of reconciliation with XMP in 2.3.3, “Native metadata in MP3”. Specifications can be found at:

If you wanted to add this to the library, I'd happily support any PR to do so.

Numpsy commented 7 months ago

A question for the record, in case I get more time to try it - would something that just gets XMP out of ID3v2 tags work, or would it need to be something that reads more extensive data? (I'm sure there'd be scope to extend in the future though)

drewnoakes commented 6 months ago

Ideally we'd add an understanding of ID3 so that we can correctly pull XMP from within. Otherwise we're reduced to scanning for content that looks like XMP, which can be fraught with bugs.

Numpsy commented 6 months ago

At a really basic level, you can walk through the frames in the ID3 tag until you find a PRIV/XMP one, ignoring any others, and stop once a match is found (maybe with a bit more validation of the tag length, overall contents, etc.). Understanding of more frame types could be added later if needed.
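As a rough illustration of that frame walk, here is a minimal Python sketch (not the library's code, and not C#). It assumes the standard ID3v2.3/ID3v2.4 layout: a 10-byte tag header with a synchsafe size, then frames with a 10-byte header each, where the frame size is a plain big-endian integer in v2.3 and synchsafe in v2.4. The function name `find_xmp_in_id3v2` is hypothetical. Unsynchronisation, compression and encryption flags are deliberately not handled, so a real reader would need more validation than this.

```python
import struct


def _synchsafe(b: bytes) -> int:
    """Decode a 4-byte synchsafe integer (7 significant bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def find_xmp_in_id3v2(data: bytes):
    """Return the payload of the first PRIV frame owned by "XMP", or None.

    Hypothetical helper: only handles ID3v2.3 and ID3v2.4 tags located at
    the very start of `data`, and ignores per-frame flags.
    """
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, flags = data[3], data[5]
    if major not in (3, 4):
        return None  # other versions use a different frame layout
    if flags & 0x80:
        return None  # tag-level unsynchronisation: not handled in this sketch

    end = min(10 + _synchsafe(data[6:10]), len(data))
    pos = 10

    if flags & 0x40:  # extended header present, skip over it
        if pos + 4 > end:
            return None
        if major == 3:
            # v2.3: plain integer that excludes the 4 size bytes themselves
            pos += 4 + struct.unpack(">I", data[pos:pos + 4])[0]
        else:
            # v2.4: synchsafe integer that includes the size bytes
            pos += _synchsafe(data[pos:pos + 4])

    while pos + 10 <= end:
        frame_id = data[pos:pos + 4]
        if frame_id[0] == 0:
            break  # reached padding, no more frames
        raw_size = data[pos + 4:pos + 8]
        size = _synchsafe(raw_size) if major == 4 else struct.unpack(">I", raw_size)[0]
        body_start = pos + 10
        body_end = body_start + size
        if body_end > end:
            break  # frame claims to run past the tag: stop rather than guess
        if frame_id == b"PRIV":
            # PRIV body: owner identifier, a NUL terminator, then private data
            owner, sep, payload = data[body_start:body_end].partition(b"\x00")
            if sep and owner == b"XMP":
                return payload
        pos = body_end  # any other frame is skipped without being parsed
    return None
```

Usage would be something like `find_xmp_in_id3v2(open(path, "rb").read())`, with the returned bytes then handed to whatever XMP parser is in use. Every frame other than PRIV/XMP is skipped using only its declared size, which is what keeps the sketch small.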

> which can be fraught with bugs.

Yes, a problem with the packet scanning approach is that you might have embedded images and such that contain XMP of their own, and it'd be more work to deal with that.
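To make that ambiguity concrete, here is a small Python sketch of a naive scan. It assumes UTF-8 packets and simply looks for the `<?xpacket begin=` header that opens an XMP packet; the function name `find_xpacket_offsets` is hypothetical. A file with an embedded image that carries its own XMP would yield more than one offset, and the scan alone gives no way to tell which packet belongs to the MP3 itself.

```python
XPACKET_BEGIN = b"<?xpacket begin="


def find_xpacket_offsets(data: bytes) -> list[int]:
    """Return the byte offset of every XMP packet header found in `data`.

    Hypothetical helper: a blind byte search with no knowledge of ID3
    frames, so it cannot say which container each packet lives in.
    """
    offsets = []
    pos = data.find(XPACKET_BEGIN)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(XPACKET_BEGIN, pos + 1)
    return offsets
```

More than one offset in the result is the case described above: without parsing the surrounding ID3 frames, choosing between the packets is guesswork.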