drewnoakes / metadata-extractor

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files
Apache License 2.0

Add raw value to Tag #601

Open FilippoVigani opened 1 year ago

FilippoVigani commented 1 year ago

It would make sense to add the raw value to the Tag class alongside the existing human-readable description, in the same way that the tagType field is effectively the raw value behind the human-readable tagName field.

drewnoakes commented 1 year ago

getObject should give you that, no?

https://github.com/drewnoakes/metadata-extractor/blob/5754a0d33659e6b1e9d8f35cf24bc03e0fbaf1b6/Source/com/drew/metadata/Directory.java#L1090-L1101

That API comment isn't great, looking at it again.

FilippoVigani commented 1 year ago

That is roughly what I was looking for, though I will admit it was hard to find at first. Improving the documentation might be a good first step. As a further addition for the raw value, it would be useful to support retrieving the raw bytes instead of a class-specific object.

drewnoakes commented 1 year ago

We don't always hold on to the raw bytes for every tag. Doing so would increase memory consumption.

Could you explain your use case?

FilippoVigani commented 1 year ago

In my case, I would like the raw bytes because I am building a forensic reporting tool, and they are necessary so that third-party tools can re-parse the metadata without the original file being included. So the report would include both the human-readable and the raw formats.

drewnoakes commented 1 year ago

There isn't always a 1:1 mapping between tag and byte(s). It'd be helpful to see a concrete example, with specific tags.

> necessary for re-parsing from third party tools

Do you mean you need to extract only the metadata, persist it somewhere, and then re-parse it later? Depending on the format, you can do that. For example, take a look at JpegSegmentReader, which will give you access to the different JPEG segments. You can then parse them individually at your leisure. What we don't have (and what would be hard to add) is a way to map a tag to a specific byte segment.
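
As a rough illustration of the segment-level approach described above, the sketch below walks the segments of a JPEG held in memory and returns each payload as raw bytes, which could then be persisted and re-parsed later. It is a hand-rolled example using only the Java standard library, not metadata-extractor's JpegSegmentReader. The names JpegSegmentSketch, Segment and readSegments are hypothetical and exist only for this illustration. It assumes a well-formed file and stops at the start-of-scan marker, on the assumption that the metadata segments of interest precede the image data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: NOT part of metadata-extractor.
final class JpegSegmentSketch {

    /** One JPEG segment: the marker byte (e.g. 0xE1 for APP1) and its payload, excluding the length field. */
    static final class Segment {
        final int marker;
        final byte[] payload;

        Segment(int marker, byte[] payload) {
            this.marker = marker;
            this.payload = payload;
        }
    }

    /** Returns every segment found between the SOI marker and the first SOS or EOI marker. */
    static List<Segment> readSegments(byte[] jpeg) {
        // A JPEG stream starts with the SOI marker, 0xFF 0xD8.
        if (jpeg.length < 2 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("Not a JPEG: missing SOI marker");
        }
        List<Segment> segments = new ArrayList<>();
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                throw new IllegalArgumentException("Expected marker at offset " + pos);
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xFF) {
                pos++; // 0xFF fill byte before the real marker: skip it
                continue;
            }
            if (marker == 0xDA || marker == 0xD9) {
                break; // SOS (compressed image data follows) or EOI: stop here
            }
            // Two-byte big-endian length that counts the length field itself.
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            if (length < 2 || pos + 2 + length > jpeg.length) {
                throw new IllegalArgumentException("Bad segment length at offset " + pos);
            }
            byte[] payload = Arrays.copyOfRange(jpeg, pos + 4, pos + 2 + length);
            segments.add(new Segment(marker, payload));
            pos += 2 + length;
        }
        return segments;
    }

    public static void main(String[] args) {
        // Tiny synthetic stream: SOI, one APP1 segment with payload "Exif", EOI.
        byte[] jpeg = {
            (byte) 0xFF, (byte) 0xD8,
            (byte) 0xFF, (byte) 0xE1, 0x00, 0x06, 'E', 'x', 'i', 'f',
            (byte) 0xFF, (byte) 0xD9
        };
        for (Segment s : readSegments(jpeg)) {
            System.out.println(String.format("marker=0x%02X payloadBytes=%d", s.marker, s.payload.length));
        }
    }
}
```

Note that this yields whole segments (for example, an entire APP1 block), not the bytes belonging to an individual tag, which is consistent with the limitation mentioned above.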