drewnoakes / metadata-extractor

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files
Apache License 2.0

How to test against the images database? #110

Closed lilith closed 9 years ago

lilith commented 9 years ago

Is there an automated way to test MetadataExtractor against the metadata-extractor-images repository? All unit tests pass in the C# port, but I'm seeing regressions on certain images where only the basic JPEG directory is produced and all of the more sophisticated directories are missing.

An example:

https://raw.githubusercontent.com/drewnoakes/metadata-extractor-images/master/jpg/FujiFilm%20FinePix2650.jpg

RicardoBochnia commented 9 years ago

No, but #64 will add this feature. I have already done some work on #64; maybe I will use this weekend to complete it.

drewnoakes commented 9 years ago

My current approach is to run the lib across the database and use Git to track any changes. This has been very useful in catching regressions, as well as in validating how useful and far-reaching changes have been.

This became a little tougher when file-system metadata was added, as that differs per checkout of the repo, so I'm inclined to remove that section from the text files.
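
One way to keep the text files comparable across checkouts is to drop the file-system lines before committing them. The following is a minimal sketch, not the library's own code. It assumes each output line starts with a bracketed directory label and that the file-system section is labelled [File]. Both the line format and the helper name stripLinesWithPrefix are illustrative assumptions, and the sample lines are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class FileSectionFilter {

    // Returns the lines that do NOT start with the given prefix.
    // ASSUMPTION: file-system metadata lines share a common prefix such as "[File]";
    // check the real .txt output before relying on this.
    static List<String> stripLinesWithPrefix(List<String> lines, String prefix) {
        return lines.stream()
                .filter(line -> !line.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        List<String> sample = Arrays.asList(
                "[JPEG] Image Height = 480 pixels",
                "[File] File Name = example.jpg",
                "[File] File Modified Date = (differs per checkout)");

        // Only the lines that are stable across checkouts remain.
        stripLinesWithPrefix(sample, "[File]").forEach(System.out::println);
    }
}
```

With the checkout-specific lines gone, a Git diff of the metadata text files should only show changes caused by the library itself.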

lilith commented 9 years ago

What commands do you use to execute the library? I'd like to establish a baseline for 2.8.0, so I can narrow down the source of the problem. It's quite possible the automated source conversion missed an edge case, but given that almost half of the jpegs parse correctly, and the other half just have missing directories, it smells like a rather subtle bug.

drewnoakes commented 9 years ago

Something like:

java -jar com.drew.tools.ProcessAllImagesInFolderUtility -text "/path/root/for/search"

See this source file to see what's going on.

It walks recursively, looking for certain file types, parses them and writes into the metadata/ subfolder a file with the same name, ending in .txt, IIRC.
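
The traversal described above can be sketched with the JDK alone. This is a rough illustration, not the utility's actual source. The extension list, the class name ImageTreeWalker, and the exact output naming (the original file name with .txt appended, inside a metadata subfolder next to the image) are assumptions based on the description. The sketch only prints the mapping and does not parse any metadata.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ImageTreeWalker {

    // ASSUMPTION: an illustrative list; the real utility may look for other types.
    static final Set<String> EXTENSIONS =
            new HashSet<>(Arrays.asList("jpg", "jpeg", "png", "tif", "tiff"));

    // True if the file name ends in one of the extensions above (case-insensitive).
    static boolean hasSupportedExtension(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        return dot >= 0 && EXTENSIONS.contains(name.substring(dot + 1));
    }

    // ASSUMPTION: output goes to "<image dir>/metadata/<image file name>.txt".
    static Path outputPathFor(Path image) {
        return image.resolveSibling("metadata").resolve(image.getFileName() + ".txt");
    }

    // Recursively collects regular files with a supported extension, in sorted order.
    static List<Path> findImages(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(ImageTreeWalker::hasSupportedExtension)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path image : findImages(root)) {
            System.out.println(image + " -> " + outputPathFor(image));
        }
    }
}
```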

lilith commented 9 years ago

How are you building that jar?

drewnoakes commented 9 years ago

Sorry, no need for the -jar option.

On Wed, 3 Jun 2015 16:57 Nathanael Jones notifications@github.com wrote:

How are you building that jar?


lilith commented 9 years ago

Using java -cp ./Output/maven/classes com.drew.tools.ProcessAllImagesInFolderUtility -text "/Users/nathanael/Documents/delete/metadata-extractor-images/", I get the following error for most images: Could not initialize class com.drew.imaging.jpeg.JpegMetadataReader. This is with commit 65623d2b172ae81b0d8502dfa1514f9782067bcb.

I rolled back to 2.8.0 to give that a try, same problem: https://gist.github.com/nathanaeljones/9ebdb1e0bac19cdd59da

drewnoakes commented 9 years ago

The underlying exception in this case is because Adobe's XMPCore JAR file is not on the classpath and therefore was not found.
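
A "Could not initialize class" message is typically what the JVM reports on later attempts to use a class whose static initialization has already failed, so the original cause (here, a missing dependency) is easy to overlook. A small pre-flight check can surface it directly. This sketch uses only Class.forName from the JDK. The class name ClasspathCheck and the list of names checked are illustrative, and com.adobe.xmp.XMPMeta is assumed to be the XMPCore type that matters here.

```java
final class ClasspathCheck {

    // Returns true if the named class can be located by this class's loader.
    // The class is looked up without being initialized, so static initializers do not run.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // ASSUMPTION: these are the types worth checking before a run.
        String[] required = {
                "com.adobe.xmp.XMPMeta",
                "com.drew.imaging.jpeg.JpegMetadataReader"
        };
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? " found" : " MISSING from classpath"));
        }
    }
}
```

Running this with the same -cp value as the failing command should show which dependency the classpath lacks.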

lilith commented 9 years ago

Thanks! I didn't catch that.

This worked: java -cp ./Libraries/*:./Output/maven/classes com.drew.tools.ProcessAllImagesInFolderUtility -text "/Users/nathanael/Documents/delete/metadata-extractor-images/"

drewnoakes commented 9 years ago

Are you happy for me to close this now?

FYI I'm working on getting the Java and .NET implementations to produce identical output over the image database. It's close, but a few tricky differences remain. Mostly, C#/.NET has a richer set of primitive types (such as unsigned integers) that map better to those used in Exif and other formats. Java lacks these, which causes differences in output. I'm tracking these in issues against the .NET project for now.
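
To illustrate the kind of difference involved: C# has unsigned primitives such as byte (0 to 255) and uint, whereas Java's byte, short and int are all signed. The same raw bits therefore print differently unless the Java side converts explicitly. A minimal sketch using the JDK's unsigned helpers (available since Java 8) follows. The class and method names are hypothetical, and this is not how either implementation actually formats values.

```java
final class UnsignedDemo {

    // Formats a raw 8-bit value as an unsigned number (0..255).
    static String describeUnsignedByte(byte raw) {
        return Integer.toString(Byte.toUnsignedInt(raw));
    }

    // Formats a raw 16-bit value as an unsigned number (0..65535).
    static String describeUnsignedShort(short raw) {
        return Integer.toString(Short.toUnsignedInt(raw));
    }

    // Formats a raw 32-bit value as an unsigned number (0..4294967295).
    static String describeUnsignedInt(int raw) {
        return Integer.toUnsignedString(raw);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;
        int i = 0xFFFFFFFF;

        // Naive printing treats the bits as signed two's complement.
        System.out.println("signed byte:   " + b);                       // -1
        System.out.println("unsigned byte: " + describeUnsignedByte(b)); // 255

        System.out.println("signed int:    " + i);                       // -1
        System.out.println("unsigned int:  " + describeUnsignedInt(i));  // 4294967295
    }
}
```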