drewnoakes / metadata-extractor

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files
Apache License 2.0

EXIF DateTimeOriginal: Properly handle "0000:00:00 00:00:00" #609

Closed: StefanOltmann closed this issue 8 months ago

StefanOltmann commented 1 year ago

Some older digital cameras, such as the Olympus C750UZ, may not have recorded a proper exif:DateTimeOriginal value. Instead of leaving this field empty, they used 0000:00:00 00:00:00 as a placeholder.

Most libraries treat this value as non-existent and ignore it. However, metadata-extractor uses SimpleDateFormat to parse it, and SimpleDateFormat is lenient by default. When SimpleDateFormat("yyyy:MM:dd HH:mm:ss").parse("0000:00:00 00:00:00") is called, it returns Sun Nov 30 00:00:00 GMT 2, which is not what a user would expect. Lenient parsing rolls month 0 and day 0 backwards from year 0, landing on 30 November, 2 BC, and Date.toString() prints the year without its era.
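For anyone who wants to reproduce this, here is a minimal sketch of the lenient behaviour. The class and method names are made up for illustration and are not part of metadata-extractor; the time zone is pinned to UTC and the locale to Locale.ROOT only to keep the result deterministic.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class, not part of metadata-extractor.
public class LenientPlaceholderDemo {

    /** Parses the value with a default (lenient) SimpleDateFormat, interpreted in UTC. */
    public static Calendar parseLeniently(String value) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        SimpleDateFormat format = new SimpleDateFormat("yyyy:MM:dd HH:mm:ss", Locale.ROOT);
        format.setTimeZone(utc);
        try {
            Date date = format.parse(value);
            // Copy into a calendar so the individual fields, including the era, can be inspected.
            Calendar calendar = new GregorianCalendar(utc);
            calendar.setTime(date);
            return calendar;
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not parseable even leniently: " + value, e);
        }
    }

    public static void main(String[] args) {
        Calendar c = parseLeniently("0000:00:00 00:00:00");
        System.out.println("era is BC: " + (c.get(Calendar.ERA) == GregorianCalendar.BC));
        System.out.println("year of era: " + c.get(Calendar.YEAR));
        System.out.println("month (0-based): " + c.get(Calendar.MONTH));
        System.out.println("day of month: " + c.get(Calendar.DAY_OF_MONTH));
    }
}
```

The placeholder is accepted without any error, which is why calling code cannot tell it apart from a real timestamp.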

This issue has been discussed on StackOverflow (https://stackoverflow.com/questions/30394996/java-getdate-0000-00-00-return-strange-value). There are two potential solutions: either filter out these placeholder strings before parsing, or call setLenient(false) on the formatter, so that out-of-range values such as month 00 or day 00 cause parse() to throw a ParseException instead of being rolled over into a different date.

To illustrate the issue, I have included a test file below: exif_datetimeoriginal_placeholder

StefanOltmann commented 1 year ago

@drewnoakes

To work around this in my own code, I've updated it to check whether the date string matches any of the known placeholder patterns.

As for fixing the library, I recommend invoking setLenient(false) and returning null if a parsing error occurs. This approach should handle all scenarios.

/**
 * A collection of possible placeholder values that older cameras
 * may have written for exif:DateTimeOriginal.
 * So far, we have seen actual photos with the value "0000:00:00 00:00:00".
 * The other entries are assumptions based on common date patterns.
 */
val placeHolderDateStrings: Set<String> = setOf(
    "0000:00:00 00:00:00",
    "0000:00:00 00:00",
    "0000-00-00 00:00:00",
    "0000-00-00 00:00",
    "0000.00.00 00:00:00",
    "0000.00.00 00:00",
    "0000-00-00'0'00:00:00",
    "0000-00-00'0'00:00",
    "0000-00-00",
    "0000-00",
    "00000000",
    "0000"
)

/** Returns true if the value is missing or exactly matches one of the known placeholders. */
fun isDateStringPlaceholder(dateTimeOriginal: String?): Boolean =
    dateTimeOriginal == null || placeHolderDateStrings.contains(dateTimeOriginal)
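The library-side suggestion above (setLenient(false), return null on a parsing error) could look roughly like the following in Java. This is only a sketch of the idea, not the library's actual code: the class name and parseOrNull are hypothetical, and UTC is an assumption made here because exif:DateTimeOriginal itself carries no time zone.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of metadata-extractor.
public class StrictExifDateParser {

    private static final String EXIF_PATTERN = "yyyy:MM:dd HH:mm:ss";

    /** Returns the parsed date, or null if the value is missing or not a valid calendar date. */
    public static Date parseOrNull(String value) {
        if (value == null) {
            return null;
        }
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        SimpleDateFormat format = new SimpleDateFormat(EXIF_PATTERN, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: no zone info available
        format.setLenient(false); // reject month 00, day 00, February 30 and similar
        try {
            return format.parse(value);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("0000:00:00 00:00:00"));         // prints "null"
        System.out.println(parseOrNull("2003:07:14 12:30:45") != null); // prints "true"
    }
}
```

Compared with the placeholder set, strict parsing does not need a list of known bad strings, and it also rejects other impossible dates. The two approaches can be combined.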