Sobesednik / node-exiftool

A Node.js interface to the exiftool command-line application.
MIT License

Problem with swedish characters #22

Closed edgesoft closed 7 years ago

edgesoft commented 7 years ago

[Screenshot, 2017-04-27 14:33:42]

I'm getting gibberish when trying to save åäö as the Caption.

edgesoft commented 7 years ago

I'm on Mac OS.

z-vr commented 7 years ago

@edgesoft Thanks, I will look into it.

edgesoft commented 7 years ago

@z-vr Yes, but I think we need to write a UTF-8 stream here, and I'm not quite sure how to do it 👍 I did this in Java before and it worked, but I really don't know how to do it here.

z-vr commented 7 years ago

@edgesoft

const comment = 'åäö'
const data = {
    all: '',
    comment,
}
return ep.writeMetadata(tempFile, data, ['overwrite_original'])
    .then((res) => {
        assert.equal(res.data, null)
        assert.equal(res.error, '1 image files updated')
    })
    .then(() => ep.readMetadata(tempFile))
    .then((res) => {
        console.log(res.data)
    })
[ { SourceFile: '/var/folders/s0/aaa/T/node-exiftool_test_51996.jpg',
    ExifToolVersion: 10.33,
    FileName: 'node-exiftool_test_51996.jpg',
    Directory: '/var/folders/s0/aaa/T',
    FileSize: '49 kB',
    FileModifyDate: '2017:05:07 23:27:53+01:00',
    FileAccessDate: '2017:05:07 23:27:53+01:00',
    FileInodeChangeDate: '2017:05:07 23:27:53+01:00',
    FilePermissions: 'rw-r--r--',
    FileType: 'JPEG',
    FileTypeExtension: 'jpg',
    MIMEType: 'image/jpeg',
    Comment: 'åäö',
    ImageWidth: 500,
    ImageHeight: 334,
    EncodingProcess: 'Baseline DCT, Huffman coding',
    BitsPerSample: 8,
    ColorComponents: 3,
    YCbCrSubSampling: 'YCbCr4:2:0 (2 2)',
    ImageSize: '500x334',
    Megapixels: 0.167 } ]

exiftool /var/folders/s0/aaa/T/node-exiftool_test_51996.jpg

ExifTool Version Number         : 10.40
File Name                       : node-exiftool_test_51996.jpg
Directory                       : /var/folders/s0/aaa/T
File Size                       : 49 kB
File Modification Date/Time     : 2017:05:07 23:27:53+01:00
File Access Date/Time           : 2017:05:07 23:27:54+01:00
File Inode Change Date/Time     : 2017:05:07 23:27:53+01:00
File Permissions                : rw-r--r--
File Type                       : JPEG
File Type Extension             : jpg
MIME Type                       : image/jpeg
Comment                         : åäö
Image Width                     : 500
Image Height                    : 334
Encoding Process                : Baseline DCT, Huffman coding
Bits Per Sample                 : 8
Color Components                : 3
Y Cb Cr Sub Sampling            : YCbCr4:2:0 (2 2)
Image Size                      : 500x334
Megapixels                      : 0.167

If you read your file with exiftool, what does it show?

edgesoft commented 7 years ago

@z-vr Yes, I saw that the value is correct when it is written and then read back with exiftool, but it will not be displayed correctly if a user opens the file in Adobe Photoshop on a Mac. I had this problem with Java as well and switched to a UTF-8 write. I don't know how to do that with this implementation.

z-vr commented 7 years ago

@edgesoft OK, I will have a further look.

edgesoft commented 7 years ago

@z-vr This is how I did it in Java. As you can see, the streams are UTF-8:

protected static IOStream startExifToolProcess(List<String> args)
            throws RuntimeException {
        Process proc = null;
        IOStream streams = null;

        log("\tAttempting to start external ExifTool process using args: %s",
                args);

        try {
            proc = new ProcessBuilder(args).start();
            log("\t\tSuccessful");
        } catch (Exception e) {
            String message = "Unable to start external ExifTool process using the execution arguments: "
                    + args
                    + ". Ensure ExifTool is installed correctly and runs using the command path '"
                    + EXIF_TOOL_PATH
                    + "' as specified by the 'exiftool.path' system property.";

            log(message);
            throw new RuntimeException(message, e);
        }

        log("\tSetting up Read/Write streams to the external ExifTool process...");

        // Setup read/write streams to the new process.
        try {
            streams = new IOStream(new BufferedReader(new InputStreamReader(
                    proc.getInputStream(),"UTF-8")), new OutputStreamWriter(
                    proc.getOutputStream(),"UTF-8"));
            log("\t\tSuccessful, returning streams to caller.");
            return streams;
        } catch (UnsupportedEncodingException e) {
            // Keep the original cause instead of swallowing it.
            throw new RuntimeException("Could not set encoding", e);
        }

    }
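
For comparison, a rough Node.js analogue of the Java snippet above. This is only a sketch and not how node-exiftool is implemented: `roundTripUtf8` is a hypothetical helper, and the child process is Node itself echoing stdin back to stdout, standing in for exiftool so that the example runs without exiftool installed.

```javascript
const { spawnSync } = require('child_process')

// Hypothetical helper: sends `text` to a child process as UTF-8
// and returns whatever the child writes back, decoded as UTF-8.
function roundTripUtf8(text) {
    const child = spawnSync(
        process.execPath, // the current Node binary, used as a stand-in for exiftool
        ['-e', 'process.stdin.pipe(process.stdout)'], // child simply echoes stdin
        { input: text, encoding: 'utf8' } // encode stdin and decode stdout/stderr as UTF-8
    )
    if (child.error) throw child.error
    return child.stdout
}

const sample = '\u00e5\u00e4\u00f6' // 'åäö'
console.log(roundTripUtf8(sample) === sample)
```

For a long-lived child started with `spawn`, the equivalent would be calling `child.stdout.setEncoding('utf8')` and writing with `child.stdin.write(text, 'utf8')`.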

z-vr commented 7 years ago

@edgesoft OK, the solution to your problem is to pass codedcharacterset=utf8, like this:

const myMetadata = {
    all: '', // remove all metadata at first
    Title: 'åäö',
    LocalCaption: 'local caption',
    'Caption-Abstract': 'Câptïön \u00C3bstráct: åäö',
    Copyright: '2017 ©',
    'Keywords+': [ 'këywôrd \u00C3…', 'keywórdB ©˙µå≥' ],
    Creator: 'Mr Author',
    Rating: 5,
}

function writeMetadata(file, metadata) {
    return ep
        .open()
        .then((pid) => console.log('Started exiftool process %s', pid))
        .then(() => ep.writeMetadata(file, metadata, ['codedcharacterset=utf8']))
        .then((res) => {
            console.log(res)
        })
        .catch((err) => {
            console.error(err)
        })
        .then(() => ep.close())
        .then(() => console.log('Closed exiftool'))
}

writeMetadata('image-utf8-codedcharacterset.jpg', myMetadata)
    .catch(console.error)

ExifTool itself works in UTF-8 by default, but IPTC strings are assumed to be stored as Windows Latin1 (cp1252) unless CodedCharacterSet says otherwise, so you need to pass this flag. See question 10 of the ExifTool FAQ at http://www.sno.phy.queensu.ca/~phil/exiftool/faq.html ("How does ExifTool handle coded character sets?"):

IPTC†: The value of the IPTC:CodedCharacterSet tag determines how the internal IPTC string values are interpreted. If CodedCharacterSet exists and has a value of "UTF8" (or "ESC % G") then string values are assumed to be stored as UTF‑8. Otherwise the internal IPTC encoding is assumed to be Windows Latin1 (cp1252), but this can be changed with "-charset iptc=CHARSET". When reading, these strings are converted to UTF‑8 by default, or to the external character set specified by the -charset or -L option. When writing, the inverse conversions are performed. No conversion is done if the internal (IPTC) and external (ExifTool) character sets are the same. Note that ISO 2022 character set shifting is not supported. Instead, a warning is issued and the string is not converted if an ISO 2022 shift code is encountered. See the IPTC IIM specification for more information about IPTC character coding.
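
To illustrate why the declared character set matters, here is a small sketch using only Node's built-in `Buffer`. It does not reproduce what Photoshop or ExifTool do internally; it only shows that the same string has different bytes in UTF-8 and Latin1, and that decoding with the wrong assumption yields gibberish. Node's `'latin1'` is ISO-8859-1, which matches cp1252 for these particular characters.

```javascript
const text = '\u00e5\u00e4\u00f6' // 'åäö'

// The same three characters encoded two different ways.
const utf8Bytes = Buffer.from(text, 'utf8')
const latin1Bytes = Buffer.from(text, 'latin1')

console.log(utf8Bytes.toString('hex'))   // c3a5c3a4c3b6 (two bytes per character)
console.log(latin1Bytes.toString('hex')) // e5e4f6 (one byte per character)

// A reader that assumes Latin1 but is handed UTF-8 bytes sees mojibake.
const misread = utf8Bytes.toString('latin1')
console.log(misread) // Ã¥Ã¤Ã¶
```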

Also see the ExifTool documentation for IPTC tags.

Example Output

Started exiftool process 10446
-all=
-Title=åäö
-LocalCaption=local caption
-Caption-Abstract=Câptïön Ãbstráct: åäö
-Copyright=2017 ©
-Keywords+=këywôrd Ã…
-Keywords+=keywórdB ©˙µå≥
-Creator=Mr Author
-Rating=5
-json
-s
-codedcharacterset=utf8
image-utf8-codedcharacterset.jpg

This should produce the result you need.

[Screenshots, 2017-05-13 16:51:07 and 17:04:10]

z-vr commented 7 years ago

But it's true that you cannot do this in the current version (https://github.com/Sobesednik/node-exiftool/blob/master/src/lib.js#L52), because this flag has to come after the tags.
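
The ordering constraint can be sketched as follows. `buildWriteArgs` is a hypothetical helper, not node-exiftool's actual code; it only mirrors the order seen in the example output earlier in the thread (tag assignments, then `-json` and `-s`, then the extra flag, then the file name).

```javascript
// Hypothetical helper: builds an exiftool argument list in which the
// caller's extra flags are placed after all tag assignments.
function buildWriteArgs(file, metadata, extraArgs = []) {
    const tagArgs = []
    for (const [tag, value] of Object.entries(metadata)) {
        // Array values become one assignment per element, e.g. Keywords+.
        const values = Array.isArray(value) ? value : [value]
        for (const v of values) tagArgs.push(`-${tag}=${v}`)
    }
    // Extra flags are passed without the leading dash, as in the thread.
    const options = extraArgs.map((arg) => `-${arg}`)
    return [...tagArgs, '-json', '-s', ...options, file]
}

const exampleArgs = buildWriteArgs(
    'image.jpg',
    { all: '', 'Keywords+': ['a', 'b'], Rating: 5 },
    ['codedcharacterset=utf8']
)
console.log(exampleArgs.join('\n'))
```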

z-vr commented 7 years ago

@edgesoft OK, this is fixed now in 2.1.2. Please try again by following the instructions in the readme: https://github.com/Sobesednik/node-exiftool/#writing-tags-for-adobe-in-utf8

edgesoft commented 7 years ago

[Screenshot, 2017-05-14 12:31:22]

@z-vr Thanks! Verified that it works. Good stuff 👍