JoshOBrien / exiftoolr

ExifTool Functionality from R
https://joshobrien.github.io/exiftoolr/

Double arguments get lost in exif_read() #3

Closed insilentio closed 4 years ago

insilentio commented 4 years ago

Hi Josh, thanks for the useful package. While using exif_read(), I stumbled upon the following problem: I need to call it with the charset argument. However, charset can be used on different levels; in my case I need it once for the filename and once for the tags themselves. So my call (on the CLI) goes like this:

exiftool "c:/temp/myfile.jpg" -charset exiftool=cp850 -charset filename=cp1250

However, trying to replicate this with exif_read() does not work:

exif_read(path = myimage, args = c("-charset", "exiftool=cp850", "-charset", "filename=cp1250"))

I get an error:

Error: lexical error: invalid char in json text.
Error: File not found - filenam

The way I understand it, the function first builds a list of unique arguments, thereby throwing away one of the two charset arguments. It then tries to call exiftool with "filename" as a file argument, and no such file exists. Hence I wonder whether the unique() call is really necessary, or at least whether you could make it optional.
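To illustrate the suspected cause, here is a minimal base-R sketch. It assumes the arguments really are passed through unique() before reaching ExifTool, which is the guess made in this report rather than something confirmed from the package source.

```r
## Arguments as passed to exif_read() in the failing call
args <- c("-charset", "exiftool=cp850", "-charset", "filename=cp1250")

## unique() keeps only the first occurrence of each element, so the
## second "-charset" flag disappears
deduped <- unique(args)
print(deduped)
## [1] "-charset"        "exiftool=cp850"  "filename=cp1250"
```

With the second flag gone, "filename=cp1250" no longer follows a -charset option, which would be consistent with ExifTool treating it as a file name.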

Thanks, Daniel

JoshOBrien commented 4 years ago

@insilentio Thanks for the report. That sounds reasonable, but I won't be able to have a closer look for at least a day. Will get to it as soon as possible, though.

JoshOBrien commented 4 years ago

@insilentio Hi Daniel. Would you mind installing the fixed version I just pushed up to GitHub, to confirm that it now works for you?

Thanks,

Josh

insilentio commented 4 years ago

@JoshOBrien Hi Josh. I have installed the new version from GitHub. Unfortunately, due to restrictions at my workplace regarding installation of Rtools and the proxy settings, I was not able to test it in the Windows environment where the problem originally occurred (I hope you can push the new version to CRAN soon :-)). However, I have tried to replicate the situation on my Mac. The problem with the two arguments is definitely solved here.

Thanks a lot for the quick action, Daniel

JoshOBrien commented 4 years ago

@insilentio

Sure, I'll go ahead and push that to CRAN. (ETA, three hours later, it's already been accepted by CRAN and the new version of the source package is up on the home repository at https://cran.wu.ac.at/ ! Will typically take a day or two for the Windows binary package to be compiled and make its way out to all the mirror repositories.)

If you happen to have (or can easily construct) an image file that uses two non-default charsets in its file name and its tags, and which you can share with me for testing purposes, I'd really appreciate that. (I do understand that you may not be able to do so.)

Thanks,

Josh

insilentio commented 4 years ago

@JoshOBrien

I have an example attached, although I am not sure how it will behave in your environment; I guess it depends heavily upon the code page of your OS. You'll have to set the filename to QS_Höngg.jpg, as GitHub is replacing the original name with a random one. Anyway, in my case the result without charset arguments (on the CLI) is:

exiftool QS_Höngg.jpg -filename -city
File Name : QS_H÷ngg.jpg
City : Z├╝rich

The strange characters are misinterpretations of the German umlauts ö and ü. When I use the charset arguments, I get the desired result. But at least in my case, I have to use different code pages for the tags and for the name; it does not work otherwise.

exiftool QS_Höngg.jpg -filename -city -charset filename=cp1250 -charset exiftool=cp850
File Name : QS_Höngg.jpg
City : Zürich
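The garbled output can be reproduced in R, which hints at where the two different code pages come from. This is only a sketch of one plausible explanation: it assumes the City tag arrives as UTF-8 bytes and the file name as single-byte (cp1250/cp1252) bytes, both then displayed by a cp850 console. It also assumes the platform's iconv recognises the name "CP850".

```r
## UTF-8 bytes of "Zürich": the u-umlaut is the two bytes C3 BC
city_bytes <- as.raw(c(0x5A, 0xC3, 0xBC, 0x72, 0x69, 0x63, 0x68))

## Single-byte bytes of "Höngg": the o-umlaut is byte F6 in cp1250/cp1252
name_bytes <- as.raw(c(0x48, 0xF6, 0x6E, 0x67, 0x67))

## Interpret both byte sequences as cp850 text, as a cp850 console would
garbled_city <- iconv(rawToChar(city_bytes), from = "CP850", to = "UTF-8")
garbled_name <- iconv(rawToChar(name_bytes), from = "CP850", to = "UTF-8")

print(garbled_city)   # the umlaut becomes two box-drawing characters
print(garbled_name)   # the umlaut becomes a division sign
```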

Hope that helps. Best, Daniel

(Attached image: QS_Höngg)

JoshOBrien commented 4 years ago

@insilentio Thanks! I can report that this at least now runs without an error on my Windows 10 OS. You must be right about my OS' code page not being the right one to fully test this on, though, as the characters with umlauts are still not properly rendered in the value returned by exif_read(). Once this reaches CRAN, please do let me know whether this does or does not work correctly.

insilentio commented 4 years ago

@JoshOBrien well, the good news is, the change is working as intended! Thanks again.

However, that doesn't help in my specific case, as I've now found out. The problem seems to be mainly that you are using JSON output in exif_read() (obviously, for data frame conversion), and JSON output is always converted to UTF-8 (see the exiftool documentation). So I end up with my strange characters again, no matter which code pages I pass as arguments. (Without your change, I couldn't even read the files, though.) I tried iconv() to get a useful output, with no luck so far. I guess I will go with exif_call() now, use the -csv argument, and try to parse the output into a data frame.

Anyway, when I was working with your code (exif_read()), I found that the -q argument is set in any case, before you even check for the parameter:

args <- c("-n", "-j", "-q", "-b", args)
if (quiet) {
    args <- c(args, "-q")
}

Not sure if that is really intended. Best, Daniel

JoshOBrien commented 4 years ago

@insilentio

Very interesting and good to know about that conversion to UTF8. If you end up with somewhat robust code for parsing the results of exif_call(), I'd be interested in having a look, and potentially incorporating that in the package.

I can't remember now why I included a -q flag in the set of always-supplied flags, but I'm pretty sure it was intentional. Bolstering my recollection is this comment in the source code of exif_read():

## an extra -q further silences warnings
if (quiet) {
    args <- c(args, "-q")
}
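
The combined effect of the two snippets quoted in this thread can be sketched as follows. The helper name build_args() is hypothetical and only mirrors those snippets; it is not a function of the package. According to the ExifTool documentation, one -q suppresses normal informational messages and a second -q suppresses warnings as well, which would explain the intentional duplicate.

```r
## Hypothetical helper mirroring the two snippets quoted in this thread
build_args <- function(args = character(), quiet = TRUE) {
    ## one -q is always supplied
    args <- c("-n", "-j", "-q", "-b", args)
    ## an extra -q further silences warnings
    if (quiet) {
        args <- c(args, "-q")
    }
    args
}

n_q_default <- sum(build_args(quiet = FALSE) == "-q")   # 1
n_q_quiet   <- sum(build_args(quiet = TRUE) == "-q")    # 2
print(c(n_q_default, n_q_quiet))
```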

Take care and best of luck,

Josh

insilentio commented 4 years ago

I am currently working with the following code which seems to work; don't know about the robustness, though:

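library(readr)  ## provides read_csv()
arglist <- c("-charset", "exiftool=cp1250", "-charset", "filename=cp1252", "-csv", "-n", "-b", "-T")
## taglist (tags to extract) and image_files (file paths) are defined elsewhere
exifinfo <- exif_call(c(arglist, taglist), image_files, intern = TRUE)
## collapse the returned lines into a single string before parsing
exifinfo <- read_csv(paste(exifinfo, collapse = "\n"))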

In the args, the -csv and -T are crucial; in my case it then works properly for a list of several hundred images. I use readr's read_csv; read.csv would probably work, too. The trick is to collapse the CSV input with a line break (\n).

If that is overall more robust, I am not so sure. With the csv, you get the risk of different separators according to different locales (e.g. ";" instead of ",").
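
The collapse-and-parse step can be tried without ExifTool. The sketch below uses base R's read.csv() instead of readr, and the three input lines are made-up stand-ins for what exif_call(..., intern = TRUE) might return, not real ExifTool output.

```r
## Made-up stand-in for the character vector returned by exif_call():
## one element per line of CSV output
csv_lines <- c("SourceFile,City,ISO",
               "img_001.jpg,Zurich,100",
               "img_002.jpg,Bern,200")

## Collapse to one newline-separated string and parse it as CSV text.
## The separator is spelled out so it is easy to change if the output
## ever used ";" instead of ","
exifinfo <- read.csv(text = paste(csv_lines, collapse = "\n"),
                     sep = ",", stringsAsFactors = FALSE)
str(exifinfo)
```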

Best, Daniel

JoshOBrien commented 4 years ago

@insilentio

The trickiest part seems to be properly processing tag values that contain commas, double quotes, new lines, or leading or trailing spaces.
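
As a small illustration of the issue, here is how base R's read.csv() treats values containing commas and double quotes when they are quoted in the usual CSV way (field wrapped in double quotes, embedded quotes doubled). The sample rows are invented for the example, and the sketch says nothing about how ExifTool itself quotes such values on a given platform.

```r
## Invented CSV text: one value contains a comma, one contains quotes
csv_text <- paste(
    'SourceFile,Caption',
    'img_001.jpg,"Lake, at dusk"',
    'img_002.jpg,"The ""old"" bridge"',
    sep = "\n")

captions <- read.csv(text = csv_text, stringsAsFactors = FALSE)
print(captions$Caption)
```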

The ExifTool FAQ includes a couple of recipes for doing that from the command line, but I haven't been able to get the Windows one to work properly. If I do figure that out, I'd definitely consider adding processing via CSV output as an alternative to the tool's current processing via JSON output.

Cheers,

Josh

JoshOBrien commented 4 years ago

@insilentio

OK, I did figure out how to do this, and have implemented an initial version in this repository's "csv-read" branch. I will eventually add it as an option to exif_read().

When I do so, do you mind if I include the image you sent me in the package, to demonstrate the added functionality?

Thanks for your help,

Josh

insilentio commented 4 years ago

@JoshOBrien the image is under a CC BY-SA 4.0 license, so it should be fine. Thanks for your efforts, Daniel

JoshOBrien commented 3 years ago

@insilentio FYI, I've now added the option to process Exif metadata via a csv (rather than a JSON) intermediate, and used (a compressed version of) the image you shared in the example demonstrating that option's use. The new option is available from exiftoolr_0.1.5, and can be used as shown below. Thanks once again for helping me to get this working.

library(exiftoolr)
## Use pipeline="csv" for images needing explicit specification
## and proper handling of non-default character sets
img_file <- system.file(package = "exiftoolr", "images", "QS_Hongg.jpg")
args <- c("-charset", "exiftool=cp1250")
res <- exif_read(img_file, args = args, pipeline = "csv")
res[["City"]]  ## "Zurich", with an umlaut over the "u"