carpentries-lab / python-aos-lesson

Python for Atmosphere and Ocean Scientists
https://carpentries-lab.github.io/python-aos-lesson/

Add metadata to images #34

Closed MitchellBlack closed 3 years ago

MitchellBlack commented 3 years ago

@DamienIrving I think the following will solve the problem of adding metadata to images:

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image, PngImagePlugin

f = "test.png"
METADATA = {"History": "command line argument"}

# Create a sample image
X = np.random.random((50, 50))
plt.imshow(X)
plt.savefig(f)

# Use PIL to attach text metadata and re-save the image
im = Image.open(f)
meta = PngImagePlugin.PngInfo()
for key, value in METADATA.items():
    meta.add_text(key, value)
im.save(f, "png", pnginfo=meta)

# Re-open the file to confirm the metadata was written
im2 = Image.open(f)
print(im2.info)

MitchellBlack commented 3 years ago

@DamienIrving scrap the above, a more elegant solution is:

# Create a sample image
import matplotlib.pyplot as plt
import numpy as np
X = np.random.random((50, 50))
plt.imshow(X)
plt.savefig('test.pdf', metadata={'Title': 'Data provenance here'})
plt.savefig('test.png', metadata={'History': 'Data provenance here'})

For PDF images the standard keys are 'Title', 'Author', 'Subject', 'Keywords', 'Creator', 'Producer', 'CreationDate', 'ModDate', and 'Trapped'. See the metadata argument of PdfPages.

For PNG images the keys must be shorter than 79 characters. See the metadata argument of print_png.

hot007 commented 3 years ago

Nice!

DamienIrving commented 3 years ago

Thanks, @MitchellBlack. This is great.

I think for the provenance lesson we'd only need to cover .png, .pdf and perhaps .svg (the latter accepts the Title keyword too).
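
As a rough sketch of how the three formats could share one code path, the helper below picks a metadata key from the output file's extension. The function name `provenance_metadata` and the lookup table are hypothetical; the keys themselves ('History' for PNG, 'Title' for PDF and SVG) are the ones discussed in this thread.

```python
from pathlib import Path

# Keys taken from this thread: PNG accepts arbitrary short keys,
# while PDF and SVG are given the standard 'Title' key.
METADATA_KEY_BY_SUFFIX = {".png": "History", ".pdf": "Title", ".svg": "Title"}


def provenance_metadata(outfile, log):
    """Return a metadata dict suited to the file format of `outfile`."""
    suffix = Path(outfile).suffix.lower()
    try:
        key = METADATA_KEY_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no metadata key chosen for {suffix!r} files") from None
    return {key: log}
```

The result can be passed straight to Matplotlib, e.g. `plt.savefig(outfile, metadata=provenance_metadata(outfile, log))`.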

The second part of the problem is viewing the metadata once we've added it to an image. Using Python is a bit messy, because you basically need a different library for each file format. In the following examples I've created a rainfall image in various file formats, with the text 'Log of command line entries...' entered into the metadata:

from PIL import Image

image = Image.open('rainfall.png')
image.load()
print(image.info)
{'Software': 'Matplotlib version3.3.3, https://matplotlib.org/', 'History': 'Log of command line entries...', 'dpi': (72, 72)}
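
# Read the PDF document information dictionary with pdfminer
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument

# Open the file in a with block so it is closed after reading
with open('rainfall.pdf', 'rb') as fp:
    parser = PDFParser(fp)
    doc = PDFDocument(parser)
    print(doc.info)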
[{'CreationDate': b"D:20210206134032+11'00'", 'Creator': b'Matplotlib v3.3.3, https://matplotlib.org', 'Producer': b'Matplotlib pdf backend v3.3.3', 'Title': b'Log of command line entries...'}]

It's probably easier to use a command line program that can handle many different image formats instead. Lots of people online suggest the identify command line tool that comes with ImageMagick, but I found it hard to install on my Mac (the conda recipes for it didn't work, I didn't want to install brew, my laptop thought it was malware, etc.). A more straightforward alternative is exiftool, for which the conda recipes work great. It works for lots of different image formats, e.g.:

$ conda install exiftool
$ exiftool rainfall.png 
ExifTool Version Number         : 11.99
File Name                       : rainfall.png
Directory                       : .
File Size                       : 75 kB
File Modification Date/Time     : 2021:02:06 08:55:30+11:00
File Access Date/Time           : 2021:02:06 08:55:32+11:00
File Inode Change Date/Time     : 2021:02:06 08:55:30+11:00
File Permissions                : rw-r--r--
File Type                       : PNG
File Type Extension             : png
MIME Type                       : image/png
Image Width                     : 864
Image Height                    : 360
Bit Depth                       : 8
Color Type                      : RGB with Alpha
Compression                     : Deflate/Inflate
Filter                          : Adaptive
Interlace                       : Noninterlaced
Software                        : Matplotlib version3.3.3, https://matplotlib.org/
History                         : Log of command line entries...
Pixels Per Unit X               : 2835
Pixels Per Unit Y               : 2835
Pixel Units                     : meters
Image Size                      : 864x360
Megapixels                      : 0.311
$ exiftool rainfall.svg
ExifTool Version Number         : 11.99
File Name                       : rainfall.svg
Directory                       : .
File Size                       : 637 kB
File Modification Date/Time     : 2021:02:06 14:29:11+11:00
File Access Date/Time           : 2021:02:06 14:29:12+11:00
File Inode Change Date/Time     : 2021:02:06 14:29:11+11:00
File Permissions                : rw-r--r--
File Type                       : SVG
File Type Extension             : svg
MIME Type                       : image/svg+xml
Image Height                    : 360pt
SVG Version                     : 1.1
View Box                        : 0 0 864 360
Image Width                     : 864pt
Xmlns                           : http://www.w3.org/2000/svg
Title                           : Log of command line entries...
Work Type                       : http://purl.org/dc/dcmitype/StillImage
Work Title                      : Log of command line entries...
Work Date                       : 2021:02:06 14:29:10.697468
Work Format                     : image/svg+xml
Work Creator Agent Title        : Matplotlib v3.3.3, https://matplotlib.org/
$ exiftool rainfall.pdf
ExifTool Version Number         : 11.99
File Name                       : rainfall.pdf
Directory                       : .
File Size                       : 200 kB
File Modification Date/Time     : 2021:02:06 13:40:33+11:00
File Access Date/Time           : 2021:02:06 13:45:43+11:00
File Inode Change Date/Time     : 2021:02:06 13:40:33+11:00
File Permissions                : rw-r--r--
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.4
Linearized                      : No
Create Date                     : 2021:02:06 13:40:32+11:00
Creator                         : Matplotlib v3.3.3, https://matplotlib.org
Producer                        : Matplotlib pdf backend v3.3.3
Title                           : Log of command line entries...
Page Count                      : 1

... or with a bit of cleaning:

$ exiftool rainfall.pdf | grep '^Title' | cut -d ":" -f2-
 Log of command line entries...
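
If the same lookup is needed from inside Python, one possible approach is to run exiftool as a subprocess and parse its plain-text output. The sketch below assumes exiftool is installed and on the PATH; the function names are hypothetical. Values such as dates contain colons themselves, so each line is split on the first colon only.

```python
import subprocess


def parse_exiftool_output(text):
    """Turn exiftool's 'Tag Name : value' lines into a dict."""
    tags = {}
    for line in text.splitlines():
        # Tag names contain no colon, so the first colon is the separator
        name, sep, value = line.partition(":")
        if sep:
            tags[name.strip()] = value.strip()
    return tags


def read_tags(path):
    """Run exiftool on `path` (assumes exiftool is on the PATH)."""
    result = subprocess.run(
        ["exiftool", path], capture_output=True, text=True, check=True
    )
    return parse_exiftool_output(result.stdout)
```

With the files above, `read_tags('rainfall.pdf')['Title']` would then give the logged text without the leading space.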

MitchellBlack commented 3 years ago

Another option is to simply open the image using vim (I did this to check that my suggestions worked). The 'History/Title' shows up among the binary jargon. However, this certainly isn't as 'clean' as what you have proposed with exiftool.
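
The text is readable in an editor because PNG text metadata is normally stored as plain, uncompressed tEXt chunks. As a minimal sketch of that idea, the function below walks the chunks of a PNG file using only the standard library and collects the tEXt entries. It assumes the metadata was written as tEXt (compressed zTXt and international iTXt chunks are ignored), and the function name is hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(path):
    """Return a dict of the uncompressed tEXt entries in a PNG file."""
    entries = {}
    with open(path, "rb") as f:
        if f.read(8) != PNG_SIGNATURE:
            raise ValueError(f"{path} is not a PNG file")
        while True:
            # Each chunk starts with a 4-byte length and a 4-byte type
            header = f.read(8)
            if len(header) < 8:
                break
            length, chunk_type = struct.unpack(">I4s", header)
            data = f.read(length)
            f.read(4)  # skip the CRC that follows the chunk data
            if chunk_type == b"tEXt":
                # tEXt data is: keyword, a null byte, then the text
                keyword, _, text = data.partition(b"\x00")
                entries[keyword.decode("latin-1")] = text.decode("latin-1")
            elif chunk_type == b"IEND":
                break
    return entries
```

This is only meant to show what is sitting in the file; PIL or exiftool remain the more robust options.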

DamienIrving commented 3 years ago

I've had a go at updating the Data Provenance lesson so that the command log is written to the output PNG metadata: https://carpentrieslab.github.io/python-aos-lesson/09-provenance/index.html

The old lesson was also a bit confusing because cmdprov.new_log would return the command that was executed in the background in order to launch the Jupyter notebook, so I've changed the lesson to avoid that confusion.
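
To illustrate why a log built from the running process's command line looks odd inside a notebook, here is a hand-rolled stand-in. It is not cmdprov's implementation; the function name and timestamp format are assumptions made for illustration. It builds an entry from sys.executable and sys.argv, and in a Jupyter notebook sys.argv holds the kernel launch command rather than anything the user typed.

```python
import datetime
import sys


def command_log_entry(argv=None, executable=None, now=None):
    """Build a one-line record of how the current process was started.

    Hypothetical stand-in for a provenance log entry, not cmdprov itself.
    """
    argv = sys.argv if argv is None else argv
    executable = sys.executable if executable is None else executable
    now = datetime.datetime.now() if now is None else now
    timestamp = now.strftime("%a %b %d %H:%M:%S %Y")
    return f"{timestamp}: {executable} {' '.join(argv)}"
```

Run from a script, the entry records the script name and its arguments, which is the kind of text that ends up in the image metadata.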