denshoproject / ddr-cmdln

Command-line tools for automating the Densho Digital Repository's various processes.

Embed metadata upon file ingest #95

Open gjost opened 5 years ago

gjost commented 5 years ago

When ingesting a File, embed metadata attributes (DDR ID, contributing institution, checksum?, original filename?).

Develop spec requirement:

What metadata do we embed?
What files are we going to try to embed in?
How do we convert existing files? Do we?

GeoffFroh commented 5 years ago

TODO: Determine which attributes to embed.

gjost commented 5 years ago

We'll have to embed the parent Entity ID rather than the file ID. Embedding the file's ID would require hashing the file first, and writing the ID into the file would then change that hash.
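A minimal sketch of the problem, using only `hashlib`. The ID formats, the `embed_identifier` helper, and the "append bytes" embedding are hypothetical stand-ins for illustration, not how ddr-cmdln or any metadata library actually writes tags:

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 hex digest of the given bytes."""
    return hashlib.sha1(data).hexdigest()


def embed_identifier(data: bytes, identifier: str) -> bytes:
    """Hypothetical stand-in for a metadata writer: appends a tagged ID."""
    return data + b"\nddr-id:" + identifier.encode("utf-8")


original = b"example file contents"

# Assumed ID formats, for illustration only.
entity_id = "ddr-test-123-1"
file_id = entity_id + "-" + sha1_hex(original)[:10]

# Embedding the file ID changes the bytes, so the hash the ID was
# derived from no longer matches the file on disk.
with_file_id = embed_identifier(original, file_id)
print(sha1_hex(original) == sha1_hex(with_file_id))  # False

# The entity ID does not depend on the file's hash, so it can be
# embedded first and the file hashed afterwards.
with_entity_id = embed_identifier(original, entity_id)
file_id_after = entity_id + "-" + sha1_hex(with_entity_id)[:10]
print(file_id_after)
```

Hashing after embedding the Entity ID keeps the file ID consistent with the stored bytes, which is the ordering this comment implies.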

gjost commented 5 years ago

Images

Python XMP Toolkit. Claims to support JPEG, TIFF, GIF, PNG, PSD, InDesign, MOV, MP3, MPEG2, MPEG4, AVI, FLV, SWF, ASF, PostScript, P2, SonyHDV, AVCHD, UCF, WAV, XDCAM, XDCAMEX. We're currently using this to extract data. It hasn't been updated in a while.

Alternative: python3-exiv2. Reads and writes EXIF and IPTC in addition to XMP. Python 3 only. There is a py2exiv2 lurking about somewhere.

Audio

Mutagen claims to handle ID3 and APEv2 tags for ASF, FLAC, MP4, Monkey's Audio, MP3, Musepack, Ogg Opus, Ogg FLAC, Ogg Speex, Ogg Theora, Ogg Vorbis, True Audio, WavPack, OptimFROG, and AIFF, but not WAV.
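To compare the two lists, here is a rough sketch of a lookup from format name to the libraries that claim to support it. The format names are copied from the comments above. The `CLAIMED_FORMATS` table and `candidate_libraries` function are hypothetical names, and nothing here checks what the libraries actually do:

```python
# Formats each library *claims* to support, as listed in this thread.
CLAIMED_FORMATS = {
    "Python XMP Toolkit": {
        "JPEG", "TIFF", "GIF", "PNG", "PSD", "InDesign", "MOV", "MP3",
        "MPEG2", "MPEG4", "AVI", "FLV", "SWF", "ASF", "PostScript", "P2",
        "SonyHDV", "AVCHD", "UCF", "WAV", "XDCAM", "XDCAMEX",
    },
    "Mutagen": {
        "ASF", "FLAC", "MP4", "Monkey's Audio", "MP3", "Musepack",
        "Ogg Opus", "Ogg FLAC", "Ogg Speex", "Ogg Theora", "Ogg Vorbis",
        "True Audio", "WavPack", "OptimFROG", "AIFF",
    },
}


def candidate_libraries(format_name: str) -> list:
    """Return the libraries claiming support for a format (case-insensitive)."""
    wanted = format_name.strip().lower()
    return [
        library
        for library, formats in CLAIMED_FORMATS.items()
        if wanted in {name.lower() for name in formats}
    ]


print(candidate_libraries("WAV"))   # only the XMP toolkit claims WAV
print(candidate_libraries("FLAC"))  # only Mutagen claims FLAC
print(candidate_libraries("PDF"))   # neither list mentions PDF
```

By these claims, WAV would have to go through XMP, while MP3 and ASF appear on both lists.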

Video

Documents (PDF, ???)