Participatory-Image-Archives / pia-backlog

Repository for documenting stories and use cases for PIA, a Sinergia project funded by the Swiss National Science Foundation (SNSF).
Creative Commons Zero v1.0 Universal

Sorting images per date on a timeline #1

Open julsraemy opened 3 years ago

julsraemy commented 3 years ago

Description

As a user, I would like to be able to view the photos, digital or analogue, on a timeline.

Variation(s)

Sorting images per given metadata field (subject, location, format, etc.)

Proposed Solutions

During the workshop, Team C asked to be able to arrange and display images on a timeline by year or decade, especially for the SGV_12 Ernst Brunner collection, which is fully digitised. For this collection, about 47,800 images were digitised and, according to Salsah, ca. 38,850 of those records contain a date (in the hasDate property).

However, it is not possible to get a chronologically ordered list of all the images without querying the database. A downloaded resource does contain the date when it was digitised (in its header, in the appropriate XMP metadata fields), but not the date when the photograph was taken (if known).
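Once signatures and dates are available from the database, the year or decade grouping requested for the timeline could be sketched as follows. This is a minimal illustration, not the project's implementation: the function names and the sample signatures are hypothetical, and dates are assumed to start with a four-digit year.

```python
from collections import defaultdict


def decade_of(date: str) -> int:
    """Return the decade (e.g. 1930) of a date that starts with a four-digit year."""
    return int(date[:4]) // 10 * 10


def group_by_decade(records):
    """Group (signature, date) pairs by decade, in chronological order.

    Dates are compared as strings, which orders YYYY, YYYY-MM and
    YYYY-MM-DD values correctly as long as they are zero-padded.
    """
    timeline = defaultdict(list)
    for signature, date in sorted(records, key=lambda r: r[1]):
        timeline[decade_of(date)].append(signature)
    return dict(sorted(timeline.items()))


# Hypothetical sample records (signature, date)
records = [
    ("SGV_12N_00003", "1952-03-01"),
    ("SGV_12N_00001", "1938"),
    ("SGV_12N_00002", "1950-07"),
]
timeline = group_by_decade(records)
```

Records without a date (roughly 9,000 in this collection, going by the figures above) would have to be filtered out or shown separately before calling such a function.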

Exporting all metadata from Salsah

Team B has been extracting all data and metadata related to the PIA project (SGV_05, SGV_10 and SGV_12) from Salsah, in collaboration with DaSCH, which manages the virtual research environment. Lukas Rosenthaler created a Python script, which we have slightly modified. With the following command, we can extract all images as TIFF files and their associated metadata as XML:

python3 salsah2xml.py -P sgv -s 4102 --start 0 --nrows -1 --filter sgv:in_collection={ID} --download --write-metadata --restype sgv:image {username} {password} https://www.salsah.org

Although we have been able to extract all the images from SGV_12, we are still in the process of extracting the metadata on its own, as the batch process sometimes encounters invalid characters (such as backspaces) that break the script and prevent it from writing the XML.
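One possible workaround for the invalid characters is to strip everything that XML 1.0 does not allow before a value is written. The sketch below is an assumption about how this could be handled; it is not part of salsah2xml.py, and the function name is hypothetical.

```python
import re

# XML 1.0 only allows: #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF.
# The character class below matches everything outside those ranges.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove characters (e.g. a backspace, U+0008) that XML 1.0 forbids."""
    return _INVALID_XML_CHARS.sub("", text)


# A value containing a stray backspace, as described above
cleaned = strip_invalid_xml_chars("Brun\x08ner")
```

Tabs, line breaks and accented characters are left untouched, so only the control characters that break the export are dropped.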

Extracting date-based metadata

Once we have all the associated metadata in XML, we can extract the file name (equivalent to the SGV signature) and the date to a CSV file using the XML2 utility (very useful for extracting non-repeatable XML tags, attributes and values), rather than waiting for the dedicated script that will export all metadata to a PIA database.
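For comparison, the same extraction could be done with Python's standard library instead of XML2. This is only a sketch: the element and attribute names (`export`, `resource`, `id`, `signature`, `date`) are hypothetical placeholders, since the actual layout of the salsah2xml output is not shown here.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical export layout; the real salsah2xml output will differ.
SAMPLE_XML = """
<export>
  <resource id="1"><signature>SGV_12N_00001</signature><date>1938</date></resource>
  <resource id="2"><signature>SGV_12N_00002</signature></resource>
</export>
"""


def xml_to_csv(xml_text: str) -> str:
    """Return CSV text with one id,signature,date row per resource.

    A resource without a date gets an empty date column.
    """
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "signature", "date"])
    for res in root.iter("resource"):
        writer.writerow([
            res.get("id", ""),
            res.findtext("signature", default=""),
            res.findtext("date", default=""),
        ])
    return out.getvalue()


csv_text = xml_to_csv(SAMPLE_XML)
```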

Amending XMP metadata fields

For adding or modifying metadata in the header of a given image, we can use ExifTool. The XMP metadata fields that we could potentially change are:

The entries must comply with the W3C date and time format (essentially a profile of ISO 8601). For instance, we can enter a year (YYYY), a year and month (YYYY-MM), a complete date (YYYY-MM-DD) or something even more precise with hours, minutes and seconds. What we can't do is enter a range such as a decade, or a rough guess like "19XX", "between 1920 and 1930" or "1985?", which is common practice in bibliographic and archival records.
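Before embedding anything, the dates exported from Salsah could be checked against this format so that ranges and guesses are caught early. The following is a minimal sketch with a hypothetical function name; it checks the shape of the value only (it would accept 1950-02-31) and, following the W3C note, expects a time zone designator whenever a time is given.

```python
import re

_W3C_DTF = re.compile(
    r"\d{4}"                               # YYYY
    r"(-(0[1-9]|1[0-2])"                   # -MM
    r"(-(0[1-9]|[12]\d|3[01])"             # -DD
    r"(T([01]\d|2[0-3]):[0-5]\d"           # Thh:mm
    r"(:[0-5]\d(\.\d+)?)?"                 # :ss and optional fraction
    r"(Z|[+-]([01]\d|2[0-3]):[0-5]\d)"     # time zone designator
    r")?)?)?",
    re.ASCII,
)


def is_w3c_date(value: str) -> bool:
    """Return True if the whole value has a W3C date and time shape."""
    return _W3C_DTF.fullmatch(value) is not None


# Values taken from the examples above
accepted = [v for v in ["1950", "1950-07", "1950-07-14"] if is_w3c_date(v)]
rejected = [v for v in ["19XX", "between 1920 and 1930", "1985?"] if not is_w3c_date(v)]
```

Values that end up in `rejected` would need a manual decision (for example, choosing a single year) before they can be written to an XMP date field.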

Creating a routine to embed dates in all image headers

A batch routine will need to be developed to automate the process: it would take the metadata extracted from the Salsah export and embed it in the image headers via one or more ExifTool commands.
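Such a routine could start from a CSV of signatures and dates and build one ExifTool command per image. The sketch below only prepares the commands and does not run them. Several points are assumptions: the function name, the `images` directory, the `{signature}.tif` file naming and the choice of `XMP-photoshop:DateCreated` as the target field are illustrative, since no target field has been decided yet.

```python
import csv
import io


def build_exiftool_commands(csv_text, image_dir="images",
                            tag="XMP-photoshop:DateCreated"):
    """Return one ExifTool argument list per CSV row that has a date.

    The tag and the file naming scheme are assumptions, not project decisions.
    """
    commands = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        date = row["date"].strip()
        if not date:
            continue  # nothing to embed for undated records
        image_path = f"{image_dir}/{row['signature']}.tif"
        commands.append(["exiftool", f"-{tag}={date}", image_path])
    return commands


# Hypothetical two-row export: one dated record, one undated
sample_csv = "id,signature,date\n1,SGV_12N_00001,1938\n2,SGV_12N_00002,\n"
commands = build_exiftool_commands(sample_csv)
```

Each argument list can then be passed to `subprocess.run(cmd, check=True)`. By default ExifTool keeps a backup copy of every file it modifies (with `_original` appended to the file name), which makes a first run on a small sample fairly safe; how it handles year-only values for the chosen field is worth testing on that sample.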

Limitations

Additional Background

julsraemy commented 3 years ago

@thgie will see with Team C what they really want before we make progress. In addition, we want to first extract all metadata before making a decision.

julsraemy commented 3 years ago

All metadata from SGV_10 and SGV_12 have been downloaded. The id, signature and date have also been extracted to a CSV file for SGV_12 using the XML2 utility: https://github.com/Participatory-Image-Archives/pia-data-model/blob/main/salsah-export/ernst_brunner_date.csv

thgie commented 3 years ago

To minimize exposure to technical issues for this story, I proposed a small web app that brings images and metadata together. It works on top of the salsah2xml cleanup script and touches neither the images nor the metadata directly.

https://github.com/Participatory-Image-Archives/tool-rough-nav

The client is in the process of evaluation.