WGBH-MLA / ams

Archival Management System to support the American Archive of Public Broadcasting
GNU General Public License v3.0

KNME descriptions missing #919

Closed ekemeyer closed 1 month ago

ekemeyer commented 1 month ago

Miranda identified around 3.5K assets that lost their descriptions and other relevant data during the migration, mostly in the NM collection. Most of those descriptions are still in AAPB, so that PBCore will need to be pulled down and mapped to an update spreadsheet for ingest into AMS2.

Some descriptions are also missing from AAPB but are accurate in AMS2-demo (see this example: https://ams2-demo.wgbh-mla.org/concern/asset_resources/cpb-aacip-191-00ns1rrc). The second step for this ticket will be to pull that data down and generate a second spreadsheet to update AMS2.

Some details: these are the PBCore fields that are missing data, and their mapping to the ingester CSV columns.

fields = {
    'Asset.asset_types': ".//pbcore:pbcoreAssetType",
    'Asset.producing_organization': ".//pbcore:pbcoreCreator/pbcore:creator",
    'Asset.local_identifier': ".//pbcore:pbcoreIdentifier[@source='Local Identifier']",
    'Asset.aacip_identifier': ".//pbcore:pbcoreIdentifier[@source='http://americanarchiveinventory.org']",
    'Asset.genre': ".//pbcore:pbcoreGenre[@source='AAPB Format Genre']"
}

# Mappings for fields whose CSV column depends on the element's type attribute
title_type_mapping = {
    'Series': 'Asset.series_title',
    'Program': 'Asset.program_title',
    'Episode': 'Asset.episode_title',
    'Promo': 'Asset.promo_title'
}

date_type_mapping = {
    'Broadcast': 'Asset.broadcast_date',
    'Created': 'Asset.created_date',
    'Copyright': 'Asset.copyright_date'
}

description_type_mapping = {
    'Series': 'Asset.series_description',
    'Program': 'Asset.program_description',
    'Episode': 'Asset.episode_description',
    'Promo': 'Asset.promo_description'
}
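For anyone picking this up later, here is a rough sketch of how mappings like these could be applied to one PBCore record to build a spreadsheet row. It is not the script that was actually run. The helper name `pbcore_to_row` and the sample XML are made up for illustration, the mappings are abbreviated copies of the ones above, and it assumes the standard PBCore 2.x namespace and the `descriptionType` attribute on `pbcoreDescription`.

```python
import xml.etree.ElementTree as ET

# Assumption: records use the standard PBCore 2.x namespace.
NS = {"pbcore": "http://www.pbcore.org/PBCore/PBCoreNamespace.html"}

# Abbreviated copies of the mappings above so this block runs on its own.
fields = {
    "Asset.asset_types": ".//pbcore:pbcoreAssetType",
    "Asset.local_identifier": ".//pbcore:pbcoreIdentifier[@source='Local Identifier']",
}
description_type_mapping = {
    "Series": "Asset.series_description",
    "Program": "Asset.program_description",
}


def pbcore_to_row(xml_text):
    """Hypothetical helper: flatten one PBCore document into {csv_column: [values]}."""
    root = ET.fromstring(xml_text)
    row = {}

    # Fields located by a fixed XPath.
    for column, xpath in fields.items():
        values = [el.text.strip() for el in root.findall(xpath, NS)
                  if el.text and el.text.strip()]
        if values:
            row[column] = values

    # Descriptions: the target column depends on the descriptionType attribute.
    for el in root.findall(".//pbcore:pbcoreDescription", NS):
        column = description_type_mapping.get(el.get("descriptionType"))
        if column and el.text and el.text.strip():
            row.setdefault(column, []).append(el.text.strip())
    return row


# Made-up sample record, not real AAPB data.
sample = """<pbcoreDescriptionDocument xmlns="http://www.pbcore.org/PBCore/PBCoreNamespace.html">
  <pbcoreAssetType>Episode</pbcoreAssetType>
  <pbcoreIdentifier source="Local Identifier">ABC-123</pbcoreIdentifier>
  <pbcoreDescription descriptionType="Program">A sample description.</pbcoreDescription>
  <pbcoreDescription descriptionType="Other">A description with an unmapped type.</pbcoreDescription>
</pbcoreDescriptionDocument>"""

row = pbcore_to_row(sample)
print(row)
```

Note that in this sketch any description whose type has no entry in the mapping is skipped silently, so it may be worth logging unmapped types when running something like this over a full collection.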
ekemeyer commented 1 month ago

nm_URLS.txt

ekemeyer commented 1 month ago

This is complete after a couple of iterations: AMS2-Demo, AAPB, and AMS2 all had varying amounts of description metadata. Raw footage descriptions were accidentally left out of the first round, so a second round of ingests was needed to include them. I can't confirm it is 100% accurate, but AMS2 should now have all the descriptions that previously existed in AAPB and/or AMS2-Demo.