denshoproject / ddr-cmdln

Command-line tools for automating the Densho Digital Repository's various processes.

mock Internet Archive API for internal testing #128

Closed gjost closed 5 years ago

gjost commented 5 years ago

DDR A/V content (video, audio) is uploaded to Internet Archive. The DDR site is served by Densho but the media assets are served by IA. Example: https://archive.org/details/ddr-densho-1000-210-1

Problem: Objects whose A/V content has not yet been uploaded to IA cannot be previewed because the binaries are not there.

Proposal: A mock Internet Archive API. This would consist of an XML file for each Entity plus a set of derivative files, all uploaded to a server that could be used for testing.

Background:

For our purposes, the "Internet Archive API" is the XML file that IA creates when it ingests a file and its metadata. The data is provided by Densho via $LINK_TO_SCRIPT_HERE. The XML file lists the original file and its derivatives (various video formats, mp3, thumbnails) plus a SQLite3 .db file and the XML file itself.

Object URL: https://archive.org/details/ddr-densho-1000-210-1
XML URL: https://archive.org/download/ddr-densho-1000-210-1/ddr-densho-1000-210-1_files.xml

<files>
  <file name="ddr-densho-1000-210-1-mezzanine-a709bc73aa.mpg" source="original">
    <format>MPEG2</format>
    <mtime>1471583343</mtime>
    <size>813281284</size>
    <md5>768621ac1ec82eb7fe72c8f183c20df3</md5>
    <crc32>d3882041</crc32>
    <sha1>a709bc73aad02447a17b9975abb5e1a19d98e4ed</sha1>
    <length>181.54</length>
    <height>1080</height>
    <width>1920</width>
  </file>
  <file name="ddr-densho-1000-210-1-mezzanine-a709bc73aa.mp4" source="derivative">
    <format>h.264</format>
    <original>ddr-densho-1000-210-1-mezzanine-a709bc73aa.mpg</original>
    <mtime>1471584145</mtime>
    <size>18975948</size>
    <md5>a1207cd2aade05700b0c14ffbbe68c57</md5>
    <crc32>594de7f8</crc32>
    <sha1>f58429ee972d18d0db57327c78886727e1b906cd</sha1>
    <length>181.58</length>
    <height>360</height>
    <width>640</width>
  </file>
  <file name="ddr-densho-1000-210-1-mezzanine-a709bc73aa.ogv" source="derivative">
    ...
  </file>
  <file name="ddr-densho-1000-210-1-mezzanine-a709bc73aa.mp3" source="derivative">
    ...
  </file>
  <file name="ddr-densho-1000-210-1-mezzanine-a709bc73aa.png" source="derivative">
    ...
  </file>
  <file name="ddr-densho-1000-210-1.thumbs/ddr-densho-1000-210-1-mezzanine-a709bc73aa_000001.jpg" source="derivative">
    ...
  </file>
  ...
</files>
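
For anyone who wants to poke at this format, here is a minimal sketch of reading a `_files.xml` listing with Python's standard `xml.etree.ElementTree`. The function name `parse_files_xml` and the trimmed `SAMPLE_XML` are illustrative assumptions, not the code the DDR actually uses.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the listing above (only two files, fewer fields).
SAMPLE_XML = """<files>
  <file name="ddr-densho-1000-210-1-mezzanine-a709bc73aa.mpg" source="original">
    <format>MPEG2</format>
    <size>813281284</size>
    <sha1>a709bc73aad02447a17b9975abb5e1a19d98e4ed</sha1>
    <length>181.54</length>
    <height>1080</height>
    <width>1920</width>
  </file>
  <file name="ddr-densho-1000-210-1-mezzanine-a709bc73aa.mp4" source="derivative">
    <format>h.264</format>
    <original>ddr-densho-1000-210-1-mezzanine-a709bc73aa.mpg</original>
    <size>18975948</size>
    <sha1>f58429ee972d18d0db57327c78886727e1b906cd</sha1>
    <length>181.58</length>
    <height>360</height>
    <width>640</width>
  </file>
</files>"""


def parse_files_xml(xml_text):
    """Return one dict per <file>: its attributes plus the text of each child element."""
    records = []
    for node in ET.fromstring(xml_text).findall("file"):
        record = dict(node.attrib)  # "name" and "source"
        for child in node:
            record[child.tag] = (child.text or "").strip()
        records.append(record)
    return records


records = parse_files_xml(SAMPLE_XML)
for record in records:
    print(record["source"], record["name"], record.get("size"))
```

All values come back as strings, which matches how sizes and dimensions appear as quoted strings in the Elasticsearch document further down.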

The DDR extracts info from the XML and includes selected parts of it when publishing to Elasticsearch:

...
ia_meta: {
  id: "ddr-densho-1000-210-1",
  original: "ddr-densho-1000-210-1-mezzanine-a709bc73aa.mpg",
  mimetype: "video/mpeg",
  xml_url: "https://archive.org/download/ddr-densho-1000-210-1/ddr-densho-1000-210-1_files.xml",
  http_status: 200,
  files: {
    mpg: {
      mimetype: "video/mpeg",
      sha1: "a709bc73aad02447a17b9975abb5e1a19d98e4ed",
      name: "ddr-densho-1000-210-1-mezzanine-a709bc73aa.mpg",
      encoding: null,
      url: "https://archive.org/download/ddr-densho-1000-210-1/ddr-densho-1000-210-1-mezzanine-a709bc73aa.mpg",
      format: "mpg",
      height: "1080",
      width: "1920",
      length: "181.54",
      title: "",
      size: "813281284"
    },
    mp4: { ... },
    ogv: { ... }, 
    mp3: { ... }, 
    png: { ... }, 
...
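
A rough sketch of how parsed file records could be turned into the `files` part of `ia_meta` shown above. This is an assumption about the shape of the transformation, not the DDR's actual code: `build_ia_files` and the `MIMETYPES` table are hypothetical, and only the `mpg` mimetype is taken from the example in this issue.

```python
import os

IA_DOWNLOAD = "https://archive.org/download"

# Assumed extension-to-mimetype table; only the mpg entry appears in this issue.
MIMETYPES = {
    "mpg": "video/mpeg",
    "mp4": "video/mp4",
    "ogv": "video/ogg",
    "mp3": "audio/mpeg",
    "png": "image/png",
}


def build_ia_files(object_id, records):
    """Key selected file records by extension and attach a download URL to each."""
    files = {}
    for record in records:
        ext = os.path.splitext(record["name"])[1].lstrip(".").lower()
        if ext not in MIMETYPES:
            continue  # skip thumbnails, the .db file, the XML file itself, etc.
        # If two files share an extension, the later one overwrites the earlier.
        files[ext] = {
            "name": record["name"],
            "url": f"{IA_DOWNLOAD}/{object_id}/{record['name']}",
            "mimetype": MIMETYPES[ext],
            "format": ext,
            "sha1": record.get("sha1", ""),
            "size": record.get("size", ""),
            "length": record.get("length", ""),
            "height": record.get("height", ""),
            "width": record.get("width", ""),
        }
    return files


example_records = [
    {"name": "ddr-densho-1000-210-1-mezzanine-a709bc73aa.mpg",
     "sha1": "a709bc73aad02447a17b9975abb5e1a19d98e4ed",
     "size": "813281284", "length": "181.54", "height": "1080", "width": "1920"},
    {"name": "ddr-densho-1000-210-1-mezzanine-a709bc73aa.mp4",
     "size": "18975948", "length": "181.58", "height": "360", "width": "640"},
    {"name": "ddr-densho-1000-210-1.thumbs/ddr-densho-1000-210-1-mezzanine-a709bc73aa_000001.jpg"},
]
ia_files = build_ia_files("ddr-densho-1000-210-1", example_records)
print(ia_files["mpg"]["url"])
```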

This info is used in the ddr-public segment templates.

segment.html

            <a href="{{ segment.ia_meta.files.mp4.url }}" class="btn btn-default btn-xs">
              <i class="fa fa-download"></i>
              Download MP4
              ({{ segment.ia_meta.files.mp4.size|filesizeformat }})
            </a>
            <a href="{{ segment.ia_meta.files.mpg.url }}" class="btn btn-default btn-xs">
              <i class="fa fa-download"></i>
              Download full-size MPEG2
              ({{ segment.ia_meta.files.mpg.size|filesizeformat }})
            </a>

segment-audio.html

  ...
  wavesurfer.load('{{ object.ia_meta.files.mp3.url }}');
</script>
{% endblock javascript %}

segment-video.html

<video id="clip" class="embed-responsive embed-responsive-16by9 video-js vjs-big-play-centered vjs-fluid" controls preload="auto" width="560" height="384" poster="{{ segment.links.img }}" data-setup="{}">
  <source type="video/mp4" src="{{ segment.ia_meta.files.mp4.url }}">
  ...

gjost commented 5 years ago

Instead of making a whole new ddrindex command and spending time figuring out how to transcode video into the IA formats, just point unpublished objects at selected dummy objects on IA:

audio: https://archive.org/download/ddr-csujad-28-1
video: https://archive.org/download/ddr-densho-1000-28-1

gjost commented 5 years ago

Fixed in commit 4cef203. Unpublished or missing objects are pointed at dummy IA records.
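
For readers who have not looked at the commit, the fallback idea can be sketched roughly as follows. This is only an illustration of the approach described in this thread, not the code in the commit: `ia_object_id`, `ia_download_url`, and the `available` flag are hypothetical names, and only the two dummy identifiers come from the comment above.

```python
IA_DOWNLOAD = "https://archive.org/download"

# Dummy IA objects named earlier in this thread.
DUMMY_IA_OBJECTS = {
    "audio": "ddr-csujad-28-1",
    "video": "ddr-densho-1000-28-1",
}


def ia_object_id(object_id, media_kind, available):
    """Pick the IA identifier to use for an object.

    `available` is a hypothetical flag meaning the object is published and
    its A/V content is present on IA. Otherwise fall back to the dummy
    object for the given media kind ("audio" or "video").
    """
    if available:
        return object_id
    return DUMMY_IA_OBJECTS[media_kind]


def ia_download_url(object_id, media_kind, available):
    """Build the IA download URL, substituting a dummy record when needed."""
    return f"{IA_DOWNLOAD}/{ia_object_id(object_id, media_kind, available)}"


# A published object keeps its own identifier.
print(ia_download_url("ddr-densho-1000-210-1", "video", available=True))
# An unpublished object is pointed at the dummy video record.
print(ia_download_url("ddr-densho-1000-210-1", "video", available=False))
```

The upside of this design is that previews work without transcoding anything locally; the trade-off is that the previewed media is a stand-in, not the object's own content.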