WGBH-MLA / openvault3

BUG: ingesting records with *only* MP4 video in S3 bucket will fail #743

Open afred opened 4 years ago

afred commented 4 years ago

Background

  1. It all started long ago, when we were having trouble with video playback across browsers.

  2. IIRC, Firefox did not like our .mp4 files.

  3. Our solution was to create .webm files and put them in S3 alongside the .mp4 files.

  4. Then, when rendering <video> elements, we'd put both URLs in as <source ...> elements, so that if one was missing, the browser would just gracefully fall back to the other.

  5. At some point (maybe from the beginning of this change) PBCore#proxy_srcs was set to just return both URLs.

  6. Buuuuut.......... we didn't fix the logic in ValidatedPBCore#url_validate, which tries to ensure that all URLs exist in S3.

  7. So the logic that handles ValidatedPBCore#proxy_srcs is fundamentally different when rendering vs. when validating: in the former, you only need 1 of the URLs to work; in the latter you need all of the URLs to work.

  8. The ingester performs some validation that checks the PBCore model's "url methods" for external URLs, and then performs a HEAD request on each one to make sure those files are available.

  9. But... for videos, the list of URLs it's checking includes one with a .webm file extension and one with a .mp4 file extension (in that order).

  10. Thus... when the validator tries to HEAD the URL with the .webm file extension, it fails because only the .mp4 file exists.
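
To make the mismatch in steps 7-10 concrete, here is a minimal sketch. It is not the actual openvault3 code: `head_ok?`, `playable?`, `strictly_valid?`, and the example.org URLs are hypothetical stand-ins, and the HEAD request is faked with a set lookup so the snippet runs without touching S3.

```ruby
require 'set'

# Hypothetical stand-in for the S3 bucket: only the .mp4 was uploaded.
EXISTING = Set['https://example.org/proxies/A_123.mp4'].freeze

# Assumed shape of what proxy_srcs returns for a video: .webm first, then .mp4.
PROXY_SRCS = [
  'https://example.org/proxies/A_123.webm',
  'https://example.org/proxies/A_123.mp4'
].freeze

# Fake HEAD request: a URL "exists" only if it is in EXISTING.
def head_ok?(url)
  EXISTING.include?(url)
end

# Rendering semantics: one working <source> is enough.
def playable?(urls)
  urls.any? { |url| head_ok?(url) }
end

# Current validation semantics: every URL must exist.
def strictly_valid?(urls)
  urls.all? { |url| head_ok?(url) }
end

puts playable?(PROXY_SRCS)        # true: the .mp4 is there
puts strictly_valid?(PROXY_SRCS)  # false: the .webm is missing, so ingest fails
```

The same list of URLs is fine for playback but fails validation, which is the bug described here.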

Solution

There's so much opportunity for refactoring here (like replacing the whole PBCore model layer)... but a short-term fix probably looks something like making the distinction in ValidatedPBCore#url_validate between URLs that must all be present, and URLs of which at least one must be present.
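
A rough sketch of what that distinction could look like, under stated assumptions: the `UrlCheck` class, its `errors` method, and the `required:` / `alternatives:` keywords are all hypothetical and are not taken from the existing `ValidatedPBCore` code. The existence check is passed in as a callable so that the real HEAD request can be swapped for a stub.

```ruby
# Hypothetical sketch, not the actual ValidatedPBCore implementation.
class UrlCheck
  # exists: any callable that takes a URL and returns true/false.
  # In the real app this would wrap the HEAD request against S3.
  def initialize(exists)
    @exists = exists
  end

  # required:     URLs that must all exist.
  # alternatives: URLs of which at least one must exist (e.g. .webm / .mp4).
  # Returns an array of error strings; an empty array means valid.
  def errors(required: [], alternatives: [])
    errs = required.reject { |url| @exists.call(url) }
                   .map { |url| "missing: #{url}" }
    if !alternatives.empty? && alternatives.none? { |url| @exists.call(url) }
      errs << "none available: #{alternatives.join(', ')}"
    end
    errs
  end
end

# Usage with a stubbed bucket that holds only .mp4 and .jpg files.
stub = ->(url) { url.end_with?('.mp4', '.jpg') }
check = UrlCheck.new(stub)
p check.errors(
  required: ['https://example.org/thumbs/A_123.jpg'],
  alternatives: ['https://example.org/proxies/A_123.webm',
                 'https://example.org/proxies/A_123.mp4']
)
```

With this split, a record whose only proxy is the .mp4 produces no errors, while a record with neither proxy file still fails.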

Done when

Ingest does not fail for video records having only a .mp4 file in S3