Islandora-Labs / islandora_solution_pack_oralhistories

Adds all required Fedora objects to allow users to ingest and retrieve Oral Histories (video/audio) files through the Islandora interface
GNU General Public License v3.0

Transcript viewer doesn't display time cues for media longer than 1 hour #147

Closed: timtomch closed this issue 4 years ago

timtomch commented 5 years ago

It looks like transcripts for media longer than 1 hour are not correctly interpreted by the transcript viewer. When timestamps are in the format HH:MM:SS.DDD, the following issues are observed:

[Screenshot: islandora]

The VTT file starts with:

WEBVTT

00:00:16.867 --> 00:00:18.666
( music playing )

00:00:43.359 --> 00:00:48.863
When I lied on the application,

00:00:48.865 --> 00:00:50.999
I just thought I was doing

This test video and VTT transcript file can be used to replicate this issue.
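
For reference, the timestamp format above can be converted to seconds with a few lines of code. This is a minimal sketch in Python, not the module's own VttConverter.php logic; the function names `vtt_timestamp_to_seconds` and `parse_cue_timing` are hypothetical and exist only for illustration. It assumes timestamps of the form MM:SS.mmm or HH:MM:SS.mmm, as in the sample file.

```python
import re
from typing import Tuple

# Matches "MM:SS.mmm" or "HH:MM:SS.mmm" (the hours field is optional
# and may have two or more digits).
_TIMESTAMP = re.compile(r"^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$")


def vtt_timestamp_to_seconds(stamp: str) -> float:
    """Convert a WebVTT timestamp string to seconds (hypothetical helper)."""
    match = _TIMESTAMP.match(stamp.strip())
    if match is None:
        raise ValueError(f"not a WebVTT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = match.groups()
    # Work in integer milliseconds so hours and minutes are never dropped.
    total_ms = (
        (int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)) * 1000
        + int(millis)
    )
    return total_ms / 1000


def parse_cue_timing(line: str) -> Tuple[float, float]:
    """Split a timing line such as '00:00:16.867 --> 00:00:18.666'."""
    start, separator, rest = line.partition("-->")
    if not separator or not rest.strip():
        raise ValueError(f"not a cue timing line: {line!r}")
    end = rest.split()[0]  # ignore any cue settings after the end time
    return vtt_timestamp_to_seconds(start), vtt_timestamp_to_seconds(end)
```

With the first cue of the sample file, `parse_cue_timing("00:00:16.867 --> 00:00:18.666")` yields a start of 16.867 seconds and an end of 18.666 seconds, which is what a correct cues XML should contain.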

timtomch commented 4 years ago

Was someone else able to reproduce this? I'm guessing that long media files might not be uncommon, no? Ping @MarcusBarnes, or should I get in touch with someone else? Thanks!

MarcusBarnes commented 4 years ago

@timtomch My initial instinct is to look at includes/lib/VttConverter.php and see if it's handling things adequately. I haven't heard reports about this from elsewhere. I haven't personally tested but will try to do so time permitting. Thanks for the ping.

timtomch commented 4 years ago

Hi @MarcusBarnes, thanks for the hint. I've come back to this, and I played around with a few test files. I don't think the issue is with VttConverter.php because I'm able to read cues correctly, and I'm also able to generate a cues XML by running this snippet.

However I now see that the INDEXTRANSCRIPT derivative that is generated upon ingest might be at fault. Here's the XML I'm generating (using the test objects referenced in my initial post on this issue):

<?xml version="1.0" encoding="utf-8"?>
<cues>
  <cue>
    <start>16.867</start><end>18.666</end>
    <speaker><![CDATA[]]></speaker><vtt_text><![CDATA[( music playing
)]]></vtt_text>
  </cue>
  <cue>
    <start>43.359</start><end>48.863</end><speaker><![CDATA[]]></speaker><vtt_text><![CDATA[When I lied on the
application,]]></vtt_text>
  </cue>
  ...

And here's the INDEXTRANSCRIPT content:

<?xml version="1.0" encoding="utf-8"?>
<cues>
   <cue>
      <start>0.867</start><end>0.666</end><speaker><![CDATA[]]></speaker><vtt_text><![CDATA[( music playing )]]></vtt_text>
   </cue>
   <cue>
      <start>0.359</start><end>0.863</end><speaker><![CDATA[]]></speaker><vtt_text><![CDATA[When I lied on the application,]]></vtt_text>
   </cue>

As you can see, the whole-second part of each timestamp is being dropped here, leaving only the fractional part (16.867 becomes 0.867, and 43.359 becomes 0.359).

I'm unclear on which part of the module is being run to generate INDEXTRANSCRIPT. Can you help me figure this out?
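
A mismatch like the one above can also be checked mechanically rather than by eye. The following is a rough Python sketch, assuming both documents use the `<cues>/<cue>/<start>/<end>` layout shown in this comment; the function names `read_cue_times` and `mismatched_cues` are hypothetical and not part of the solution pack.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

CueTimes = Tuple[float, float]


def read_cue_times(cues_xml: str) -> List[CueTimes]:
    """Return (start, end) in seconds for every <cue> in a cues document.

    Assumes every <cue> has both a <start> and an <end> child.
    """
    # Encode to bytes so the XML encoding declaration is honoured.
    root = ET.fromstring(cues_xml.encode("utf-8"))
    return [
        (float(cue.findtext("start")), float(cue.findtext("end")))
        for cue in root.iter("cue")
    ]


def mismatched_cues(
    reference_xml: str, derivative_xml: str, tolerance: float = 0.0005
) -> List[Tuple[int, CueTimes, CueTimes]]:
    """List (index, reference times, derivative times) for differing cues."""
    reference = read_cue_times(reference_xml)
    derivative = read_cue_times(derivative_xml)
    if len(reference) != len(derivative):
        raise ValueError("documents contain a different number of cues")
    problems = []
    for index, (want, got) in enumerate(zip(reference, derivative)):
        # Flag the cue if either its start or its end time differs.
        if any(abs(w - g) > tolerance for w, g in zip(want, got)):
            problems.append((index, want, got))
    return problems
```

Passing the locally generated XML as `reference_xml` and the INDEXTRANSCRIPT content as `derivative_xml` would report every cue whose times were altered during derivative generation. Both inputs need to be complete, well-formed documents, so the excerpts above would have to be closed with `</cues>` first.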

MarcusBarnes commented 4 years ago

@Natkeeran Based on @timtomch's latest observation, does this suggest that the issue could be fixed by adjusting the XSLTs? See: https://github.com/Islandora-Labs/islandora_solution_pack_oralhistories/tree/7.x/xsl/

timtomch commented 4 years ago

Thank you @MarcusBarnes. I tried running my "clean" XML against both transforms in the xsl directory, and neither seems to be the culprit for clipping the time cues. @Natkeeran can you point me to the process that generates the INDEXTRANSCRIPT derivative?

timtomch commented 4 years ago

Update on this: I cannot replicate the issue on a fresh Islandora 7 install (ISLE), where the solution pack behaves correctly without truncating the time cues. The issue must therefore be caused by a local configuration problem or conflict. I would appreciate help in resolving it, but since this does not appear to be a bug with the module, I will close this issue now.

Natkeeran commented 4 years ago

@timtomch

Sorry, I missed this one.

The INDEXTRANSCRIPT derivative gets created here: https://github.com/Islandora-Labs/islandora_solution_pack_oralhistories/blob/7.x/includes/derivatives.inc#L113

timtomch commented 4 years ago

So to my eternal shame, it would appear that this issue had in fact been addressed by 226b09. For some reason the repo wasn't syncing on our production server. Pulling the latest version resolved the issue. So sorry for the trouble...