Islandora-Labs / islandora_solution_pack_oralhistories

Adds all required Fedora objects to allow users to ingest and retrieve Oral Histories (video/audio) files through the Islandora interface
GNU General Public License v3.0

Improve the vtt parser by using https://github.com/mantas-done/subtitles #82

Closed Natkeeran closed 7 years ago

Natkeeran commented 7 years ago

What does this Pull Request do?

This PR addresses this issue: https://github.com/digitalutsc/islandora_solution_pack_oralhistories/issues/81. The XML in INDEXMEDIATRACK, which is what gets indexed, is created by parsing the VTT. The issue was caused by a bug in the parser: it did not handle multi-line VTT cues.

What's new?

We have adopted VttConverter.php from the mantas-done/subtitles library (https://github.com/mantas-done/subtitles/blob/master/src/code/Converters/VttConverter.php) as the VTT parser. This resolves the multi-line issue and improves the parser overall.
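
For readers unfamiliar with the multi-line case, here is a minimal sketch of the idea in Python. It is not the code from mantas-done/subtitles or from this module; the function names `parse_vtt` and `to_seconds` are hypothetical. It assumes cue blocks are separated by blank lines and that every line after the timing line belongs to the same cue.

```python
import re

# Matches a WebVTT timing line such as "00:00:31.000 --> 00:00:36.000".
TIMING = re.compile(
    r"^\s*((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})"
)


def to_seconds(stamp):
    """Convert "hh:mm:ss.ttt" or "mm:ss.ttt" to seconds as a float."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_vtt(content):
    """Return a list of cues; each cue keeps all of its text lines."""
    cues = []
    # Blocks are separated by one or more blank lines.
    blocks = re.split(r"\n\s*\n", content.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        # The timing line is the first line, or the second after an optional cue id.
        for index, line in enumerate(lines[:2]):
            match = TIMING.match(line)
            if match:
                cues.append({
                    "start": to_seconds(match.group(1)),
                    "end": to_seconds(match.group(2)),
                    # Keep every remaining line, not just the first one.
                    "lines": lines[index + 1:],
                })
                break
    return cues
```

The key point is the `lines[index + 1:]` slice: a parser that reads only one line of text after the timing line would drop the rest of a multi-line cue.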

How should this be tested?

Additional Notes:

Thanks @kimpham54 for finding this major bug.

kimpham54 commented 7 years ago

@Natkeeran doesn't seem to work for me when I have an existing oral history with multi lines, where the INDEXMEDIATRACK is already there. When I update the transcript through the UI, I get the following error:

Notice: Undefined offset: 1 in VttConverter::vttTimeToInternal() (line 61 of /var/www/drupal/sites/all/modules/islandora_solution_pack_oralhistories/includes/lib/VttConverter.php).
Notice: Undefined offset: 1 in VttConverter->fileContentToInternalFormat() (line 26 of /var/www/drupal/sites/all/modules/islandora_solution_pack_oralhistories/includes/lib/VttConverter.php).
Notice: Undefined offset: 1 in VttConverter::vttTimeToInternal() (line 61 of /var/www/drupal/sites/all/modules/islandora_solution_pack_oralhistories/includes/lib/VttConverter.php).
Notice: Undefined offset: 1 in VttConverter::vttTimeToInternal() (line 61 of /var/www/drupal/sites/all/modules/islandora_solution_pack_oralhistories/includes/lib/VttConverter.php).
Notice: Undefined offset: 1 in VttConverter->fileContentToInternalFormat() (line 26 of /var/www/drupal/sites/all/modules/islandora_solution_pack_oralhistories/includes/lib/VttConverter.php).
Notice: Undefined offset: 1 in VttConverter::vttTimeToInternal() (line 61 of /var/www/drupal/sites/all/modules/islandora_solution_pack_oralhistories/includes/lib/VttConverter.php).

But the TRANSCRIPT datastream is successfully updated. I added the lines "long", "long", a blank line, and "longer", and this is what the TRANSCRIPT looks like:

<cue>
<speaker><![CDATA[Test Speaker One ]]></speaker>
<start><![CDATA[31]]></start>
<end><![CDATA[36]]></end>
<translation_en><![CDATA[It’s six o clock in the morning on the farm of Elton Woodside, near Clinton,
            Prince Edward Island. 
]]></translation_en>
<transcript><![CDATA[Il est six heures du matin dans la ferme d'Elton Woodside, près de Clinton, Île-du-Prince-Édouard. KIM KI

long
long

longer]]></transcript>
<translation_ch><![CDATA[早上六点钟在Elton Woodside的农场附近,靠近克林顿,爱德华王子岛。]]></translation_ch>

and this is what INDEXMEDIATRACK looks like:

<cue>
<start>31</start>
<end>36</end>
<speaker><![CDATA[Test Speaker One ]]></speaker>
<vtt_text><![CDATA[Il est six heures du matin dans la ferme d'Elton Woodside, près de Clinton, Île-du-Prince-Édouard. KIM KI]]></vtt_text>
</cue>
<cue>
<start>0</start>
<end>0</end>
<speaker><![CDATA[]]></speaker>
<vtt_text><![CDATA[long]]></vtt_text>
</cue>
<cue>
<start>0</start>
<end>0</end>
<speaker><![CDATA[]]></speaker>
<vtt_text><![CDATA[longer]]></vtt_text>
</cue>
<cue>

Extra cues are created (with start and end set to 0), and not all lines are saved: the second "long" line is missing.

kimpham54 commented 7 years ago

The issue here is that through transcripts_ui you can create invalid WebVTT: cue text with multiple lines separated by blank lines. In WebVTT a blank line ends a cue, so the text after it is no longer part of that cue.
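
To illustrate why such input is a problem, here is a rough Python sketch that flags text blocks with no timing line. It is not part of this module or of transcripts_ui; `find_orphan_blocks` is a hypothetical helper, and the timestamps in the usage example are assumed from the start and end values shown earlier in this thread.

```python
import re


def find_orphan_blocks(content):
    """Return blocks that have no timing line and are not header, NOTE, STYLE, or REGION blocks."""
    normalized = content.replace("\r\n", "\n").strip()
    # A blank line terminates a block, so text after a blank line starts a new block.
    blocks = re.split(r"\n\s*\n", normalized)
    orphans = []
    for position, block in enumerate(blocks):
        first_line = block.split("\n", 1)[0].strip()
        if position == 0 and first_line.startswith("WEBVTT"):
            continue  # file header
        if first_line.startswith(("NOTE", "STYLE", "REGION")):
            continue  # comment, stylesheet, or region definition
        if "-->" not in block:
            orphans.append(block)  # text with no timing line
    return orphans


# Usage: cue text followed by blank-line-separated text, as in the report above.
sample = (
    "WEBVTT\n\n"
    "00:00:31.000 --> 00:00:36.000\n"
    "Il est six heures du matin KIM KI\n\n"
    "long\nlong\n\n"
    "longer\n"
)
print(find_orphan_blocks(sample))
```

A check along these lines could run before the transcript is saved, so that the user is asked to remove the blank lines instead of the parser receiving text blocks that have no timestamps.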

kimpham54 commented 7 years ago

works as expected. thank you