McFateM closed this issue.
@McFateM We have updated the parser recently. Can you please provide the latest commit in your local repo?
Are the VTT files passing validation here: https://quuz.org/webvtt/?
If you can attach some sample VTT files, that will help narrow down the issue as well.
Running ‘git log’ on my local repo reports this as the most recent change:
commit 88e90224821243b4da5d0bf169c9c6fd4e10e08c
Merge: 732701f fbc37b4
Author: kim pham kimpham54@users.noreply.github.com
Date: Mon Jun 5 15:19:11 2017 -0400
Merge pull request #77 from digitalutsc/issue_75
ensure to put CDATA for escape characters for INDEXMEDIATRACK
The latest .vtt file I have does validate against the standard, but I can’t share it here without first obfuscating some names (it’s not public yet), and I fear those edits might also alter the line endings.
I’m going to pull the latest 7.x code and see how it behaves. I see there have been VTT-related changes committed lately. I’ll let you know how it goes.
As always, thanks for the quick response!
-Mark M.
From: Natkeeran <notifications@github.com>; Date: Tuesday, June 13, 2017 at 11:40 AM; Subject: Re: [digitalutsc/islandora_solution_pack_oralhistories] Cannot parse WebVTT file produced by InqScribe on a Mac (#92)
Correction: the most recent (topmost) entry in my git log is:
commit d04f0a78ad0889ae1201af7e51e9c720f27c978d
Merge: d1f61a3 70a768d
Author: Marcus Emmanuel Barnes MarcusBarnes@users.noreply.github.com
Date: Wed Jun 7 09:54:40 2017 -0400
Merge pull request #76 from digitalutsc/issue_34
Redirects to the OH object after editing transcript XML through the manage datastream interface.
So I pulled the latest code and I see the new VTT parser, but it appears to suffer from the same issues as before. Specifically, I get the following when ingesting one of my 'valid' VTT files:
Notice: Undefined offset: 1 in VttConverter->fileContentToInternalFormat() (line 22 of /var/www/drupal7/sites/default/modules/contrib/islandora_solution_pack_oralhistories/includes/lib/VttConverter.php).
Notice: Undefined offset: 1 in VttConverter->fileContentToInternalFormat() (line 22 of /var/www/drupal7/sites/default/modules/contrib/islandora_solution_pack_oralhistories/includes/lib/VttConverter.php).
The problem, again, appears to be with line endings, and perhaps with a few other constructs that are optional, but still valid, in the WebVTT specification.
I'm going to apply my stashed changes to what is now the public function fileContentToInternalFormat($file_content) and see if I can work past this.
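The "Undefined offset: 1" notices are consistent with code that splits the file on LF-only blank lines and then reads the second piece. The following is a hypothetical Python reproduction, not the module's VttConverter code (the contents of line 22 of VttConverter.php are not shown in this thread). The names naive_first_cue and normalize_newlines are illustrative only. With CRLF input, a split on "\n\n" finds no separator, so there is no element at index 1; normalizing line endings first avoids the failure.

```python
# Hypothetical reproduction: a naive parser that assumes LF ("\n") line endings.
# This is NOT the module's VttConverter code; it only illustrates how a
# CRLF-encoded file can leave an expected array element missing.

LF_VTT = "WEBVTT\n\n00:00:01.000 --> 00:00:04.000\nHello there.\n"
CRLF_VTT = LF_VTT.replace("\n", "\r\n")


def naive_first_cue(content: str) -> str:
    """Return the first cue block, assuming blocks are separated by "\\n\\n"."""
    blocks = content.split("\n\n")
    # With CRLF input, "\n\n" never occurs ("\r\n\r\n" does), so blocks has
    # one element and blocks[1] raises, much like PHP's "Undefined offset: 1".
    return blocks[1]


def normalize_newlines(content: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    return content.replace("\r\n", "\n").replace("\r", "\n")


print(naive_first_cue(LF_VTT))
try:
    naive_first_cue(CRLF_VTT)
except IndexError as exc:
    print("CRLF input fails:", exc)
print(naive_first_cue(normalize_newlines(CRLF_VTT)))
```

Replacing "\r\n" before "\r" matters: doing it the other way round would turn each CRLF pair into two LFs and create spurious blank lines.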
Thanks.
@McFateM, we're exploring this issue in preparation for the 7.x-1.10-compatible release of this module. Would you mind creating a pull request for us to review? Thank you!
@McFateM Were you able to get this working? Are you able to provide your solution and possibly a sample of the valid VTT file that was not being handled adequately? Thank you in advance.
@McFateM Some changes have been made to the VTT parser. Would you please confirm whether this issue still exists for you as of commit https://github.com/Islandora-Labs/islandora_solution_pack_oralhistories/commit/65812f4f5067d9ed927bbe78d9fc01902293fef4? Thanks in advance.
Sorry @MarcusBarnes, I can't easily test this change because we stopped using the VTTs and found a way around this shortly after this issue was posted. So I don't have any VTT files to check this with, and both of our InqScribe licenses are in use by others for the foreseeable future.
Thanks @McFateM for the update. I'll close the issue and we can reopen it if others encounter similar challenges going forward. I suspect that the change to the VTT parser may address the issue you previously reported (but this would need to be tested and confirmed).
We ran into a similar issue when ingesting VTT files that had been created on Windows. After investigating, it looks like the parser only accepts transcript files with Unix-style line endings.
Maybe it would be useful to update the parser so that it's more tolerant of non-Unix line endings?
@MarcusBarnes do you want to reopen this, or should I create another enhancement request?
@timtomch, regarding your comment https://github.com/Islandora-Labs/islandora_solution_pack_oralhistories/issues/92#issuecomment-477769399, I'm inclined to make this a documentation issue, explicitly stating that VTT files should have Unix-style line endings. Do you know what program was used to make the VTT files on Windows? For example, Notepad++ allows you to set the line endings. Would you be able to attach or send me an example VTT file that failed for you? After confirming the behaviour (in a *nix environment), I can create an enhancement issue.
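If the documentation asks for Unix-style line endings, a quick pre-ingest check could help users confirm their files comply. A minimal sketch, assuming a standalone helper script (the script name, line_ending_counts and has_unix_line_endings are hypothetical, not part of the module), that counts each line-ending style. It reads raw bytes so Python does not translate newlines on the way in.

```python
import sys


def line_ending_counts(data: bytes) -> dict:
    """Count CRLF (Windows), lone CR (classic Mac OS), and lone LF (Unix) endings."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # CRs that are not part of a CRLF pair
        "LF": data.count(b"\n") - crlf,  # LFs that are not part of a CRLF pair
    }


def has_unix_line_endings(data: bytes) -> bool:
    """True when the data contains no CRLF pairs and no lone CRs."""
    counts = line_ending_counts(data)
    return counts["CRLF"] == 0 and counts["CR"] == 0


if __name__ == "__main__":
    # Usage: python check_vtt_endings.py transcript1.vtt transcript2.vtt ...
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:  # binary mode: no newline translation
            data = fh.read()
        status = "OK" if has_unix_line_endings(data) else "needs conversion"
        print(f"{path}: {line_ending_counts(data)} -> {status}")
```

Reporting all three counts, rather than a yes/no answer, also shows whether a file mixes styles, which can happen after hand edits.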
Hi @MarcusBarnes. That's fine with me. You can use this file for testing. It's the "flying farmer" sample VTT file from the OH testing objects repo with the line endings converted to Windows style.
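A Windows-style test fixture like the converted "flying farmer" file can be regenerated from any LF sample. Editors such as Notepad++, or the unix2dos utility, can do this; the following is a minimal Python alternative. The script name and the to_crlf function are hypothetical.

```python
import sys


def to_crlf(data: bytes) -> bytes:
    """Return data with every line ending rewritten as CRLF (Windows style)."""
    # Normalize to LF first so existing CRLF pairs do not become "\r\r\n".
    unix = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unix.replace(b"\n", b"\r\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python to_crlf.py input.vtt output.vtt
    src, dst = sys.argv[1], sys.argv[2]
    with open(src, "rb") as fh:  # binary mode keeps the bytes untouched
        converted = to_crlf(fh.read())
    with open(dst, "wb") as fh:
        fh.write(converted)
```

Because the function normalizes first, running it twice on the same file is harmless.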
We have produced a few WebVTT files exported from InqScribe running on a Mac, and their line endings (typically \r\n in my case) don't appear to be compatible with the module's parse_vtt( ) function. I am actively debugging this and attempting to make the parser more generic, so I wanted to get this issue into the queue to have something to document changes against.
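One way to make a parser "more generic" is to normalize line endings once at the entry point and then accept the optional parts of a cue. This is a rough Python sketch of that idea, not the module's PHP parse_vtt( ) or VttConverter. The names parse_vtt_tolerant and Cue are hypothetical. It assumes leniency is preferred, so it skips unrecognized blocks rather than rejecting them. It handles LF, CRLF and lone-CR endings, a leading byte order mark, optional cue identifiers, optional hours in timestamps, cue settings after the end time, and NOTE/STYLE/REGION blocks.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# A WebVTT timestamp: optional hours ("hh:"), then "mm:ss.ttt".
_TS = r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
# Timing line; anything after the end timestamp is treated as cue settings.
_TIMING = re.compile(rf"^({_TS})[ \t]+-->[ \t]+({_TS})(?:[ \t]+.*)?$")
# Blocks are separated by one or more blank (or whitespace-only) lines.
_BLOCK_SEP = re.compile(r"\n(?:[ \t]*\n)+")


@dataclass
class Cue:
    identifier: Optional[str]  # cue identifiers are optional in WebVTT
    start: float               # seconds
    end: float                 # seconds
    text: str


def _seconds(ts: str) -> float:
    parts = ts.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def parse_vtt_tolerant(content: str) -> List[Cue]:
    """Parse WebVTT text regardless of LF, CRLF, or lone-CR line endings."""
    content = content.lstrip("\ufeff")  # drop a byte order mark, if any
    content = content.replace("\r\n", "\n").replace("\r", "\n")
    blocks = _BLOCK_SEP.split(content.strip("\n"))
    if not blocks[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT header")

    cues: List[Cue] = []
    for block in blocks[1:]:  # blocks[0] is the header (plus optional header text)
        lines = block.split("\n")
        words = lines[0].split()
        if words and words[0] in ("NOTE", "STYLE", "REGION"):
            continue  # comments and style/region definitions carry no cue text
        if _TIMING.match(lines[0]):
            identifier, timing, text_lines = None, lines[0], lines[1:]
        elif len(lines) > 1 and _TIMING.match(lines[1]):
            identifier, timing, text_lines = lines[0], lines[1], lines[2:]
        else:
            continue  # not a recognizable cue; skipped rather than fatal
        start, end = _TIMING.match(timing).groups()
        cues.append(Cue(identifier, _seconds(start), _seconds(end), "\n".join(text_lines)))
    return cues
```

Normalizing up front means the rest of the parser only ever sees "\n". Strict conformance checking is still better left to a dedicated validator such as the one at https://quuz.org/webvtt/.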