DonRichards opened this issue 4 years ago.
@DonRichards Thanks for reporting. The next step is for me to reproduce the issue. Thanks for your patience while I try to work this task into my schedule.
@MarcusBarnes Are you able to reproduce the error?
Ping @MarcusBarnes
@DonRichards I've been away. I'll look into this this week. Would you please clarify how the VTT was edited?
Here's an example of a transcript that is failing. It works on ingest but fails once the editor has been used. test.vtt.txt
One discrepancy you might notice between the screenshot and the VTT file is the closing </v> tag. I've tried it both ways with no luck.
I tried regenerating the INDEXTRANSCRIPT file but it creates a blank (47 B) file.
@DonRichards Would you please confirm that WEBVTT was used for the transcript datastream when creating the initial oral history object? That is, you did not use transcript XML for the transcript datastream and then have WebVTT generated from the transcript XML?
@DonRichards I was able to reproduce the behaviour you reported. I've labeled this as a bug. I'll note that the screenshot you shared in https://github.com/Islandora-Labs/islandora_solution_pack_oralhistories/issues/151#issuecomment-545018452 is not the default that ships with the solution pack, but that the issue is not related to that customization.
I uploaded a WebVTT file as the transcript when I ingested the object. Sorry, I wrote this and didn't click the green button. >:-|
@DonRichards Thank you for confirming.
@DonRichards For the example object above, please download the text file below and remove the .txt extension (so that the file name is unixlf.vtt), then replace the TRANSCRIPT datastream with this file via the manage datastreams interface. Please do not otherwise open or edit the file.
After the TRANSCRIPT datastream has been replaced, click the regenerate operation for the INDEXTRANSCRIPT datastream.
Please let me know whether or not you get the 47 B file (as per https://github.com/Islandora-Labs/islandora_solution_pack_oralhistories/issues/151#issuecomment-545022028) for the INDEXTRANSCRIPT datastream.
Following those steps does fix the issue. Is this to identify whether the \r and \n characters are the issue?
@DonRichards Correct. It seems that the parse_vtt function is breaking on the CR (\r) characters.
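To check a transcript for CR characters before uploading it, something like the following shell sketch should work. The helper names has_cr and count_cr_lines are hypothetical (they are not part of the solution pack) and rely only on standard grep and printf.

```shell
#!/bin/sh
# Sketch: detect carriage-return (CR, \r) characters in a transcript file.
# has_cr and count_cr_lines are hypothetical helper names, not part of the module.

# Exit status 0 if the file contains at least one CR, 1 otherwise.
has_cr() {
  cr=$(printf '\r')   # a literal CR character
  grep -q "$cr" "$1"
}

# Print how many lines contain a CR; for a file saved with
# Windows (CRLF) line endings this is every line.
count_cr_lines() {
  cr=$(printf '\r')
  # grep -c prints 0 (and exits 1) when nothing matches; ignore that status.
  grep -c "$cr" "$1" || true
}
```

For example, `has_cr test.vtt && echo "contains CR"`. On most systems `file test.vtt` will also report "with CRLF line terminators" for such a file.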
@MarcusBarnes I wonder why the module is generating a \r character instead of the typical \n. It should be easy enough to sanitize this.
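A minimal sketch of that sanitizing step, written in shell rather than in the module's own code, assuming the goal is simply to turn every line ending into LF before parse_vtt sees the text. The WebVTT specification allows CRLF, LF, or a lone CR as line terminators, so the sketch handles all three. normalize_eol is a hypothetical name. One likely (but unconfirmed here) source of the CRs is the browser: HTML form submission normalizes textarea line breaks to CRLF, so text saved through a web editor can come back with \r\n even if it was uploaded with \n.

```shell
#!/bin/sh
# Sketch: normalize line endings so a parser that expects LF only never sees CR.
# normalize_eol is a hypothetical helper; a real fix would live in the module itself.
# Usage: normalize_eol INPUT OUTPUT
# INPUT and OUTPUT must be different paths, because OUTPUT may be
# truncated by the shell before INPUT has been read.
normalize_eol() {
  cr=$(printf '\r')   # a literal CR character
  # Step 1 (sed): drop the CR of each CRLF pair, i.e. a CR right before the line end.
  # Step 2 (tr): turn any remaining lone CR (old Mac-style line ending) into LF.
  sed "s/${cr}\$//" "$1" | tr '\r' '\n' > "$2"
}
```

For example, `normalize_eol test.vtt unixlf.vtt` writes an LF-only copy that can then be used to replace the TRANSCRIPT datastream.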
@MarcusBarnes What steps did you take to strip out those characters? I've run a few tests (replacing \n with \r, and trying \r\n) with no luck.
@DonRichards I opened the sample VTT you provided in my text editor. My text editor (currently BBEdit) has the option of changing line ending characters from Windows (CRLF) to Unix (LF). If you're working on Windows, Notepad++ provides similar functionality. After changing the line ending settings, I saved.
Got it, thanks. For posterity, in case others come across this issue before it gets resolved, the workaround is easy enough. Steps to work around this issue:
$ dos2unix -ic view.vtt | xargs dos2unix
The IDE solution works as well; sorry, I should have mentioned that too. Command-line solutions avoid the IDE configuration craziness (like working with Atom vs. Notepad++). I hope this helps.
When first uploaded, the transcripts were correct. After the file is edited, empty INDEXTRANSCRIPT files are created. Regenerating INDEXTRANSCRIPT also results in an empty (47 B) file.
Here is the transcript. I've tried to remove any special characters but it still seems broken.