Status: Closed (adarshp closed this issue 3 years ago)
This bug occurs because the script naively tries to extract events from every file in the directory passed in. A simple fix would be to only look at files with a `.vtt` extension, but there still needs to be some logic to pair transcripts with metadata files. I'll push the simple change before adding the pairing logic.
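For illustration, here is a minimal Scala sketch of both steps: restricting the scan to `.vtt` files, and one possible way to pair each transcript with a metadata file. It is not the code from the repository. The object name `TranscriptFiles`, its helper methods, and the pairing rule (a `.metadata` file sharing the transcript's base name) are all assumptions made for this example.

```scala
import java.io.File

// Hypothetical helper object; none of these names come from the repository.
object TranscriptFiles {

  // Strip the last extension, e.g. "HSR_a.vtt" becomes "HSR_a".
  def stem(name: String): String = {
    val dot = name.lastIndexOf('.')
    if (dot > 0) name.substring(0, dot) else name
  }

  // Keep only regular files ending in ".vtt", sorted by name.
  // listFiles returns null for a missing or unreadable directory,
  // so wrap it in Option and fall back to an empty Seq.
  def transcripts(dir: File): Seq[File] =
    Option(dir.listFiles)
      .fold(Seq.empty[File])(_.toSeq)
      .filter(f => f.isFile && f.getName.endsWith(".vtt"))
      .sortBy(_.getName)

  // ASSUMPTION: a transcript's metadata file has the same base name
  // with a ".metadata" extension. Transcripts without one map to None.
  def pairWithMetadata(dir: File): Seq[(File, Option[File])] =
    transcripts(dir).map { vtt =>
      val meta = new File(dir, stem(vtt.getName) + ".metadata")
      (vtt, if (meta.isFile) Some(meta) else None)
    }
}
```

Calling `TranscriptFiles.pairWithMetadata(new File(dirPath))` would then skip `.tsv` and other non-transcript files, and a transcript with no matching metadata shows up as `None` rather than causing a failure.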
@pelovett Sounds good, thanks!
This should be solved by this commit: 3423acdc28cb9fbf604bf70ff352b29d05c14e30
The linked commit introduces a separate Scala app for parsing multiple transcripts. The question of how to get the relevant metadata has been split into a separate issue (#3).
Hi @pelovett, I'm getting an error when running ExtractDirSearch. See below for the invocation and the errors. The directory `/Users/adarsh/git/clulab/tomcat-text/data/study-1_2020.08` contains all the `HSR*.vtt`, `HSR*.tsv`, and `HSR*.metadata` files from GCS.