MontrealCorpusTools / PolyglotDB

Language data store and linguistic query API
MIT License

Empty intervals on textgrid tiers not parsed as <SIL> #164

Closed: soskuthy closed 2 years ago

soskuthy commented 2 years ago

When importing TextGrid corpora (using either the textgrid or the MFA parser), empty intervals are simply skipped during import rather than replaced with <SIL>. This makes it impossible to enrich the corpus with, e.g., utterances, which rely on these intervals.
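To illustrate the expected behaviour, here is a minimal, library-free sketch of filling empty or missing intervals with a silence label. It is not PolyglotDB's actual parser code: the function name `label_silences`, the `(start, end, label)` tuple layout, and the `tier_end` parameter are all assumptions made for the example.

```python
SIL = "<SIL>"  # silence label the importer is expected to assign


def label_silences(intervals, tier_end=None):
    """Return a gap-free copy of `intervals` with silences labelled.

    `intervals` is a list of (start, end, label) tuples sorted by start
    time (hypothetical layout, chosen for this sketch only).
    """
    filled = []
    cursor = 0.0  # end time of the last interval emitted so far
    for start, end, label in intervals:
        if start > cursor:
            # A gap: this is what remains when an empty interval was skipped.
            filled.append((cursor, start, SIL))
        # An interval that is present but has an empty label is also silence.
        filled.append((start, end, label.strip() or SIL))
        cursor = end
    if tier_end is not None and tier_end > cursor:
        # Trailing silence up to the end of the tier.
        filled.append((cursor, tier_end, SIL))
    return filled


words = [(0.5, 0.9, "hello"), (0.9, 1.2, ""), (1.6, 2.0, "world")]
for interval in label_silences(words, tier_end=2.5):
    print(interval)
```

Adjacent silences are left unmerged here so that each one can be traced back to either an empty label or a gap.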

I've traced the issue to praatio: when calling tgio.openTextgrid, the keyword argument readRaw=True needs to be added. I guess this is because praatio has been updated? So the issue could be fixed either by adding the argument in the code for the textgrid parser or by forcing the installation of an earlier version of praatio when setting up PolyglotDB.

james-tanner commented 2 years ago

I see that this issue has been addressed in the change to praatIO 5.0, which now includes an includeEmptyIntervals argument to openTextgrid. It may be worth upgrading praatIO in Polyglot, but that would involve some (small) code changes to account for the renaming of classes. The alternative is to do as you've said and use readRaw=True in praatIO < 5.0.

@mmcauliffe Which would be preferable?
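For reference, the two options could be sketched roughly as below. This is only an illustration, not the change that went into PolyglotDB: the helper name `open_textgrid_keeping_empty` is hypothetical, and the sketch assumes that the `textgrid` module is present only in praatIO 5.0 and later, with `tgio` used in earlier versions.

```python
def open_textgrid_keeping_empty(path):
    """Open a TextGrid without dropping empty-label intervals.

    Hypothetical helper: praatIO is imported lazily so that the choice
    between the two APIs is made at call time.
    """
    try:
        # praatIO >= 5.0: the module is named `textgrid` (assumed here
        # to be absent from earlier versions).
        from praatio import textgrid
    except ImportError:
        # praatIO < 5.0: fall back to `tgio` and ask for the raw tiers.
        from praatio import tgio
        return tgio.openTextgrid(path, readRaw=True)
    return textgrid.openTextgrid(path, includeEmptyIntervals=True)
```

Pinning a single praatIO version and keeping only one of the two branches would be simpler than supporting both.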

mmcauliffe commented 2 years ago

Updated it to use the newer version of praatIO, so it should be good now.