Forced-Alignment-and-Vowel-Extraction / alignedTextGrid

aligned-textgrid links textgrid sequences together
https://forced-alignment-and-vowel-extraction.github.io/alignedTextGrid/
GNU General Public License v3.0

[Feature Request]: TextGridGroup for splitting and merging textgrids #167

Open chrisbrickhouse opened 8 months ago

chrisbrickhouse commented 8 months ago

What feature would you like added?

Let's say you have an audio file with two speakers and for some reason you need to split the audio into separate files for each speaker. So a folder like:

ParentFileName_Speaker01_part1.wav
ParentFileName_Speaker01_part2.wav
ParentFileName_Speaker02_part1.wav

The textgrids can be used to make these splits, but you also need to make textgrids for the new audio files. These new sub-grids lose their precedence relations across files. Depending on the split scheme (say, splitting by word rather than by turn), that can be a real drawback. A similar case is when there is one aligned textgrid for words and phones, and a second for turns and affective coding. Having those hierarchical relations preserved across files would be useful.

We could implement this by storing references across instances, so that for two Words A and B in different files, A.fol == B. From a structural standpoint, we could say that an alignedTextGrid can contain either TierGroups or other textgrids, i.e. a TextGridGroup.
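
A minimal sketch of the cross-instance reference idea, assuming plain Python stand-ins rather than the library's own classes. The `Word` class, the `source` field, the `link_across` helper, and the textgrid file names are all hypothetical and only illustrate how `fol`/`prev` could point across files:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False: == compares identity, like a stored reference
class Word:
    """Hypothetical stand-in for an interval; not the aligned-textgrid class."""
    label: str
    start: float
    end: float
    source: str  # name of the file this interval came from (assumed field)
    fol: Optional[Word] = field(default=None, repr=False)   # following interval
    prev: Optional[Word] = field(default=None, repr=False)  # preceding interval


def link_across(parts: list[list[Word]]) -> None:
    """Link neighbours inside each part, then join consecutive parts."""
    non_empty = [part for part in parts if part]
    for part in non_empty:
        for left, right in zip(part, part[1:]):
            left.fol, right.prev = right, left
    # The last word of one file now precedes the first word of the next file.
    for left_part, right_part in zip(non_empty, non_empty[1:]):
        left_part[-1].fol = right_part[0]
        right_part[0].prev = left_part[-1]


part1 = [
    Word("hello", 0.0, 0.4, "ParentFileName_Speaker01_part1.TextGrid"),
    Word("there", 0.4, 0.9, "ParentFileName_Speaker01_part1.TextGrid"),
]
part2 = [Word("friend", 0.0, 0.5, "ParentFileName_Speaker01_part2.TextGrid")]

link_across([part1, part2])
a, b = part1[-1], part2[0]
print(a.fol == b, a.source != b.source)
```

Here `a` and `b` live in different files, yet `a.fol == b` holds because the reference is stored on the instance rather than derived from the tier it sits in.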

What would the use case be for this feature?

Splitting and merging textgrids is what motivated me to propose this, but I could imagine it being useful for workflows where you want to keep phrase-level transcripts or coding out of the phonetic alignment file.

Would you like to help add this feature?

Yes, and I will submit a pull request soon.

JoFrhwld commented 7 months ago

Some thoughts:

chrisbrickhouse commented 7 months ago

I think I'm okay mothballing this idea for right now. In practice, I've found a workaround for my use case, so it's not urgent for me, but I think there's still something here long term.

The top-level idea is that an ATG instance/file could operate as an abstraction over the file(s) that comprise it. From a logical standpoint, there's no reason that Speaker1.atg needs to be composed from only Speaker1.tg and Speaker1.wav. Speaker1.atg could be the composition of multiple textgrids and audio files, with the atg storing how they all relate. You can chop the output of an aligner however you want, and the ATG instance will keep track of it all under the hood as if it were still one textgrid.
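
One way to picture "the atg stores how they all relate" is a small registry of parts and their offsets in the parent recording. This is only a sketch under assumptions: `Part`, `TextGridGroup`, `to_parent_time`, and `locate` are hypothetical names, and nothing here is existing aligned-textgrid API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """One split file and where it sits in the parent recording."""
    name: str        # hypothetical file stem shared by the .wav and textgrid
    offset: float    # start of this part in the parent recording, in seconds
    duration: float  # length of this part, in seconds


class TextGridGroup:
    """Hypothetical container that remembers how the parts relate."""

    def __init__(self, parts: list[Part]) -> None:
        # Keep parts in parent-time order regardless of input order.
        self.parts = sorted(parts, key=lambda p: p.offset)

    def to_parent_time(self, name: str, local_time: float) -> float:
        """Convert a time inside one part to a time in the parent recording."""
        for part in self.parts:
            if part.name == name:
                return part.offset + local_time
        raise KeyError(name)

    def locate(self, parent_time: float) -> tuple[str, float]:
        """Find the part covering a parent time and the local time within it."""
        for part in self.parts:
            if part.offset <= parent_time < part.offset + part.duration:
                return part.name, parent_time - part.offset
        raise ValueError(f"no part covers {parent_time}")


group = TextGridGroup([
    Part("Speaker01_part2", offset=12.5, duration=4.0),
    Part("Speaker01_part1", offset=0.0, duration=5.0),
])
print(group.to_parent_time("Speaker01_part2", 1.0))
print(group.locate(13.5))
```

Storing offsets rather than rewriting times keeps each split file untouched, which matches the goal of treating the pieces as if they were still one textgrid.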

The benefit of the class-based approach to textgrid components is that an ATG could be composed from multiple files that each have a piece of the data. So a TierGroup could be SequenceTiers of Words and Phones (i.e., an aligner output), but another textgrid with the phrase-level transcriptions (i.e., an aligner input) could be imported as a SequenceTier and added to the TierGroup without the need to edit the underlying textgrids or output a new one that can get mixed up with all the others that have similar names and extensions. This also gets around the .within and .contains issue, I think, because it would keep the TierGroup as the highest level.
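
As a rough illustration of adding a phrase tier from a second file on top of an existing word tier, the sketch below nests words under phrases by time containment. The `Interval` class and `nest` helper are hypothetical; the `within` and `contains` attribute names simply mirror the ones mentioned above and are not a claim about how the library resolves them:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Interval:
    """Hypothetical interval with parent/child links."""
    label: str
    start: float
    end: float
    within: Optional[Interval] = field(default=None, repr=False)
    contains: list[Interval] = field(default_factory=list, repr=False)


def nest(parents: list[Interval], children: list[Interval], tol: float = 1e-6) -> None:
    """Attach each child to the first parent whose span covers it."""
    for child in children:
        for parent in parents:
            if parent.start - tol <= child.start and child.end <= parent.end + tol:
                child.within = parent
                parent.contains.append(child)
                break


# Words as they might come from an aligner output file.
words = [
    Interval("hello", 0.0, 0.4),
    Interval("there", 0.4, 0.9),
    Interval("friend", 1.2, 1.7),
]
# Phrases as they might come from a separate transcription file.
phrases = [Interval("hello there", 0.0, 0.9), Interval("friend", 1.2, 1.7)]

nest(phrases, words)
print([w.label for w in phrases[0].contains])
```

Neither input file is modified; the relation between the two tiers exists only in memory, which is the point of composing the group from several files.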

So, those are some thoughts on the concept. I don't think implementing that is a priority at the moment because I don't have a use for it, and the sideways relationships wound up not being a problem for what I'm doing.