Forced-Alignment-and-Vowel-Extraction / alignedTextGrid

aligned-textgrid links textgrid sequences together
https://forced-alignment-and-vowel-extraction.github.io/alignedTextGrid/
GNU General Public License v3.0

[Feature Request]: SequenceInterval splitting #171

Open JoFrhwld opened 7 months ago

JoFrhwld commented 7 months ago

What feature would you like added?

Right now, it is only possible to fuse intervals leftwards or rightwards; there is no way to split an interval. Some thoughts on how SequenceInterval.split() could work:

Splitting

On timestamps

interval.split(at_times = [2.31, 2.35])

This should add new interval boundaries at the given times (with the interval's start and end time implicit). This example would result in 3 intervals.
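A rough sketch of the boundary arithmetic only, using plain (start, end) tuples rather than SequenceInterval objects. The function name split_at_times and the example interval of 2.2 to 2.5 seconds are made up for illustration and are not part of aligned-textgrid:

```python
def split_at_times(start, end, at_times):
    """Cut the span [start, end] at each time in at_times.

    Returns a list of (start, end) pairs, one per resulting piece.
    Hypothetical helper, not an aligned-textgrid function.
    """
    cuts = sorted(at_times)
    # The interval's own start and end are implicit, so every cut
    # has to fall strictly inside the interval.
    if any(t <= start or t >= end for t in cuts):
        raise ValueError("split times must fall strictly inside the interval")
    edges = [start, *cuts, end]
    # Pair each edge with the next one to get the new pieces.
    return list(zip(edges[:-1], edges[1:]))


# Assumed example interval running from 2.2 s to 2.5 s.
pieces = split_at_times(2.2, 2.5, [2.31, 2.35])
print(pieces)  # [(2.2, 2.31), (2.31, 2.35), (2.35, 2.5)]
```

Two split times give three pieces, matching the example above.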

On proportional time

interval.split(at_proportion = [0.2, 0.7])

This would place boundaries at 20% and 70% of the duration of the interval, resulting in 3 intervals.
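One way this could work is to convert the proportions to absolute times and then split on those. A minimal sketch, assuming plain floats for the interval's start and end; proportions_to_times is a hypothetical name, not an existing aligned-textgrid function:

```python
def proportions_to_times(start, end, at_proportion):
    """Convert proportions of an interval's duration into absolute times.

    Hypothetical helper, not an aligned-textgrid function.
    """
    # 0 and 1 would duplicate the interval's own edges, so exclude them.
    if any(p <= 0 or p >= 1 for p in at_proportion):
        raise ValueError("proportions must be strictly between 0 and 1")
    duration = end - start
    return [start + p * duration for p in sorted(at_proportion)]


# Assumed example interval running from 2.0 s to 3.0 s:
# boundaries land at roughly 2.2 s and 2.7 s.
times = proportions_to_times(2.0, 3.0, [0.2, 0.7])
```

The resulting times could then be handed to whatever logic handles the at_times case, so the two options share one code path.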

On the subset

interval.split(on_subset = True)

This would split the interval into sub-intervals at the timestamps of its subset intervals. It should, perhaps, be the default behavior.
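In sketch form, the split points for this case are just the subset intervals' edges that fall inside the parent. The example below uses plain (start, end) tuples in place of real subset intervals, and subset_boundaries is a hypothetical name:

```python
def subset_boundaries(start, end, subset):
    """Collect the internal boundaries implied by child (start, end) pairs.

    Hypothetical helper, not an aligned-textgrid function.
    """
    # Gather every child edge once, then drop the ones that coincide
    # with (or fall outside) the parent's own start and end.
    edges = {t for child in subset for t in child}
    return sorted(t for t in edges if start < t < end)


# Assumed parent interval from 0.0 s to 1.0 s with three children.
children = [(0.0, 0.3), (0.3, 0.8), (0.8, 1.0)]
print(subset_boundaries(0.0, 1.0, children))  # [0.3, 0.8]
```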

Labelling

Explicit labels

interval.split(
  at_proportion = [0.2, 0.7],
  labels = ["a", "b", "c"]
)

Label Fun

def label_sequential(label, sequence_len):
  # Repeat the label once per new interval, then number each copy
  label_rep = [label] * sequence_len
  labels = [
    f"{lab}-{num}"
    for lab, num in zip(label_rep, range(sequence_len))
  ]
  return labels

def label_rep(label, sequence_len):
  # Every new interval keeps the original label
  return [label] * sequence_len

def label_blank(label, sequence_len):
  # Every new interval gets an empty label
  return [""] * sequence_len

interval.split(
  at_proportion = [0.2, 0.7],
  label_fun = label_sequential
)
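A sketch of how split() might choose between explicit labels and a label function. The resolve_labels helper, its argument names, the fallback of repeating the original label, and the "AY1" label are all assumptions for illustration, not existing aligned-textgrid behavior:

```python
def label_sequential(label, sequence_len):
    """Number each copy of the label: 'x-0', 'x-1', ..."""
    return [f"{label}-{num}" for num in range(sequence_len)]


def resolve_labels(label, n_boundaries, labels=None, label_fun=None):
    """Pick labels for the pieces produced by n_boundaries split points.

    Hypothetical helper, not an aligned-textgrid function.
    """
    # n split points always produce n + 1 pieces.
    n_pieces = n_boundaries + 1
    if labels is not None:
        if len(labels) != n_pieces:
            raise ValueError(f"expected {n_pieces} labels, got {len(labels)}")
        return list(labels)
    if label_fun is not None:
        return label_fun(label, n_pieces)
    # Assumed fallback: every piece keeps the original label.
    return [label] * n_pieces


print(resolve_labels("AY1", 2, label_fun=label_sequential))
# ['AY1-0', 'AY1-1', 'AY1-2']
```

Checking the length of explicit labels up front would catch the easy mistake of passing one label per boundary instead of one per piece.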

What would the use case be for this feature?

When creating new sub-interval tiers based on analytical landmarks. E.g.

Would you like to help add this feature?

Yes, and I will submit a pull request soon.

chrisbrickhouse commented 7 months ago

@JoFrhwld Before you get deep in the weeds on this, I already have some code that does something similar in a project I've yet to push. I'm at a different machine today though, so I'll have to get the demo to you later this evening.

chrisbrickhouse commented 7 months ago

See the TextGrid class in this repo for some example code. It's very quick and dirty, so there are probably ways to optimize the splitting algorithm for things like phrases, small pauses, priority tier groups, etc, but it does a pretty good job of isolating phrases from a word/phone grid.

JoFrhwld commented 7 months ago

That's real cool! I don't think I'd implement anything as particular as logic for splitting a phrase into subphrases. More like just convenience functions for any given kind of splitting.