Closed lucasgautheron closed 3 years ago
Feel free to suggest more @alecristia
Hmm, I'm not sure I follow the logic of these. What would be the goal? To me, some of these are things that must be done to correct errors:
clip(segments, start, stop): clip every interval in segments from start to stop, dropping segments that are out of bounds. <-- I can only imagine wanting to do that because human annotators sometimes incorrectly start/stop segments
fill_silences(segments, silence_speaker_type = 'SIL'): populate the dataframe segments with intervals for every silence, setting silence_speaker_type as the speaker_type for these intervals <-- is the idea that when sections that were supposed to be annotated have no segments, we need to infer that this is silence? Then shouldn't this be done automatically (i.e. as part of the annotation cleaning)?
The other function is something that I do at the analysis stage - I'll explain
Sorry, I was interrupted -- "intersection(a, b)" is what I typically do in R, with left_join or merge. We don't need to rewrite this function in a different package. Typically this happens at the analysis stage, when you decide which metadata you need and how you want to integrate it.
Does that make sense, or am I perhaps misunderstanding what we are trying to do here?
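For reference, the merge-based approach described above could look roughly like this in pandas. This is only a sketch: the column names `recording_filename` and `child_id` and the toy data are made up for illustration, not taken from the actual metadata.

```python
import pandas as pd

# Hypothetical segments table: one row per annotated interval.
segments = pd.DataFrame({
    "recording_filename": ["rec1.wav", "rec1.wav", "rec2.wav"],
    "segment_onset": [0, 1000, 0],
    "segment_offset": [500, 2000, 750],
})

# Hypothetical metadata table: one row per recording.
recordings = pd.DataFrame({
    "recording_filename": ["rec1.wav", "rec2.wav"],
    "child_id": ["c01", "c02"],
})

# Rough equivalent of R's left_join(segments, recordings, by = "recording_filename"):
# every segment is kept and the matching metadata columns are attached to it.
merged = segments.merge(recordings, on="recording_filename", how="left")
print(merged)
```

With `how="left"`, segments without a matching metadata row would be kept with missing values rather than dropped, which is usually what you want at the analysis stage.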
I suggest that we discuss here a list of functions that could help manipulate annotations and segments more easily. Here is what I suggest, please add any:
- get_segments(annotations): returns segments corresponding to a list of annotations, as one single merged dataframe.
  - annotations: a dataframe containing the annotations for which the segments should be retrieved.
- intersection(a, b): compute the intersection of two sets of annotations.
  - a: a dataframe containing annotations as in the meta data, e.g.: [example table missing]
  - b: a dataframe containing annotations as in the meta data, e.g.: [example table missing]
  - Returns: a tuple of two dataframes, e.g.: [example table missing] and [example table missing]
- clip(segments, start, stop): clip every interval in segments from start to stop, dropping segments that are out of bounds.
- fill_silences(segments, silence_speaker_type = 'SIL'): populate the dataframe segments with intervals for every silence, setting silence_speaker_type as the speaker_type for these intervals.
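To make the last two proposals more concrete, here is a minimal pandas sketch of what clip and fill_silences might do. It rests on assumptions not stated above: segments are assumed to have columns named `segment_onset` and `segment_offset` (hypothetical names) next to `speaker_type`, and silences are only filled between existing segments, not before the first or after the last one.

```python
import pandas as pd


def clip(segments: pd.DataFrame, start: float, stop: float) -> pd.DataFrame:
    """Clip every interval to [start, stop], dropping segments entirely out of bounds."""
    # Keep only the segments that overlap the [start, stop] window.
    overlapping = segments[
        (segments["segment_offset"] > start) & (segments["segment_onset"] < stop)
    ].copy()
    # Trim the boundaries of the remaining segments to the window.
    overlapping["segment_onset"] = overlapping["segment_onset"].clip(lower=start)
    overlapping["segment_offset"] = overlapping["segment_offset"].clip(upper=stop)
    return overlapping.reset_index(drop=True)


def fill_silences(segments: pd.DataFrame, silence_speaker_type: str = "SIL") -> pd.DataFrame:
    """Add one interval for every gap between consecutive segments."""
    ordered = segments.sort_values("segment_onset").reset_index(drop=True)
    silences = []
    cursor = None  # latest offset seen so far
    for onset, offset in zip(ordered["segment_onset"], ordered["segment_offset"]):
        if cursor is not None and onset > cursor:
            silences.append({
                "segment_onset": cursor,
                "segment_offset": onset,
                "speaker_type": silence_speaker_type,
            })
        cursor = offset if cursor is None else max(cursor, offset)
    if not silences:
        return ordered
    # Columns other than the three set above are left empty for silence rows.
    gaps = pd.DataFrame(silences, columns=ordered.columns)
    filled = pd.concat([ordered, gaps], ignore_index=True)
    return filled.sort_values("segment_onset").reset_index(drop=True)


example = pd.DataFrame({
    "segment_onset": [0, 1000, 3000],
    "segment_offset": [500, 2000, 4000],
    "speaker_type": ["CHI", "FEM", "CHI"],
})
print(clip(example, 250, 3500))
print(fill_silences(example))
```

Whether silences should also be added before the first and after the last segment depends on knowing the annotated range, which this sketch does not have access to.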