LAAC-LSCP / ChildProject

Python package for the management of day-long recordings of children.
https://childproject.readthedocs.io
MIT License

Units in time-aggregated metrics #303

Closed orasanen closed 2 years ago

orasanen commented 2 years ago

Just to make sure I understand the time-aggregated metrics: is the output (e.g., vtc female adult speech duration) always normalized as "seconds per one hour of input" vocalization rate independently of the period parameter used in the analysis? _ph in the variable name implies "per hour", as do the numbers I get (although sometimes I get more than 60 mins per hour for 5min resolution analysis). However, the documentation at https://childproject.readthedocs.io/en/latest/metrics.html#aclew-metrics is not perfectly unambiguous on whether the rate is seconds per hour, or seconds per set period. Thanks!

lucasgautheron commented 2 years ago

> Just to make sure I understand the time-aggregated metrics: is the output (e.g., vtc female adult speech duration) always normalized as "seconds per one hour of input" vocalization rate independently of the period parameter used in the analysis? _ph in the variable name implies "per hour", as do the numbers I get (although sometimes I get more than 60 mins per hour for 5min resolution analysis). However, the documentation at https://childproject.readthedocs.io/en/latest/metrics.html#aclew-metrics is not perfectly unambiguous on whether the rate is seconds per hour, or seconds per set period. Thanks!

Yes, speech rates are per hour; (counts per hour and seconds per hour). See here for the details of the implementation: https://github.com/LAAC-LSCP/ChildProject/blob/35ebacb9dbe28e1a4bbc985668c4ce03e7ea4110/ChildProject/pipelines/metrics.py#L591-L604 I'm going to update the docs accordingly.
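The normalization can be sketched as follows (a minimal illustration of the per-hour scaling, assuming durations in seconds; names are illustrative, not the package's actual API):

```python
def per_hour_rate(total_duration_s: float, observed_duration_s: float) -> float:
    # Normalize a quantity accumulated within a time bin to a
    # per-hour rate, scaling by how much of the bin is actually
    # covered by annotations (not by the nominal bin length).
    return total_duration_s / observed_duration_s * 3600

# A 10-minute bin fully covered by annotations (600 s) containing
# 120 s of FEM speech yields 720 s of speech per hour:
rate = per_hour_rate(120, 600)  # 720.0
```

The same formula applies whatever the period parameter is, which is why the output columns keep the _ph suffix regardless of bin size.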

As to why the rate may exceed 1 hour of speech per hour: I can see a few reasons why this may happen, but I think it should be very rare...

For now, the algorithm assigns each speech segment (~vocalization) to a single time bin, even though a vocalization could span several bins (if the bins are very short). The whole duration of the segment is then attributed to that one bin. I expect this simplification to have a negligible impact for windows >= 1 minute, but it may not be negligible below that. The behavior of the algorithm can and should be refined for very short time bins (by splitting segments across several bins), but I thought I'd keep this for later, since it's probably negligible for periods >> 1 s, the typical length of a vocalization.

This could also be a problem for bins that are only partially covered by annotations. For instance, if the first 5-minute bin is covered by only 5 seconds of annotations, and a 10 s segment has its onset within this bin, the calculated rate will be 2 hours per hour. One way to avoid that is to discard bins that are only very partially covered.

It can also happen if a speaker class has overlapping segments. But I guess this is not your case, since you are probably using vtc/lena annotations. And of course, this could be a bug!

Can you share the command that you used here so that I can reproduce and provide a more satisfying answer? (Including the name of the dataset)

Thanks!

orasanen commented 2 years ago

Thanks for the clarification!

In case you want to look into it: I'm getting a total of 68 minutes for the bergelson set of the aclew10kq1 dataset, with a 10-minute period on VTC output, so there should be no overlapping speech within a class in that case. The corresponding row of the resulting data file is 44166.

(ChildProject) rasaneno@wks-86206-mac bergelson % child-project metrics . output.csv period --period 10Min --set vtc --threads 8
(ChildProject) rasaneno@wks-86206-mac bergelson % head -44166 output.csv | tail -1
123848-7339_1,654.0,2314.2239999998565,3.53856880733923,624.0,1780.5359999999637,2.853423076923019,18.0,4.3800000000192085,0.24333333333440046,0.0,0.0,,600000,32,16:40:00

voc_dur_fem_ph and voc_dur_mal_ph sum to about 4.1k seconds, or around 68 minutes.
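For reference, summing the two duration columns from the row above confirms the figure:

```python
# Values taken from the pasted output.csv row (seconds per hour)
voc_dur_fem_ph = 2314.2239999998565
voc_dur_mal_ph = 1780.5359999999637

total_minutes = (voc_dur_fem_ph + voc_dur_mal_ph) / 60  # ~68.2 minutes per hour
```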

lucasgautheron commented 2 years ago

The fact that fem+mal add up to more than one hour per hour seems normal to me with the VTC: the VTC does not have overlap /within/ a class (it cannot distinguish two speakers from the same class), but FEM and MAL can both be active at the same time (unlike with the LENA).
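A hypothetical illustration of how independent per-class detectors can sum to more than one hour per hour (intervals in seconds; all values made up for the example):

```python
# Segments as (onset, offset) pairs within a fully covered 60 s bin.
# Neither class overlaps itself, but FEM and MAL overlap each other.
fem = [(0, 40), (45, 60)]  # 55 s of FEM speech
mal = [(10, 55)]           # 45 s of MAL speech

def total_duration(segments):
    # Sum of segment durations in seconds.
    return sum(offset - onset for onset, offset in segments)

rate_fem = total_duration(fem) * 3600 / 60  # 3300.0 s/h
rate_mal = total_duration(mal) * 3600 / 60  # 2700.0 s/h
combined = rate_fem + rate_mal              # 6000.0 s/h > 3600
```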

Checked individually, FEM and MAL are below 1 hour per hour except in one case, which seems to match the situation I mentioned (a bin that is very partially covered).

>>> df['voc_dur_fem_ph'].max()
3382.313999999984
>>> df['voc_dur_mal_ph'].max()
3968.199204977618
>>> df['voc_dur_mal_ph'].max()
KeyboardInterrupt
>>> df[df['voc_dur_mal_ph'] > 3600]
      recording_filename  voc_fem_ph  voc_dur_fem_ph  avg_voc_dur_fem  voc_mal_ph  voc_dur_mal_ph  ...  voc_och_ph  voc_dur_och_ph  avg_voc_dur_och  duration  child_id    period
23252      123547-2405_1  900.022501       385.20963            0.428  900.022501     3968.199205  ...         0.0             0.0              NaN      3999        21  11:20:00

[1 rows x 16 columns]
orasanen commented 2 years ago

Oh right! I forgot that the VTC runs multiple parallel detectors rather than making a categorical decision among the possible targets. Thanks for the clarification, and sorry for the hassle.