LAAC-LSCP / ChildProject

Python package for the management of day-long recordings of children.
https://childproject.readthedocs.io
MIT License

Implement processor to split LENA recordings #326

Open lucasgautheron opened 2 years ago

lucasgautheron commented 2 years ago

Is your feature request related to a problem? Please describe.

Users may want to split LENA recordings into contiguous blocks, as in EL1000. This involves splitting the recordings in the metadata and splitting the audio accordingly.

Describe the solution you'd like

Implement a processor (in pipelines.processors) that splits both the metadata and the audio.
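
A minimal sketch of the metadata half of such a split, not an actual ChildProject processor. It assumes each recording is a plain dict, that block boundaries are given in milliseconds relative to the original audio, and that each block inherits the parent's `session_id`. The helper name `split_recording`, the `_1`, `_2` filename suffixes, and the use of `session_offset` and `duration` to store the block position are all illustrative assumptions.

```python
import os


def split_recording(recording, blocks):
    """Return one metadata row per contiguous block.

    recording: dict describing the original recording (hypothetical row).
    blocks: list of (onset_ms, offset_ms) pairs relative to the original audio.
    """
    stem, ext = os.path.splitext(recording["recording_filename"])
    rows = []
    for index, (onset, offset) in enumerate(sorted(blocks), start=1):
        if offset <= onset:
            raise ValueError(f"empty or inverted block: {(onset, offset)}")
        row = dict(recording)  # copy, so the original row is left untouched
        row["recording_filename"] = f"{stem}_{index}{ext}"
        row["session_offset"] = onset  # where the block starts in the original
        row["duration"] = offset - onset
        rows.append(row)
    return rows


# Hypothetical 16-hour recording with two one-hour blocks to keep.
original = {
    "recording_filename": "child1.wav",
    "session_id": "child1",
    "duration": 57_600_000,
}
split_rows = split_recording(original, [(7_200_000, 10_800_000), (0, 3_600_000)])
for row in split_rows:
    print(row["recording_filename"], row["session_offset"], row["duration"])
```

The audio itself would have to be cut with the same onsets and offsets; that step is left out here.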

MarvinLvn commented 2 years ago

Ideally:

  1. Existing metadata and annotations (vtc, lena, etc.) should be cut accordingly.
  2. recordings.csv shouldn't be erased. I think having a sessions.csv instead would be useful (you may want to work on your dataset at the longform level or at the session level).
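
To make "cut accordingly" concrete, here is a rough sketch of clipping annotation segments to one block and re-timing them relative to the block start. It assumes segments are dicts with `segment_onset` and `segment_offset` in milliseconds; the helper name `cut_segments` and the choice to truncate segments that straddle a block boundary (rather than drop them) are assumptions, not something decided in this thread.

```python
def cut_segments(segments, block_onset, block_offset):
    """Keep the parts of segments that overlap [block_onset, block_offset).

    Timestamps in the result are relative to the start of the block.
    """
    cut = []
    for segment in segments:
        onset = max(segment["segment_onset"], block_onset)
        offset = min(segment["segment_offset"], block_offset)
        if onset >= offset:
            continue  # no overlap with this block
        clipped = dict(segment)  # copy, so the input is left untouched
        clipped["segment_onset"] = onset - block_onset
        clipped["segment_offset"] = offset - block_onset
        cut.append(clipped)
    return cut


# Hypothetical segments, with a block spanning 1000 ms to 2000 ms.
segments = [
    {"segment_onset": 500, "segment_offset": 1500, "speaker_type": "CHI"},
    {"segment_onset": 1800, "segment_offset": 2600, "speaker_type": "FEM"},
    {"segment_onset": 3000, "segment_offset": 3500, "speaker_type": "MAL"},
]
block_segments = cut_segments(segments, 1000, 2000)
for segment in block_segments:
    print(segment)
```

The first two segments are truncated at the block boundaries and the third is dropped because it falls entirely outside the block.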

lucasgautheron commented 2 years ago

> Ideally:
>
>   1. Existing metadata and annotations (vtc, lena, etc.) should be cut accordingly.
>   2. recordings.csv shouldn't be erased. I think having a sessions.csv instead would be useful (you may want to work on your dataset at the longform level or at the session level).

  1. Oh yeah, this one is going to be a PITA too. What if you want to re-import annotations at a later stage? Need to think about this...
  2. Currently, this is done with a groupby on session_id. For instance, some of ChildProject's features (like sampling) already allow the user to decide which level to work at. Do we need a separate metadata file for that?
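
For illustration, grouping on session_id amounts to something like the sketch below. It uses only the standard library rather than the pandas groupby mentioned above, and the row contents and the helper name `sessions_from_recordings` are made up for the example.

```python
from collections import defaultdict


def sessions_from_recordings(recordings):
    """Derive a session-level view from recording-level rows."""
    grouped = defaultdict(list)
    for recording in recordings:
        grouped[recording["session_id"]].append(recording)

    sessions = []
    for session_id, rows in grouped.items():
        sessions.append(
            {
                "session_id": session_id,
                "recordings": [row["recording_filename"] for row in rows],
                "duration": sum(row["duration"] for row in rows),
            }
        )
    return sessions


# Hypothetical rows: two blocks from one session, one block from another.
recordings = [
    {"recording_filename": "child1_1.wav", "session_id": "child1", "duration": 3_600_000},
    {"recording_filename": "child1_2.wav", "session_id": "child1", "duration": 1_800_000},
    {"recording_filename": "child2_1.wav", "session_id": "child2", "duration": 7_200_000},
]
sessions = sessions_from_recordings(recordings)
for session in sessions:
    print(session)
```

Since the session-level rows can be derived on demand like this, a separate sessions.csv would mostly duplicate what recordings.csv already holds.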

MarvinLvn commented 2 years ago

  1. That seems like a huge amount of work.
  2. Agreed, it's the simplest approach.