Implement processor to split LENA recordings

LAAC-LSCP / ChildProject

Python package for the management of day-long recordings of children.

https://childproject.readthedocs.io

MIT License


Implement processor to split LENA recordings #326

Open lucasgautheron opened 2 years ago

lucasgautheron commented 2 years ago

Is your feature request related to a problem? Please describe.

Users may want to split LENA recordings into contiguous blocks, as in EL1000. This involves splitting the recordings in the metadata and splitting the audio accordingly.

Describe the solution you'd like

Implement a processor (in pipelines.processors) that:

fills lena_recording_num
sets date_iso and start_time properly for each block (incrementing the original date by the correct amount for each block)
sets session_id and session_offset (what if they already exist?)
works when the dataset also has non-LENA recordings
is idempotent and works when some of the recordings have already been split
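To make the metadata requirements above concrete, here is a minimal sketch of the row-splitting step, not an actual ChildProject implementation. The function name `split_recording`, the `(onset_ms, offset_ms)` block format, the `_<num>` file naming scheme, the `HH:MM:SS` time format, and the choice to add the block onset to any pre-existing session_offset are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from pathlib import PurePosixPath


def split_recording(recording, blocks):
    """Return one metadata row per contiguous block of a recording.

    recording: dict with recording_filename, date_iso (YYYY-MM-DD) and
               start_time (HH:MM:SS, assumed format).
    blocks:    list of (onset_ms, offset_ms) pairs relative to the start
               of the original audio (hypothetical input format).
    """
    # Idempotency: a row that already carries lena_recording_num is
    # assumed to have been split before and is returned untouched.
    if recording.get("lena_recording_num") not in (None, ""):
        return [dict(recording)]

    start = datetime.strptime(
        f"{recording['date_iso']} {recording['start_time']}", "%Y-%m-%d %H:%M:%S"
    )
    path = PurePosixPath(recording["recording_filename"])
    # Keep an existing session_id; otherwise fall back to the original file name.
    session_id = recording.get("session_id") or recording["recording_filename"]
    # Assumption: an existing session_offset (ms) is shifted, not overwritten.
    base_offset = int(recording.get("session_offset") or 0)

    rows = []
    for num, (onset_ms, offset_ms) in enumerate(blocks, start=1):
        block_start = start + timedelta(milliseconds=onset_ms)
        row = dict(recording)
        row.update(
            {
                # Hypothetical naming scheme: rec.wav -> rec_1.wav, rec_2.wav, ...
                "recording_filename": str(
                    path.with_name(f"{path.stem}_{num}{path.suffix}")
                ),
                "date_iso": block_start.strftime("%Y-%m-%d"),
                "start_time": block_start.strftime("%H:%M:%S"),
                "duration": offset_ms - onset_ms,
                "session_id": session_id,
                "session_offset": base_offset + onset_ms,
                "lena_recording_num": num,
            }
        )
        rows.append(row)
    return rows
```

Using a full datetime rather than separate date and time fields means a block that starts after midnight gets the next day's date_iso without special handling. Non-LENA recordings could simply be passed through without calling this function.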

MarvinLvn commented 2 years ago

Ideally: 1) Existing metadata and annotations (vtc, lena, etc.) should be cut accordingly. 2) recordings.csv shouldn't be erased. I think having a sessions.csv instead would be useful (you may want to work on your dataset at the longform level or at the session level).
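As a rough illustration of what "cut accordingly" could mean for annotations, the sketch below clips segments to block boundaries and re-expresses their timestamps relative to each block. It assumes segments are dicts with segment_onset and segment_offset in milliseconds relative to the original recording; the function name `cut_segments` and the block format are hypothetical.

```python
def cut_segments(segments, blocks):
    """Clip annotation segments to blocks and shift them to block-relative time.

    segments: list of dicts with segment_onset / segment_offset in ms,
              relative to the original (unsplit) recording.
    blocks:   list of (onset_ms, offset_ms) pairs (hypothetical format).
    """
    cut = []
    for num, (block_on, block_off) in enumerate(blocks, start=1):
        for seg in segments:
            # Keep only the part of the segment that overlaps this block.
            onset = max(seg["segment_onset"], block_on)
            offset = min(seg["segment_offset"], block_off)
            if onset >= offset:
                continue  # no overlap with this block
            clipped = dict(seg)
            clipped.update(
                lena_recording_num=num,
                segment_onset=onset - block_on,
                segment_offset=offset - block_on,
            )
            cut.append(clipped)
    return cut
```

A segment that straddles a block boundary is split into two pieces here; dropping such segments or assigning them to a single block would be equally defensible choices.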

lucasgautheron commented 2 years ago

> Ideally: 1) Existing metadata and annotations (vtc, lena, etc.) should be cut accordingly.

Oh yeah, this one is going to be a PITA too. What if you want to re-import annotations at a later stage? Need to think about this...

> 2) recordings.csv shouldn't be erased. I think having a sessions.csv instead would be useful (you may want to work on your dataset at the longform level or at the session level)

Currently, this is done by grouping recordings by session_id. For instance, some of ChildProject's features (like sampling) already allow the user to decide which level to work at. Do we need a separate metadata file for that?
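For reference, working at the session level by grouping on session_id could look like the sketch below. It is written in plain Python over row dicts rather than with the dataframe groupby mentioned above, and the function name `group_by_session` and the fallback to recording_filename when session_id is missing are assumptions.

```python
from collections import defaultdict


def group_by_session(recordings):
    """Group recording rows by session_id, ordered by session_offset.

    Rows without a session_id are treated as their own session, keyed by
    recording_filename (an assumption made for this sketch).
    """
    sessions = defaultdict(list)
    for rec in recordings:
        key = rec.get("session_id") or rec["recording_filename"]
        sessions[key].append(rec)
    # Within a session, order the blocks by their offset (missing -> 0).
    return {
        key: sorted(rows, key=lambda r: r.get("session_offset") or 0)
        for key, rows in sessions.items()
    }
```

With this kind of grouping, the session-level view is derived from recordings.csv on demand, which is one argument for not maintaining a separate metadata file.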

MarvinLvn commented 2 years ago

That seems like a huge amount of work.
Agreed, it's the simplest approach.