BEP for audio/video capture of behaving subjects

bendichter commented 7 months ago

I would like to create a BEP to store the audio and/or video recordings of behaving subjects.

While this would obviously be problematic for sharing human data, it would be useful for internal human data and for internal and shared data of non-human subjects.

Following the structure of the Task Events we will define types of files that can be placed in various data_type directories.

sub-<label>/[ses-<label>]
    <data_type>/
        <matches>_behcapture.mp3|.wav|.mp4|.mkv|.avi
        <matches>_behcapture.json

This schema will follow the standard principles of BIDS, listed here for clarity:

If no relevant data_type directory exists, use beh/.
Video or audio files that are continuous recordings split into files will use the _split- entity.
Video or audio files that are recorded simultaneously but from different angles or at different locations would use the _recording- entity to differentiate. We will need to modify the definition of this term to generalize it a bit to accommodate this usage. This entity would also be used to differentiate if a video and audio were recorded simultaneously but from different devices. Note that simply using the file extension to differentiate would not work because it would not be clear which file the .json maps to.
The start time of each audio or video recording should be noted in the scans.tsv file.
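As a rough illustration of the naming principles above, here is a minimal Python sketch that parses a proposed file name and derives its sidecar name. The regular expression, the set of entities it accepts, their order, and the helper names `parse_behcapture` and `sidecar_for` are all assumptions made for illustration; none of them are part of the proposal or of any existing BIDS tool.

```python
import re

# Hypothetical pattern for the proposed naming scheme. The entities accepted
# here and their order are assumptions for illustration only.
BEHCAPTURE_RE = re.compile(
    r"^sub-(?P<sub>[A-Za-z0-9]+)"
    r"(?:_ses-(?P<ses>[A-Za-z0-9]+))?"
    r"(?:_task-(?P<task>[A-Za-z0-9]+))?"
    r"(?:_run-(?P<run>[0-9]+))?"
    r"(?:_split-(?P<split>[0-9]+))?"
    r"(?:_recording-(?P<recording>[A-Za-z0-9]+))?"
    r"_behcapture\.(?P<ext>mp3|wav|mp4|mkv|avi|json)$"
)


def parse_behcapture(filename):
    """Return the entities found in a behcapture file name, or None."""
    match = BEHCAPTURE_RE.match(filename)
    if match is None:
        return None
    # Drop optional entities that are absent from this file name.
    return {key: value for key, value in match.groupdict().items() if value is not None}


def sidecar_for(filename):
    """Return the JSON sidecar name: same entities and suffix, .json extension."""
    stem, _, _extension = filename.rpartition(".")
    return stem + ".json"


video = "sub-01_ses-01_task-openfield_recording-cam1_behcapture.mp4"
audio = "sub-01_ses-01_task-openfield_recording-mic1_behcapture.wav"
print(parse_behcapture(video))
print(sidecar_for(video), sidecar_for(audio))
```

The two files in the usage lines differ only by their recording- label and extension. Because the sidecar name is derived by swapping the extension, distinct recording- labels are what keep the two .json files from colliding.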

The JSON would define "streams" which would define each stream in the file.

The *_behcapture.json would look like this:

{
  "device": "Field Recorder X200",
  "streams": [
    {
      "type": "audio",
      "sampling_rate": 44100.0,
      "description": "High-quality stereo audio stream."
    },
    {
      "type": "video",
      "sampling_rate": 30.0,
      "description": "Standard 1080p video stream."
    }
  ]
}

To be specific, it would follow this JSON Schema structure:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "device": {
      "type": "string"
    },
    "streams": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "type": {
            "type": "string",
            "enum": ["audio", "video"]
          },
          "sampling_rate": {
            "type": "number",
            "format": "float"
          },
          "description": {
            "type": "string"
          }
        },
        "required": ["type", "sampling_rate"],
        "additionalProperties": false
      }
    }
  },
  "required": ["device", "streams"],
  "additionalProperties": false
}
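For anyone who wants to try the schema against real sidecars, here is a minimal standard-library sketch that hand-checks the same constraints (required keys, no additional properties, the audio/video enum, numeric sampling_rate). The function name `validate_sidecar` and the error strings are hypothetical; a JSON Schema validator library would be the more usual route and is not used here only to keep the example dependency-free.

```python
ALLOWED_STREAM_TYPES = {"audio", "video"}
TOP_LEVEL_KEYS = {"device", "streams"}
STREAM_KEYS = {"type", "sampling_rate", "description"}


def _is_number(value):
    # JSON Schema "number" excludes booleans, which Python treats as ints.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_sidecar(sidecar):
    """Return a list of problems; an empty list means the sidecar conforms."""
    if not isinstance(sidecar, dict):
        return ["sidecar must be a JSON object"]
    errors = []
    for key in ("device", "streams"):
        if key not in sidecar:
            errors.append(f"missing required key: {key}")
    for key in sorted(sidecar.keys() - TOP_LEVEL_KEYS):
        errors.append(f"unexpected key: {key}")
    if "device" in sidecar and not isinstance(sidecar["device"], str):
        errors.append("device must be a string")
    streams = sidecar.get("streams", [])
    if not isinstance(streams, list):
        errors.append("streams must be an array")
        streams = []
    for i, stream in enumerate(streams):
        if not isinstance(stream, dict):
            errors.append(f"streams[{i}] must be an object")
            continue
        for key in ("type", "sampling_rate"):
            if key not in stream:
                errors.append(f"streams[{i}] missing required key: {key}")
        for key in sorted(stream.keys() - STREAM_KEYS):
            errors.append(f"streams[{i}] unexpected key: {key}")
        kind = stream.get("type")
        if "type" in stream and (not isinstance(kind, str) or kind not in ALLOWED_STREAM_TYPES):
            errors.append(f"streams[{i}] type must be 'audio' or 'video'")
        if "sampling_rate" in stream and not _is_number(stream["sampling_rate"]):
            errors.append(f"streams[{i}] sampling_rate must be a number")
        if "description" in stream and not isinstance(stream["description"], str):
            errors.append(f"streams[{i}] description must be a string")
    return errors


example = {
    "device": "Field Recorder X200",
    "streams": [
        {"type": "audio", "sampling_rate": 44100.0},
        {"type": "video", "sampling_rate": 30.0},
    ],
}
print(validate_sidecar(example))
```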

This BEP would be specifically for audio and/or video, and would not include related data like eye tracking, point tracking, pose estimation, or behavioral segmentation. All of these would be considered derived and are reserved for another BEP.

bendichter commented 7 months ago

cc @yarikoptic who is providing guidance on this concept.

bendichter commented 7 months ago

An alternative idea is to name the files "_video.mp4|avi|mkv|..." and "_audio.mp3|wav|...". The advantage of this is it may be more clear what these files are. The disadvantages are that this does not make it clear that it's a recording of the subject as opposed to a stimulus, and that it's not clear what you should do if you have an audio/video recording.

bendichter commented 7 months ago

Another alternative idea is to have the files called "_beh.mp3|.wav|.mp4|.mkv|.avi|...", though this conflicts with the current beh modality. If there is a beh.tsv file in the beh/ directory, then it will have an accompanying beh.json file, which would conflict with the json file that corresponds to the data file (e.g. beh.mp3).

Remi-Gau commented 7 months ago

see https://github.com/bids-standard/bids-specification/pull/750 for a PR that talks about supporting videos in BIDS (but for storing of stimuli)

Remi-Gau commented 7 months ago

> This BEP would be specifically for audio and/or video, and would not include related data like eye tracking, point tracking, pose estimation, or behavioral segmentation. All of these would be considered derived and are reserved for another BEP.

Some of this may already be covered by the BIDS support for motion data. Also have a look at the eyetracking BEP (PR and HTML).

Remi-Gau commented 7 months ago

tagging @gdevenyi who I think mentioned wanting to work on something like this last time I saw him.

VisLab commented 7 months ago

The ideas for allowing annotations of movies and audio, as expressed in issue #153, could be expanded to allow annotation of participant video/audio, but in the imaging directories themselves, with an appropriate file structure to distinguish them. @neuromechanist @Remi-Gau @yarikoptic @adelavega @dungscout96 @dorahermes @arnodelorme

Remi-Gau commented 7 months ago

I like how those different initiatives are synching up.

Wouldn't those annotations of videos using HED when experimenters "code" their video be more appropriate as a derivative though?

VisLab commented 7 months ago

> Wouldn't those annotations of videos using HED when experimenters "code" their video be more appropriate as a derivative though?

Not necessarily.... in one group I worked with on experiments on stuttering -- the speech pathologist's annotations were definitely considered part of the original data. Most markers that you see in typical event files didn't come from the imaging equipment but are extracted from the control software or external devices. The eye trackers have algorithms to mark saccades and blinks and these are written as original data.

In my mind, if the annotations pertain to data that has been "calculated" from the original experimental data it should go into the derivatives folder. Annotations pertaining to data acquired during the experiment itself should probably go in the main folder.

Remi-Gau commented 7 months ago

I see. I was thinking more of cases where videos of animal behavior have to be annotated to code when certain behaviors happened. Given that this is not automated and can happen a long time after data acquisition, I would have seen this as more of a derivative. But your examples show that the answer, as in many cases, will be "it depends".

gdevenyi commented 7 months ago

We have potential animal applications in both domains:

Video with annotation timestreams coming from automated touchscreen-based animal behaviour systems.
Videos of animals in classic "open field test" and similar setups where postprocessing analysis collects a variety of annotations of the video determined by behaviour.

also I guess a:

Manual human annotation of videos of animals in naturalistic environments, like maternal care events

DimitriPapadopoulos commented 7 months ago

Would non-contiguous recordings (using the same setup) end up in the same or distinct files?

As an example, there could be cases where video recording has been stopped while taking care of a crying baby and resumed later on. Should BIDS try to enforce anything here, or leave it to end users (and data providers)?

What about other types of "time-series" data? I am not sure about MEG, but for EEG I know the EDF+ format allows discontinuous recordings:

> EDF+ allows storage of several NON-CONTIGUOUS recordings into one file. This is the only incompatibility with EDF. All other features are EDF compatible. In fact, old EDF viewers still work and display EDF+ recordings as if they were continuous. Therefore, we recommend EDF+ files of EEG or PSG studies to be continuous if there are no good reasons for the opposite.

bendichter commented 7 months ago

@DimitriPapadopoulos I believe these would be different runs. You would specify the start time of each run in the scans file.
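To make the "separate runs plus start times" idea concrete, here is a minimal Python sketch that writes a scans.tsv with the `filename` and `acq_time` columns BIDS already defines for that file. The file names, dates, and the helper name `scans_tsv` are made up for illustration, and the interrupted-recording scenario is only one way the data could be organized.

```python
import csv
import io
from datetime import datetime


def scans_tsv(recordings):
    """Build scans.tsv text from (relative filename, start datetime) pairs."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["filename", "acq_time"])
    # List runs in chronological order so gaps between them are easy to spot.
    for filename, start in sorted(recordings, key=lambda item: item[1]):
        writer.writerow([filename, start.isoformat(timespec="seconds")])
    return buffer.getvalue()


# Hypothetical session: recording stopped after run 1 and resumed later as run 2.
runs = [
    ("beh/sub-01_task-play_run-2_behcapture.mp4", datetime(2024, 1, 15, 10, 25, 30)),
    ("beh/sub-01_task-play_run-1_behcapture.mp4", datetime(2024, 1, 15, 10, 0, 0)),
]
print(scans_tsv(runs))
```

With each run's start time recorded this way, the length of the gap between non-contiguous recordings can be recovered from the start times together with the duration of each file.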

yarikoptic commented 7 months ago

I think there are multiple scenarios (entities) for how this could be handled:

runs - if, e.g., this also corresponds to separate runs of neural data (if any were acquired alongside), so primarily as "this is how we intended this all to be".
But I wonder if we should look into adopting/extending (currently they are too narrowly focused) any of the other entities whose meaning relates somehow to having "pieces of" (using a term which is not yet an entity): split, part, chunk.

neuromechanist commented 7 months ago

> We have potential animal applications in both domains

From the annotation perspective in #153, an annot- entity enables multiple annotations per _media file. It might be useful here as well.

> But I wonder if we should look into adopting/extending (currently they are too narrowly focused) any of the other entities whose meaning relates somehow to having "pieces of" (using a term which is not yet an entity): split, part, chunk.

Any of them seems fine. I have currently suggested part- as the entity to use, but I can see any of the three working.

> Video or audio files recorded simultaneously but from different angles or at different locations would use the _recording- entity to differentiate.

This is similar to having a stimulus with multiple tracks (left or right video streams, multiple audio channels, or separate video and audio), but they are not recording- per se. So, we might be able to look for a common entity that covers both potentially. We have two suggestions in #153 for now, (1) stream- and (2) track-. Would be happy to have any additional suggestions.

neuromechanist commented 7 months ago

Also, as @bendichter mentioned, this proposal will very soon find its audience in human neuroscience, especially with DeepLabCut adding subject masking capabilities and newer modalities such as LiDAR and wifi motion capture coming into play.

It might be useful to have Motion-BIDS maintainers' (@sjeung and @JuliusWelzel) opinions as well.

bendichter commented 7 months ago

How do we feel about this naming convention?

sub-<label>/[ses-<label>]
    <data_type>/
        <matches>_behcapture.mp3|.wav|.mp4|.mkv|.avi
        <matches>_behcapture.json

I'm not 100% on it myself but I can't think of anything better. Other options:

"_video.mp4|avi|mkv|..." and "_audio.mp3|wav|...".
"_behvideo.mp4|avi|mkv|..." and "_behaudio.mp3|wav|...".
"_behmedia.mp4|avi|mkv|mp3|wav|..."

Is there any precedent from other standards we could use here?

gdevenyi commented 7 months ago

> Is there any precedent from other standards we could use here?

Technically mkv is a container format; it could hold different kinds of video/audio streams.
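Since the container does not determine what is inside, the "streams" entries could in principle be filled in by inspecting the file. Below is a minimal Python sketch that maps the JSON report from the `ffprobe` command-line tool onto the proposed entries. It assumes ffprobe is installed; the helper names `probe` and `streams_from_probe` are hypothetical, and putting the codec name in "description" is just one possible choice, not something the proposal specifies.

```python
import json
import subprocess
from fractions import Fraction


def probe(path):
    """Run ffprobe (assumed to be installed) and return its parsed JSON report."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def streams_from_probe(report):
    """Map ffprobe's stream list onto the proposed sidecar "streams" entries."""
    streams = []
    for stream in report.get("streams", []):
        kind = stream.get("codec_type")
        if kind == "audio":
            # ffprobe reports the audio sample rate as a string, e.g. "44100".
            rate = float(stream["sample_rate"])
        elif kind == "video":
            # Frame rates are reported as fractions, e.g. "30/1" or "30000/1001".
            try:
                rate = float(Fraction(stream["avg_frame_rate"]))
            except ZeroDivisionError:
                continue  # "0/0" means no meaningful frame rate (e.g. cover art)
        else:
            continue  # subtitles, data tracks, etc. are outside the proposal
        streams.append(
            {
                "type": kind,
                "sampling_rate": rate,
                "description": stream.get("codec_name", ""),
            }
        )
    return streams


# A hand-written report in ffprobe's layout, so the mapping runs without a media file.
report = {
    "streams": [
        {"codec_type": "video", "codec_name": "h264", "avg_frame_rate": "30/1"},
        {"codec_type": "audio", "codec_name": "aac", "sample_rate": "44100"},
        {"codec_type": "subtitle", "codec_name": "subrip"},
    ]
}
print(streams_from_probe(report))
```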

Should we specify non-patent-encumbered video compression formats?

dorahermes commented 3 months ago

@bendichter would be good to have your input on the proposed entities here.

A specific point of discussion is how open the description of the proposed annot- entity would be: for stimuli only, or also for other types of annotations as discussed above?

bendichter commented 3 months ago

@dorahermes I like the idea of a general text annotations file that annotates a media file, and I think that could certainly be relevant downstream of these behavioral capture files.

I think the needs of stimuli storage and behavioral capture storage are different. With stimuli, you often have a single file that you play many times across different subjects, sessions, and trials, so it makes sense to have a root folder for these where they can be referenced repeatedly. For behavioral captures, every capture is unique, so it would make more sense to store these alongside other types of recordings. So I like what is going on with stimuli, but I don't want that to engulf these ideas about how to represent behavioral capture.

I also am trying to keep this to an MVP, so I'd like to push off discussion of annotations, though I will say I think the general approach you link to will probably work for behcapture as well with minimal adjustments.

bendichter commented 3 months ago

> Should we specify non-patent-encumbered video compression formats?

The most likely culprit here would be H264, which is used in mpeg files. However, it seems that would be a non-issue, since this would be covered under the "Free Internet Broadcasting" consideration (source).

yarikoptic commented 3 months ago

> "_behvideo.mp4|avi|mkv|..." and "_behaudio.mp3|wav|...".

FWIW, I also think that we should have "audio" and "video" in the suffix (ref elsewhere), but I do not think we should collapse an "intent" (beh) into it. Moreover, we already have a beh datatype and even a modality, so in principle the intent could be expressed as _mod-beh, but AFAIK we have never associated files that way so far (e.g. for _events.tsv).

yarikoptic commented 1 month ago

I think there is a good amount of overlap (datatypes, extensions) with "stimuli" BEP044. @bendichter when you get a chance, have a look at that BEP google doc.

bids-standard / bids-specification

BEP for audio/video capture of behaving subjects #1771