edf file naming

julia-pfarr commented 5 months ago

EyeLink only allows 8 characters in the file name when recording the data. In our converter we only add the suffix; we do not encode the subject number or task name. Do we want users to have their files already named "sub-01_taskname" before conversion, or do we want to add something that lets the user enter it during the conversion process?

oesteban commented 5 months ago

I'm not sure I understand the problem, so let me think out loud:

1. You want eye2bids to write BIDS-compliant files at the output, meaning they should be located in the right subject/session/datatype directory tree.
2. You could hope this information was encoded in the EDF file (do those have headers? Where would that be encoded?). However, 8 characters is insufficient to write a name.
3. You propose enforcing some naming on the original EDF files so the target BIDS path can univocally be interpolated.

To (1) I'd say don't worry: make sure the conversion from EDF to tsv.gz files + JSONs works well and leave the naming for the very end. To (2) I'd say that even if EDF allowed more flexibility, it's a really twisty road where you trust that the user set ALL the metadata before the sessions and did so consistently. To (3) I can only suggest that maybe eye2bids requires something more than just an EDF file and/or an existing BIDS structure:

- If the BIDS structure exists AND the scans.tsv is there AND it has not been jittered for deidentification purposes (please note that's three IFs), then you can try to match the time-of-acquisition metadata in the EDF (IF it is correct) and find the corresponding scans.
- Since the above is too much, perhaps you can request the user to have a table mapping EDF files and particular BIDS selectors (subject, task, run, suffix, etc.). We did this (a very custom version of it) here: https://github.com/TheAxonLab/hcph-sops/blob/mkdocs/code/eyetracking/schedule.tsv. This is only one subject, hence we only needed session ID, day, and encoding direction to reconstruct the four target MRI runs.
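A minimal sketch of what such a mapping table could look like in practice. The column names (`edf_file`, `subject`, `session`, `task`, `run`) and the example file names are assumptions made up for illustration; they are not an eye2bids format and not the layout of the linked schedule.tsv.

```python
import csv
import io

# Hypothetical mapping table: one row per EDF file, with the BIDS entities
# the converter cannot recover from an 8-character file name.
MAPPING_TSV = (
    "edf_file\tsubject\tsession\ttask\trun\n"
    "s01rest.edf\t01\t\trest\t1\n"
    "s01nback.edf\t01\t02\tnback\t\n"
)


def bids_stem(row):
    """Build a BIDS file name stem from whichever entities the row provides."""
    parts = [f"sub-{row['subject']}"]
    if row.get("session"):
        parts.append(f"ses-{row['session']}")
    parts.append(f"task-{row['task']}")
    if row.get("run"):
        parts.append(f"run-{row['run']}")
    return "_".join(parts)


def load_mapping(tsv_text):
    """Return a dict mapping each EDF file name to its BIDS stem."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["edf_file"]: bids_stem(row) for row in reader}


mapping = load_mapping(MAPPING_TSV)
for edf_name, stem in mapping.items():
    print(edf_name, "->", stem)
```

Empty cells simply drop the corresponding entity, so the same table would work for datasets with or without sessions and runs.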

oesteban commented 5 months ago

Maybe @yarikoptic can give us a much better response given his experience with reproin and heudiconv (although, reproin uses your option 3)

Also, Yarik may be able to tell us if heudiconv has "hooks" where eye2bids could be called. That must have an interface and maybe the problem turns into just adopting it.

julia-pfarr commented 5 months ago

> You want eye2bids to write BIDS-compliant files at the output, meaning they should be located in the right subject/session/datatype directory tree.

The folder thing we somewhat "control" through the --output_dir flag

> You could hope this information was encoded in the EDF file (do those have headers? Where would that be encoded?). However, 8 characters is insufficient to write a name.

Subject number is not encoded in the edf file; taskname can be recorded if the user defines it beforehand, but not everyone does that.

> You propose enforcing some naming on the original EDF files so the target BIDS path can univocally be interpolated.

That's my actual question. Because they can give the edf file a name with only 8 characters it would not be possible for them to let the eyelink give them a file named "sub-01_taskname.edf", because subject encoding already takes 6 characters. Of course, they can rename the file straight after acquisition, but then we would need to require them to do that.

So, as I see it, we can:

1) make it a precondition by adding information like "to receive the proper filenaming after conversion, please make sure your edf file contains subject number and taskname in the manner "sub-01_taskname.edf"", or
2) incorporate it in the converter, giving a warning before filename generation saying "please insert subject number and taskname" or so, or
3) not care at all, and people will get an output with "_recording-eye1_physio.tsv.gz" plus whatever they put as edf filename in front of the suffix. At the latest, they will notice when they try to validate the dataset.
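To make options 2) and 3) concrete, here is a rough sketch of how a converter could combine them: use the subject and task if the user supplies them, otherwise fall back to the EDF file name and warn. The function `output_name` and its parameters are hypothetical and not part of eye2bids; only the suffix is taken from the comment above.

```python
import warnings
from pathlib import Path

# Suffix as quoted in the comment above.
SUFFIX = "_recording-eye1_physio.tsv.gz"


def output_name(edf_path, subject=None, task=None):
    """Return an output file name for one converted EDF file.

    Hypothetical helper: if both subject and task are given, build a
    BIDS-style name (option 2); otherwise keep the EDF stem and warn
    that the result may not pass validation (option 3).
    """
    if subject and task:
        return f"sub-{subject}_task-{task}{SUFFIX}"
    stem = Path(edf_path).stem
    warnings.warn(
        f"No subject/task given for {edf_path}; "
        f"using '{stem}' as prefix, which may not be BIDS-compliant."
    )
    return f"{stem}{SUFFIX}"


print(output_name("s01rest.edf", subject="01", task="rest"))
print(output_name("data/test.edf"))
```

The fallback keeps the conversion usable for people who only want the tsv.gz and JSON files, while the warning points to the naming problem before validation does.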

Remi-Gau commented 5 months ago

> Since the above is too much, perhaps you can request the user to have a table mapping EDF files and particular BIDS selectors (subject, task, run, suffix, etc.). We did this (a very custom version of it) here: https://github.com/TheAxonLab/hcph-sops/blob/mkdocs/code/eyetracking/schedule.tsv. This is only one subject, hence we only needed session ID, day, and encoding direction to reconstruct the four target MRI runs.

quite FYI for context

ideally I would prefer to leave a lot of this mapping to be handled by "real" BIDS converters that already do that well (heudiconv, dicom2bids, bidscoin...) and to have eye2bids be just a "plugin" for those that would handle the data wrangling for them.

sorry for the brevity, ask me questions if this is too confusing

oesteban commented 5 months ago

> have eye2bids be just a "plugin" for those that would handle the data wrangling for them.

agreed, this is why I summoned @yarikoptic here :)

julia-pfarr commented 5 months ago

ok, I don't know the other converters well enough. I'll wait for Yarik's reply

Just wanted to make sure that this is also considered for people who don't combine eyetracking with other modalities but have only eyetrack & behavior data.

Remi-Gau commented 5 months ago

FYI, I also talked briefly with Marcel Zwiers, who maintains bidscoin, which has a plugin system

yarikoptic commented 5 months ago

Unfortunately heudiconv is quite inflexible as it is, after all, a "Heuristic DICOM converter", so it would not be easy to incorporate another data type ATM. But indeed I have some experience with approach 3), since it is what we rely on in reproin, where the hope is that researchers invest a little time once at the beginning to name sequences so that conversion is later unambiguous.

> Subject number is not encoded in the edf file

We do rely on the subject id as entered on the data acquisition system and given to us in the DICOM files (although it can be overridden from the command line or a helper script).

> Because they can give the edf file a name with only 8 characters it would not be possible for them to let the eyelink give them a file named "sub-01_taskname.edf"

I am not familiar with EyeLink system -- is there any way for preparing "experiment protocols" like it is done on MRI systems, see e.g. this image? or entering extra metadata?

Is there time metadata in the file?

If yes -- what I am thinking -- if this eye tracking data is to be associated with other imaging data, then it is closer to our https://github.com/ReproNim/reprostim/ project where given a video file we need to cut out a piece (yet to do) and associate with a specific data file in an existing BIDS dataset.

Then common use case would be more of just providing .edf file and BIDS dataset, and tool figuring out appropriate file (based on timing in _scans.tsv) to associate that file with, and thus "borrowing" subject/session/task information from that target data file which matches in terms of time overlap.
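A rough sketch of that matching step, under loud assumptions: the scans.tsv content below is invented, and since `_scans.tsv` only stores a start time (`acq_time`) per file, the sketch picks the scan whose start is closest to the EDF start within a tolerance instead of computing a true time overlap. The function name `closest_scan` and the two-minute tolerance are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta

# Invented example of a _scans.tsv file (filename + acq_time columns).
SCANS_TSV = (
    "filename\tacq_time\n"
    "func/sub-01_task-rest_bold.nii.gz\t2013-09-09T16:47:55\n"
    "func/sub-01_task-nback_bold.nii.gz\t2013-09-09T17:05:12\n"
)


def closest_scan(edf_start, scans_text, tolerance=timedelta(minutes=2)):
    """Return the scan whose acq_time is nearest to edf_start, or None.

    None is returned when no scan starts within `tolerance`, so an
    unrelated recording is not silently attached to the wrong run.
    """
    rows = csv.DictReader(io.StringIO(scans_text), delimiter="\t")
    best_name, best_gap = None, None
    for row in rows:
        gap = abs(datetime.fromisoformat(row["acq_time"]) - edf_start)
        if best_gap is None or gap < best_gap:
            best_name, best_gap = row["filename"], gap
    if best_gap is not None and best_gap <= tolerance:
        return best_name
    return None


edf_start = datetime(2013, 9, 9, 16, 48, 7)
print(closest_scan(edf_start, SCANS_TSV))
```

The matched file name would then supply the subject/session/task entities to "borrow". This only works if the clocks of the two acquisition systems agree and the acquisition times have not been shifted for deidentification.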

Is there metadata in .edf about particulars of data acquisition like which eye it is etc?

julia-pfarr commented 5 months ago

> I am not familiar with EyeLink system -- is there any way for preparing "experiment protocols" like it is done on MRI systems, see e.g. this image? or entering extra metadata?

When collecting the data on the Host-PC (= EyeLink PC), you are limited to 8 characters and you cannot change this. (Apparently; I'm not doing eyetracking myself. Looking at the code EyeLink provides, I doubt it a bit, but I'm no expert.)

However, when you transfer the data to your main computer (= the computer where the experiment is displayed to the participant and the data is eventually saved), you can rename the file during this process. This is done automatically IF you coded it beforehand. It is apparently easy to make use of this, as the function comes with the EyeLink code templates, so the user really only has to type the filename into this function. So what they could do is just give the subject number as input in the pop-up window before the experiment (Host-PC) and hard-code the taskname in their code, to be added to the subject number while transferring the file to the main PC and folder.

My question was more about how much we want the converter to control regarding the filename, how much we want to "serve" the users, and how much to leave to the users to take care of themselves. I'm not sure how common this practice of renaming files during transfer is.

> Is there metadata in .edf about particulars of data acquisition like which eye it is etc?

In the edf file we have metadata about which eye and a timestamp like this: Mon Sep 9 16:48:07 2013. So this could be helpful for this issue when combined with other modalities. My concerns were more regarding the researchers doing purely eyetracking without any other modalities.

mszinte commented 4 months ago

Hi, sorry for not taking part in this discussion earlier, I was away for different reasons (personal and work). I'm in favor of the opinion expressed above: "eye2bids be just a "plugin" for those that would handle the data wrangling for them." From my point of view, eye2bids should take the data where they are, in the way they are saved, and create the different .json and tsv.gz files with the same names.

mszinte commented 4 months ago

> I am not familiar with EyeLink system -- is there any way for preparing "experiment protocols" like it is done on MRI systems, see e.g. this image? or entering extra metadata?

Not that I'm aware of. However, EyeLink provides a rich file with different settings and metadata; eye2bids aims to collect this information and put it in the json, following the standards we agreed on with the BEP.

> Is there time metadata in the file?

Well, if you refer to when the data were collected, yes. It gives something like this: "** DATE: Mon Sep 9 17:37:10 2013"
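For reference, a line in that form can be parsed with the Python standard library. This is a minimal sketch that assumes the line always starts with the `** DATE:` prefix shown above and that day and month names are in English (`%a` and `%b` are locale-dependent); the helper name `parse_edf_date` is made up for illustration.

```python
from datetime import datetime

PREFIX = "** DATE:"


def parse_edf_date(line):
    """Parse a header line like '** DATE: Mon Sep 9 17:37:10 2013'."""
    if not line.startswith(PREFIX):
        raise ValueError(f"not a DATE line: {line!r}")
    # Collapse repeated spaces so a padded day ('Sep  9') also parses.
    stamp = " ".join(line[len(PREFIX):].split())
    return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")


recorded = parse_edf_date("** DATE: Mon Sep 9 17:37:10 2013")
print(recorded.isoformat())
```

The header carries no time zone, so the result is a naive datetime in whatever local time the recording PC was set to.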

> If yes -- what I am thinking -- if this eye tracking data is to be associated with other imaging data, then it is closer to our https://github.com/ReproNim/reprostim/ project where given a video file we need to cut out a piece (yet to do) and associate with a specific data file in an existing BIDS dataset.

> Then common use case would be more of just providing .edf file and BIDS dataset, and tool figuring out appropriate file (based on timing in _scans.tsv) to associate that file with, and thus "borrowing" subject/session/task information from that target data file which matches in terms of time overlap.

It is a good proposal. However, I feel it is a different endeavor from eye2bids, which IMO aims first at converting "raw" data while leaving file naming to other software or to the user.

> Is there metadata in .edf about particulars of data acquisition like which eye it is etc?

Yes, and we are now parsing it in eye2bids together with other things.

bids-standard / eye2bids

edf file naming #72