bids-standard / bep001

Project management repository (primarily issues) for BIDS Extension Proposal 001
Creative Commons Attribution 4.0 International
8 stars 11 forks source link

Should/could fa- (flip angle) contain a <value> not an <index>? #71

Closed yarikoptic closed 4 years ago

yarikoptic commented 4 years ago

Our use case: We are doing period QA (data is available from http://datasets.datalad.org/?dir=/dbic/QA), and in the past had some scans with standard flip angles with no indication of flip angle in the file name. Now we (in particular @kodiweera) experimenting with different flip angles. If we resort to _fa-<index>, we could potentially have confusingly identical filenames across sessions when actual flip angle would be different. Moreover it is not immediately clear from the filename how any two files (e.g. _fa-1 and _fa-2) are different, and would require lookups into .json files, decreasing the usefulness from BIDS convention.

I wondered what was the rationale (besides may be inability/difficulty of reflecting floating point values? did you see any of those in over 10% (100-90% of the use cases? ;)) for storing indexes and not the actual values?

KirstieJane commented 4 years ago

I think the rationale was that for most metadata the values are too far ranging to be included in the file name, so it makes sense to only have it in the json file.

BUT - I have a light memory that you can choose whatever index you want in the filename.... So you could - if you wanted to have _fa-25 and _fa-45 which I think is what you're looking for, right? You'd still need the FlipAngle metadata field in your json and that's where the value would actually come from.... you can name the file whatever you'd like.

I agree that the current specification isn't clear on this src/04-modality-specific-files/01-magnetic-resonance-imaging-data.md#indexable_metadata-index-key-value-pair and its also wildly possible that I'm misremembering a conversation.

Tagging @agahkarakuzu who might have more thoughts than me.....

tiborauer commented 4 years ago

_fa-25 and _fa-45 are valid only if you have 45 FA values in the list. Please, remember that the value in the filename is an index rather than the actual value.

So what you need to do is

ChristophePhillips commented 4 years ago

I agree with @tiborauer that the filename should only contain index(es) of some metadata like FA. Quick follow up question on this: in the accompanying JSON file must there a vector of values, i.e. one value per index, as you suggest? Or could I store a single value per JSON file, i.e. only the value for that index?

@yarikoptic if you have multiple sessions, couldn't you label the sessions/runs (_run-1_, _run-2_,...) to avoid confusion? And if you want to quickly check the actual FA values, maybe have that information saved in another JSON file, associated with each of the sessions/runs?

tiborauer commented 4 years ago

I see no reason why you could not. You need a list only, if you use a single JSON for multiple images.

I would prefer using the corresponding keys rather than just having runs. Although, run is not specified for structural data, yet; but for fMRI runs have the same acquisition parameters.

agahkarakuzu commented 4 years ago

As for index vs actual parameter value discussion, I think one of the core principles of the BIDS specification is not to store parameter values in file names. So I agree with @tiborauer and @ChristophePhillips Also note that _fa-<index> is an indexable_metadata.

@KirstieJane this is from one of my previous commits, I though that it would be redundant and omitted this from the specification. I think we should put it back, what do you think?

image

@ChristophePhillips @tiborauer I have added bunch of details regarding JSON files on this branch. These are not final of course, needs discussion before merging, but should be helpful. For the JSON files of the grouping suffixes, single value per JSON. For the JSON file of the resultant maps, using vectorized form makes more sense. You can see this simple VFA dataset on OSF repo, I followed this convention there.

Note that run entity assumes that all the parameters remain the same across multiple runs.

image

KirstieJane commented 4 years ago

I'm confused about the list thing after our conversation on the call yesterday - seems like we need to clarify that.... I had taken away from the conversation yesterday that we can't have lists. Are there any metadata fields in the main specification that are lists? Are we inventing something new here? Seems big....

If we didn't allow lists, would we then allow any value? I feel like this conversation has been had elsewhere in the specification. @yarikoptic - are you aware of other metadata fields that are stored in the filename?

I agree that fa is preferred, and run is for identical acquisitions.

tiborauer commented 4 years ago

VolumeTiming and SliceTiming, for example, are lists.

tiborauer commented 4 years ago

The confusion comes for the fieldmaps. I proposed, back then, using list to store TE for dual-echo fieldmaps; however, the proposal using EchoTime1 and EchoTime2 has been accepted for a reason I cannot recall now.

KirstieJane commented 4 years ago

VolumeTiming and SliceTiming, for example, are lists.

These wouldn't use the index from the filename though, right? I think this is a first use of using the index to grab information from a list.....and I guess having us just insert the FlipAngle as a single float in the json would fit better with the EchoTime1 example?

agahkarakuzu commented 4 years ago

@tiborauer

You need a list only, if you use a single JSON for multiple images.

I think that there is 1 <--> 1 mapping between a JSON file and an image, this is why we had inheritance principle discussion.

Given that there'll be a JSON for each image with a different fa, each JSON should better contain only the corresponding value instead of a list that contains parameters for every other image of the same group.

tiborauer commented 4 years ago

@KirstieJane, For MEG, they already have a very similar concept:

https://github.com/bids-standard/bids-specification/blob/1d73e7b630990a4368ed60ff6e971a7a393d3dfe/src/04-modality-specific-files/02-magnetoencephalography.md#L34-L39

"Some software reader may skip important metadata that is specific to MEG system manufacturers. It is therefore RECOMMENDED that users provide additional meta information extracted from the manufacturer raw data files in a sidecar JSON file. This allows for easy searching and indexing of key metadata elements without the need to parse files in proprietary data format. Other relevant files MAY be included alongside the MEG data; examples are provided below."

I agree it may be vague; however, I am not sure why it may not work.

However, there is a clear statement in the current spec against using actual values (for ME data):

"Please note that the <index> denotes the number/index (in a form of an integer) of the echo not the echo time value which needs to be stored in the field EchoTime of the separate JSON file."

KirstieJane commented 4 years ago

Coolio! Thanks @tiborauer - that makes sense to me to conform with the suggestion already in the spec to have the index be the index not the real value.

(I've remembered by the way the conversation that I couldn't properly bring to mind earlier. Your indices don't have to correspond to any order of the metadata. So index 1 can be flip angle 45, index 2 can be 65 and index 3 can be 25. No problem. 😸)

@yarikoptic - does all this make sense? Probably different json files for the two different scans with different flip angles and use the indexes rather than the values. Is there a different angle (lolz) that we haven't considered?

yarikoptic commented 4 years ago

Thank you everyone for this lively discussion. Sorry for me being a slow one ;)

TL;DR: Overall, I think that <index> should in general be avoided unless it is the most semantically descriptive way (e.g. in case of _run) or lookup into an ordered list is necessary (nowhere in BIDS atm AFAIK). With grouped scan collections it might be the case. I will try to digest it and followup later. Current bids-specification or master of this BEP do not warrant use of <index> for _fa really. If you could clarify the "inheritance principle" for me - I would appreciate.

And now the longer version: ;-)

I think I will need to spend some more time to grasp grouped scan collections which are introduced in the indexable_metadata branch of this BEP (and thus not a part of the BEP001 master). As far as I see it (indexable_metadata) would be the only good reason for using <index> here instead of a <label> which would allow for an arbitrary literal (and thus could be actual value) to be used to make dataset more human "readable".

Meanwhile, let me dump here some notes/arguments which could, once again, be irrelevant in the realm of indexable_metadata:

Inheritance principle

I think that there is 1 <--> 1 mapping between a JSON file and an image, this is why we had inheritance principle discussion.

IMHO (see https://github.com/bids-standard/bids-specification/issues/102) inheritance principle ATM is quite a weak point within BIDS. Although pybids and probably other tools have their own way to treat it, I am afraid it is not yet formalized well enough and thus I would not rely on it heavily just yet. If I am wrong, please follow up on that issue and clarify.

_run-<index>

@KirstieJane, For MEG, they already have a very similar concept:

https://github.com/bids-standard/bids-specification/blob/1d73e7b630990a4368ed60ff6e971a7a393d3dfe/src/04-modality-specific-files/02-magnetoencephalography.md#L34-L39

unfortunately there is no L34-L39 id in that page for me... The only <index> in that page I find, is the _run-<index>. For the run it makes perfect sense to distinguish between scans which are otherwise exactly the same (thus all other possibly informative fields being the same), and _run-<index> describing the repetition and (likely) the order of those.

If the reference was for the quote ("Some software reader may skip important metadata...") about usefulness of sidecar JSON files - I agree that there should be sidecar files which provide as much of information as possible. But either the filename has <index> or <label> is somewhat orthogonal to it (again disgregarding "grouping" effort/branch).

The only other one is the _echo-<index>

A grep on bids-specification ```shell $> git grep '[^n]-[_ses-

which seems was there since the markdown inception of bids-specification, and thus IMHO needs TLC due to all the "EchoTime[12]", and that is why great to see you are working on straightening it up

There is at least 2 datasets on openneuro which use it ```shell (git)smaug:/mnt/btrfs/datasets/datalad/crawl/openneuro[master] $> for ds in ds*; do find $ds -iname *_echo* | head ;done ds000117/run-1_echo-1_FLASH.json ds000117/run-1_echo-2_FLASH.json ds000117/run-1_echo-3_FLASH.json ds000117/run-1_echo-4_FLASH.json ds000117/run-1_echo-5_FLASH.json ds000117/run-1_echo-6_FLASH.json ds000117/run-1_echo-7_FLASH.json ds000117/run-2_echo-1_FLASH.json ds000117/run-2_echo-2_FLASH.json ds000117/run-2_echo-3_FLASH.json ds000254/BOLD for task-induced functional MRI/sub-01/func/sub-01_task-bilateralfingertapping_echo-1_bold.nii.gz ds000254/BOLD for task-induced functional MRI/sub-01/func/sub-01_task-bilateralfingertapping_echo-2_bold.nii.gz ds000254/BOLD for task-induced functional MRI/sub-01/func/sub-01_task-bilateralfingertapping_echo-3_bold.nii.gz ds000254/BOLD for task-induced functional MRI/sub-01/func/sub-01_task-bilateralfingertapping_echo-4_bold.nii.gz ds000254/BOLD for task-induced functional MRI/sub-02/func/sub-02_task-bilateralfingertapping_echo-1_bold.nii.gz ds000254/BOLD for task-induced functional MRI/sub-02/func/sub-02_task-bilateralfingertapping_echo-2_bold.nii.gz ds000254/BOLD for task-induced functional MRI/sub-02/func/sub-02_task-bilateralfingertapping_echo-3_bold.nii.gz ds000254/BOLD for task-induced functional MRI/sub-02/func/sub-02_task-bilateralfingertapping_echo-4_bold.nii.gz ds000254/BOLD for task-induced functional MRI/sub-03/func/sub-03_task-bilateralfingertapping_echo-1_bold.nii.gz ds000254/BOLD for task-induced functional MRI/sub-03/func/sub-03_task-bilateralfingertapping_echo-2_bold.nii.gz ```

BIDS ATM is not entirely clear on either index for echo should anyhow imply any ordering or lookup in some list:

$> grep -A2 '[^-]<index' src/04-modality-specific-files/01-magnetic-resonance-imaging-data.md 
Please note that the `<index>` denotes the number/index (in a form of an
integer) of the echo not the echo time value which needs to be stored in the
field EchoTime of the separate JSON file.

So, what is the benefit from using _fa-<index> vs _fa-<label>?

If we disregard "grouped scans" for now, with <index>

so I see no benefit as of the master version of this BEP.

Advantages of using _fa-<label>:

agahkarakuzu commented 4 years ago

@yarikoptic going back to the very first comment you made

Moreover it is not immediately clear from the filename how any two files (e.g. _fa-1 and _fa-2) are different, and would require lookups into .json files, decreasing the usefulness from BIDS convention.

As you also noted in your latest comment, it makes sense within the context of grouped scan collections , because if a filename has a grouping suffix and _fa-<index>, it becomes immediately clear that the application which collects those files hinges on varying flip angles. Most importantly, when people go ahead and read the table for grouping suffixes, they can find out why <fa> was changed for that group of images. If someone is using a label from the specification that corresponds to an acquisition parameter, I think it is critical to have an explicit description for the cases where <fa> can be varied.

This is why we suggest the use of <indexable_metadata> with a grouping suffix.

For example:

sub-01_fa-1_VFA.nii.gz
sub-01_fa-2_VFA.nii.gz
sub-01_fa-3_VFA.nii.gz

The use of <index> comes in handy here because total number of varying flip angles may show differences for the same grouped scan collections between different datasets . For example, I can collect my dataset with 10 different flip angles to be more accurate with my VFA estimation, whereas 3 is enough in general. Having information on how many different flip angles are collected is easier with <index> for human readers and semantically relevant if you know what VFA is.

When people go ahead and read the table for grouping suffixes, it is clear why <fa> was changed for a group of images attaining VFA suffix. However, the purpose of why flip angle was changed becomes questionable if the suffix was T1w, because its function here rather remains esoteric and there is no explanation about it in the specification that people can refer to:

sub-01_fa-1_T1w.nii.gz
sub-01_fa-2_T1w.nii.gz

Therefore, it does not make much sense to use <fa> with conventional anatomical images. It goes more like "if you have two GRE images with different flip angles, you should better call one of them PDw and the other one T1w, next to each other". This makes more sense if those files will be processed individually, and knowing the name of their contrast is going to determine the course of the process (e.g. PDw contrast can be better for segmenting a specific structure).

The problem here is that you can draw the line anywhere you like for the amount of flip angle difference that separates PDws from T1ws. So what option do I have if I still want to call both those images T1w using different flip angles? Because I am just trying out some parameters. I would do the following:

sub-01_acq-fa30_T1w.nii.gz
sub-01_acq-fa35_T1w.nii.gz

This gives a better feeling that someone was just experimenting with different flip angles. Whereas the use of <fa> in the filename (with indexes) implies a well-defined use case, given that people can find explanations about it in the spec.

Lastly, the benefit brought by <index> does not change from master to indexable_metadata branch of the BEP001. IMHO, if we disregard grouped scans for BEP001, we'd be disregarding one of the main additions it brings.


The inheritance principle in its current form is a concern for BEP001, because we were planning to keep all the constant parameters in one (top) JSON file and put only varying parameters in lower JSON files such as:

sub-01_VFA.json (constant parameters shared between sub-01_fa-1_VFA.nii.gz  and sub-02_fa-1_VFA.nii.gz ) 
sub-01_fa-1_VFA.json (values of the varying parameters for sub-01_fa-1_VFA.nii.gz )
sub-01_fa-1_VFA.nii.gz 
sub-01_fa-2_VFA.json (values of the varying parameters for sub-01_fa-2_VFA.nii.gz )
sub-01_fa-2_VFA.nii.gz

According to the inheritance principle this is not valid, because the structure above makes one JSON file apply to more than one images. This is why we diverged from this approach, which was the only case we interfaced with the inheritance principle.

yarikoptic commented 4 years ago

This is why we suggest the use of <indexable_metadata> with a grouping suffix.

Was this suggestion discussed within the "BIDS core"?
IMHO it might even be worth a dedicated BEP since it is not merely adding new data types etc, but introduces changes into how sidecar files relate to data files etc. So it would be very worth having a more targeted discussion (of cause might have examples from this BEP as motivation)

yarikoptic commented 4 years ago

The use of <index> comes in handy here because total number of varying flip angles may show differences for the same grouped scan collections between different datasets . For example, I can collect my dataset with 10 different flip angles to be more accurate with my VFA estimation, whereas 3 is enough in general. Having information on how many different flip angles are collected is easier with <index> for human readers and semantically relevant if you know what VFA is.

When people go ahead and read the table for grouping suffixes, it is clear why <fa> was changed for a group of images attaining VFA suffix.

Hm... I do not think anything in BIDS is intended to answer why (why do we collect any anat/ -- for registration? for diagnosis? for deriving surfaces? for both?), but rather what.

That is why I actually quite dislike having _T1w being replaced with some non-informative _VFA, but I see the point that with different fa values you could obtain files with different suffixes... I would have suggested to come up with a dedicated _grp-<label> field for such grouppings, while retaining suffix to describe what image it is (_grp-vfa1_T1w, _grp-vfa1_PD) thus staying more consistent with BIDS "core". Then folks who do not care about having multi flip angle images, could just happily use _T1w in their pipelines, because they would know what it is. Those who do care about VFA - could just group those images, get FlipAngle from sidecar .json, and we all are happy family again making such datasets useful not only for VFA interested folks.

once again -- sounds like a good discussion to bring up with other BIDS fellas community members.

KirstieJane commented 4 years ago

I think what we're trying to do in BEP001 is have a smaller group work through the (many) complexities of extending the specification before opening up the discussion. Although - to be clear - everyone is welcome to join in! Conversations like this make it much easier to see where the challenges are and how we should be capturing the many discussions we've had about this over the years.

We'll absolutely take your considerations into our extension of the specification - particuarly capturing more of the motivation for why we've chosen to go down this path which is - at the moment - not clearly enough explained.

One last point - I'm not a fella. Please don't generalise the BIDS community as all male.

yarikoptic commented 4 years ago

One last point - I'm not a fella. Please don't generalise the BIDS community as all male.

No (over)generalization was intended, sorry. I used fella as a colloquial form of fellow with the connection to fellowship as we are all aiming for the same common good. But English is not my native language, I guess I have made a mistake.

And even though dictionary meaning of "fellow" have many gender neutral meanings for the one I intended to convey ```shell $> dict fellow 5 definitions found From The Collaborative International Dictionary of English v.0.48 [gcide]: Fellow \Fel"low\, n. [OE. felawe, felaghe, Icel. f[=e]lagi, fr. f[=e]lag companionship, prop., a laying together of property; f[=e] property + lag a laying, pl. l["o]g law, akin to liggja to lie. See {Fee}, and {Law}, {Lie} to be low.] 1. A companion; a comrade; an associate; a partner; a sharer. [1913 Webster] The fellows of his crime. --Milton. [1913 Webster] We are fellows still, Serving alike in sorrow. --Shak. [1913 Webster] That enormous engine was flanked by two fellows almost of equal magnitude. --Gibbon. [1913 Webster] Note: Commonly used of men, but sometimes of women. --Judges xi. 37. [1913 Webster] 2. A man without good breeding or worth; an ignoble or mean man. [1913 Webster] Worth makes the man, and want of it, the fellow. --Pope. [1913 Webster] 3. An equal in power, rank, character, etc. [1913 Webster] It is impossible that ever Rome Should breed thy fellow. --Shak. [1913 Webster] 4. One of a pair, or of two things used together or suited to each other; a mate; the male. [1913 Webster] When they be but heifers of one year, . . . they are let go to the fellow and breed. --Holland. [1913 Webster] This was my glove; here is the fellow of it. --Shak. [1913 Webster] 5. A person; an individual. [1913 Webster] She seemed to be a good sort of fellow. --Dickens. [1913 Webster] 6. In the English universities, a scholar who is appointed to a foundation called a fellowship, which gives a title to certain perquisites and privileges. [1913 Webster] 7. In an American college or university, a member of the corporation which manages its business interests; also, a graduate appointed to a fellowship, who receives the income of the foundation. [1913 Webster] 8. A member of a literary or scientific society; as, a Fellow of the Royal Society. [1913 Webster] Note: Fellow is often used in compound words, or adjectively, signifying associate, companion, or sometimes equal. Usually, such compounds or phrases are self-explanatory; as, fellow-citizen, or fellow citizen; fellow-student, or fellow student; fellow-workman, or fellow workman; fellow-mortal, or fellow mortal; fellow-sufferer; bedfellow; playfellow; workfellow. [1913 Webster] Were the great duke himself here, and would lift up My head to fellow pomp amongst his nobles. --Ford. [1913 Webster] ```

I will change it, and not to "comrade" (which in my view/history is gender neutral) which would be the closest, but to a longer "community members". I hope that suffice. Sorry for my bad English.

tiborauer commented 4 years ago

I would argue against BIDS ignoring why. Modality-specific suffixes answer why rather than what, and we already have some discussions on the inaccuracies of the what.

There are two main motivations behind using grouping suffixes.

  1. Weightings may often mixed in certain combination of acquisition parameters, so there may be cases when it would be very non-intuitive and arbitrary to call an image _T1w or _PDw. For that reason, we consider grouping suffix more appropriate and accurate.
  2. Grouping suffix helps tools to identify images belonging to an/a compound acquisition. For that reason, we consider grouping suffix more practical.
yarikoptic commented 4 years ago

Although related to the original issue, I think it is worth a separate discussion, thus moving into a new https://github.com/bids-standard/bep001/issues/72