BEP Proposal: Atlas specification

jdkent commented 2 years ago

Your idea

EDIT

github draft for current discussion: https://github.com/PESTILLILAB/bids-specification/blob/bep038/src/atlas.md

the google doc draft (deprecated in favor of the github draft) : https://docs.google.com/document/d/1RxW4cARr3-EiBEcXjLpSIVidvnUSHE7yJCUY91i5TfM/edit?usp=sharing

PeerHerholz commented 10 months ago

Hi @melanieganz,

sorry, yes the meeting on the 24th is on the books for discussions. I just wanted to use as much time beforehand to actually port to markdown, so we can quickly move forward. It's not often that I get two days in a row to work on something, so I wanted to make a push. But otherwise, maybe we can quickly Zoom at 9:30 am tomorrow morning?

No worries, that's great and I'm happy to meet then!

And regarding placement, why don't I make an atlas markdown file on the same level as the common principles and then we can decide later where to move it. This way we have a markdown file.

Sounds like a plan!

melanieganz commented 10 months ago

Great, sending you a Zoom link directly!

melanieganz commented 10 months ago

And for everyone else, the first draft of the atlas spec is here

melanieganz commented 10 months ago

The first draft of the atlas.md is finished! Note, there's still some smaller stuff such as links and references to fix, which I will continue with now and tomorrow. The tables especially are giving @CPernet and me issues; @PeerHerholz or @bendhouseart, can you please help with that?

PeerHerholz commented 10 months ago

Sure thing!

I'm currently trying to finish a draft for the examples and will get to it asap after that.

PeerHerholz commented 10 months ago

Hi @melanieganz & @CPernet,

I just pushed the first draft for the examples here. I focused on "atlas-used-as-is" use cases to outline/showcase what things would look like. Your examples would then go under a to-be-created "derivatives" directory within "bids_atlas_examples".

I also added two new keys to the JSON files: Dimensions and 4thDimensions to indicate what the 4th dimension entails in case of 4D probabilistic atlases.

WDYT?

melanieganz commented 10 months ago

Cool, will check it out! Maybe you even want to look at the atlas.md, where a bunch of examples are listed at the end.

effigies commented 10 months ago

@PeerHerholz Looking at the examples, I think we want both atlas datasets and atlases-within-datasets to ensure that the validator covers both cases. I would take the following approach:

atlas_dataset/
  dataset_description.json  # contains "DatasetType": "atlas"
  atlas-AAL/
    ...
  atlas-HarvardOxford/
    ...
atlas_within_dataset/
  dataset_description.json  # contains "DatasetType": "raw" or "derivative"
  atlases/
    atlas-Schaefer2018/
      ...
  sub-01/
    ...

I think this was the approach I understood the BEP to be taking, so apologies if I've missed changes or I'm misremembering previous discussions.
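
As a rough illustration of how a tool could tell the two layouts above apart, here is a minimal Python sketch. It assumes that "DatasetType": "atlas" marks an atlas dataset and that atlases inside a raw or derivative dataset live under atlases/, exactly as proposed in the tree above; neither is part of the released specification. The function and helper names are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def find_atlas_dirs(dataset_root: Path) -> List[str]:
    """List atlas-<label> directories for either layout discussed above."""
    description = json.loads((dataset_root / "dataset_description.json").read_text())
    # Assumption: "atlas" as a DatasetType value is only proposed in this thread.
    if description.get("DatasetType") == "atlas":
        search_root = dataset_root  # atlas dataset: atlas-*/ sit at the root
    else:
        search_root = dataset_root / "atlases"  # atlases within a raw/derivative dataset
    if not search_root.is_dir():
        return []
    return sorted(
        p.name for p in search_root.iterdir()
        if p.is_dir() and p.name.startswith("atlas-")
    )


def make_example(root: Path, dataset_type: str, atlas_parent: str, labels: List[str]) -> Path:
    """Create a toy dataset on disk (hypothetical helper, for illustration only)."""
    root.mkdir(parents=True)
    (root / "dataset_description.json").write_text(json.dumps({"DatasetType": dataset_type}))
    for label in labels:
        (root / atlas_parent / f"atlas-{label}").mkdir(parents=True)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        standalone = make_example(Path(tmp) / "atlas_dataset", "atlas", ".", ["AAL", "HarvardOxford"])
        embedded = make_example(Path(tmp) / "atlas_within_dataset", "raw", "atlases", ["Schaefer2018"])
        print(find_atlas_dirs(standalone))
        print(find_atlas_dirs(embedded))
```

A validator would need both branches, which is the reason for asking that the examples cover both cases.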

PeerHerholz commented 10 months ago

Hi @effigies,

thanks for the feedback.

Sorry, I should've been more clear in my commit message: this was only the first part of the example and the rest will follow today/later this week. I wanted to start with atlases-within-datasets, providing examples for "atlas-used-as-is" at root and "derived-altered-atlas" under derivatives as I'm working with @melanieganz, @CPernet and @mnoergaard during their brainhack.

PeerHerholz commented 10 months ago

Hi @melanieganz, @CPernet and @mnoergaard,

I just pushed an updated example, including your PET atlas and while working on it, a couple of discussion points came up on my end, which I would like to bring up during the meeting (I'll also add them here later on).

yarikoptic commented 10 months ago

@PeerHerholz Looking at the examples, I think we want both atlas datasets and atlases-within-datasets to ensure that the validator covers both cases

FWIW :+1: on that and consistency in such a way (few days back I made analogous comment on google doc). Relates somewhat to https://github.com/bids-standard/bids-2-devel/issues/59 and even YODA principle, in that potentially atlases/ could be an entire BIDS subdataset on its own.

melanieganz commented 10 months ago

The atlas.md draft is ready. @CPernet, @mnoergaard and myself incorporated feedback by @PeerHerholz and @Remi-Gau. Please note I only added an atlas definition as an entity in the schema and hence tables are just Markdown tables for now.

yarikoptic commented 10 months ago

could there be a PR (even if internal) -- easier to provide review on the diff even if it is a full new file.

PeerHerholz commented 10 months ago

Thx @melanieganz, @CPernet and @mnoergaard!

@yarikoptic: yes, we aim to open a PR either tomorrow or Friday as we are currently developing/testing things within the BEP team.

That being said, I just pushed an updated version of the "atlas-as-dataset" example here, including 4 different atlases to showcase how different atlases would be represented. I will now update the "atlas-within-dataset" examples.

yarikoptic commented 10 months ago

That being said, I just pushed an updated version of the "atlas-as-dataset" example here, including 4 different atlases to showcase how different atlases would be represented. I will now update the "atlas-within-dataset" examples.

I am confused why there is atlas/ folder there -- not even atlases/ or just without any folder per https://github.com/bids-standard/bids-specification/issues/1281#issuecomment-1906485806

PeerHerholz commented 10 months ago

Thx @yarikoptic. I missed this difference and updated the examples accordingly.

PeerHerholz commented 10 months ago

@melanieganz, @CPernet and @mnoergaard, I pushed example for the other use cases: atlas transformed to template space, atlas transformed to single subject space and atlas derived from a given single subject. Would you mind having a look at those?

melanieganz commented 10 months ago

So in the atlas as dataset examples you don't have the leading atlas folder under which they should all be stored, right? You have that in the other examples if an atlas is present.

CPernet commented 10 months ago

correct, I'll fix case 3

PeerHerholz commented 10 months ago

So in the atlas as dataset examples you don't have the leading atlas folder under which they should all be stored, right? You have that in the other examples if an atlas is present.

Yes, I changed it given @effigies and @yarikoptic's feedback, ie in the atlas-as-dataset use case, there's no extra "atlas" directory at root within which different atlases are provided in dedicated directories. Thus, it behaves more like the sub structure which was the actual intention.

PeerHerholz commented 10 months ago

Hi @melanieganz, @CPernet and @mnoergaard,

I briefly went through the atlas.md again and updated some wording/links and fixed some typos. While doing so, a couple of questions popped up on my end:

- we might have to discuss the desc- usage again as the explanation and examples don't align
  - it might be better if we change it to not discourage desc- usage but outline more precise usage
  - for example, the Schaefer atlas has versions at two levels: the overall release (e.g. the original version from 2018 or updated ones from 2019 and later) and the subtype, i.e. the number of parcels (e.g. 400, 800, etc.) and the Yeo network assignment (7 or 17)
    - here, it might be better to use the atlas- key to denote the overall release, e.g. Schaefer2018 and desc- to indicate the specific subtype, e.g. 400Parcels7Networks
- within datasets: use case 1 has an _edges file that is not being discussed/introduced
- within datasets: description of use case 2
  - atlases can be transformed to subject space without being directly applied to data, no?
  - maybe this should be mentioned accordingly, ie if it still should be changed to seg
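
To make the atlas-/desc- split concrete, here is a minimal Python sketch that composes a file name from the release and the subtype. The entity order, the dseg suffix and the .nii.gz extension are assumptions for illustration, not something the draft has settled; atlas_filename is a hypothetical helper.

```python
import re
from typing import Optional


def atlas_filename(atlas: str, desc: Optional[str] = None,
                   suffix: str = "dseg", ext: str = ".nii.gz") -> str:
    """Compose a name such as atlas-<release>_desc-<subtype>_<suffix><ext>.

    Assumption: entity order, suffix and extension are illustrative only.
    """
    for label in (atlas, desc):
        # BIDS labels are alphanumeric, so reject separators inside a label.
        if label is not None and not re.fullmatch(r"[A-Za-z0-9]+", label):
            raise ValueError(f"label must be alphanumeric: {label!r}")
    parts = [f"atlas-{atlas}"]
    if desc is not None:
        parts.append(f"desc-{desc}")
    return "_".join(parts) + f"_{suffix}{ext}"


if __name__ == "__main__":
    # Overall release in atlas-, specific subtype in desc-
    print(atlas_filename("Schaefer2018", "400Parcels7Networks"))
    # prints: atlas-Schaefer2018_desc-400Parcels7Networks_dseg.nii.gz
```

Keeping the release in atlas- means all subtypes of one release group together, while desc- only carries what no other entity can express.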

CPernet commented 10 months ago

IMO desc- is to be used when there are no other options, so here it makes sense
edges, I thought we removed them all, sorry about that

melanieganz commented 10 months ago

Yes, sorry about the edges, we tried to remove everything that points to BEP017, but I didn't explicitly search for the word edges. Can you remove it or should I do it? And I agree with @CPernet wrt the desc- and that also makes sense with your example of the Schaefer atlas. I guess the point is that we don't want people to abuse desc-.

PeerHerholz commented 10 months ago

Thanks for your replies and no worries at all! I can update things accordingly, no problemo.

PeerHerholz commented 10 months ago

I updated things and added a draft to outline desc- usage (I need to fix the formatting, sorry.). Would you mind having a look?

melanieganz commented 10 months ago

Thanks @PeerHerholz, I think it reads ok, I might want to go in and simplify the sentence structure a bit. But it seems like it comes too early in the atlas.md file? It comes even before the directory structure. So I would suggest moving it after the "cases".

yarikoptic commented 10 months ago

That being said, I just pushed an updated version of the "atlas-as-dataset" example here, including 4 different atlases to showcase how different atlases would be represented. I will now update the "atlas-within-dataset" examples.

I am confused why there is atlas/ folder there -- not even atlases/ or just without any folder per #1281 (comment)

https://github.com/PESTILLILAB/bids-specification/blob/bep038/src/atlas.md uses atlas/ not atlases/ which is IMHO wrong.

PeerHerholz commented 10 months ago

Hi @yarikoptic,

@melanieganz, @CPernet and I decided on atlas but have nothing against atlases. Would you mind outlining why you prefer atlases?

CPernet commented 10 months ago

let's also see if an atlas shared at root works for @oesteban . It would be nice (while not necessary) if compatible with template flow

atlas_dataset/
  dataset_description.json  # contains "DatasetType": "atlas"
  atlas-AAL/
    ...
  atlas-HarvardOxford/
    ...

yarikoptic commented 10 months ago

@melanieganz, @CPernet and I decided on atlas but have nothing against atlases. Would you mind outlining why you prefer atlases?

- multiple atlases are possibly stored under that folder
- we have already similar derivatives/, stimuli/ (plural of stimulus) folder as a collection for (possibly multiple) derivatives and stimuli
  - sourcedata/ and rawdata/ could also be considered plural as word data is
  - I think we did miss and might like to generalize to plural also phenotype/ -> phenotypes/ since per participant IIRC
  - code/ is IMHO ok to be singular since it is not intended to cover "per entity" code etc.
- we have participants.tsv, samples.tsv, etc to provide metadata on multiple instances of any of those entity

overall, in the light of thinking about https://github.com/bids-standard/bids-2-devel/issues/54 I now outlined there an attempt at generalizing the organizational structure of BIDS and plural version of an entity name is one of the principles.

oesteban commented 10 months ago

let's also see if an atlas shared at root works for @oesteban . It would be nice (while not necessary) if compatible with template flow
atlas_dataset/
  dataset_description.json  # contains "DatasetType": "atlas"
  atlas-AAL/
    ...
  atlas-HarvardOxford/
    ...

My 2cts, and apologies in advance for not being very enthusiastic about the whole atlas(es) effort. I am likely biased by templateflow, but at the same time, that is a practical experience at the basis of my reasoning.

I have a fundamental problem with the foundations of the BEP. IMHO, atlases are knowledge annotations living on (meaning, referred to) a specific stereotaxic space (classic definition) and more generally, a manifold (i.e., not limited to a regularly gridded volume -- it could be a surface or a graph).

You can annotate a single feature map of a single subject, and BIDS-Derivatives covers that. IMHO, BIDS-Derivatives has sufficient specs to write atlases already.

What BIDS has traditionally missed the most is a specification for templates. This is important because atlases can be interpreted if and only if they are defined within a spatial reference. Templates (volume, surface, etc.) engender the space that allows defining atlases.

This is why in templateflow we define templates at the level of BIDS' subjects, and then you can have the same Schaefer atlas in different spaces (i.e., referenced to different templates), whether it is defined w.r.t. this version of MNI or this surface of a popular surface reconstruction package. If you calculate a mapping between a template and an individual, you can move all atlases defined w.r.t. to that template into the individual, if you so will.

If you have the atlas but do not have the template, you don't have anything workable.

yarikoptic commented 10 months ago

;-) May be to me as to a hammer everything looks like a nail, but in the light of https://github.com/bids-standard/bids-2-devel/issues/54 it looks like it could provide remedy there too since it would be for the creator/specific dataset to decide on what is the primary dimension to group by - by "atlas" or by template/space that atlas is defined in:

If the dataset is a collection of atlases (hence this BEP) defined in some single (or few) template/space -- makes total sense to make leading entity - atlas, and thus first level separation atlas-<label>/.
If the dataset is a collection of templates (@oesteban case) and spaces with some atlases aligned to them -- makes total sense to make leading entity to be template or space.

oesteban commented 10 months ago

@yarikoptic If this BEP adopts the flexibility of bids-standard/bids-2-devel#54, it will become incompatible with the current specs (which I don't think is desired), or else it will need to fully specify the two use cases you indicate (and risk the BEP is seen as overly complicated by some).

In addition to that, I think BIDS is more about finding consensus and catering to at least the 80% rather than trying to accommodate all the possibilities. NIDM would be a much better tool for that.

TemplateFlow makes the following linking:

subject -> template segmentation/parcellation -> atlas

with this projection, much of what is already in BIDS/BIDS-Deriv has worked well for TemplateFlow.

yarikoptic commented 9 months ago

@yarikoptic If this BEP adopts the flexibility of bids-standard/bids-2-devel#54, it will become incompatible with the current specs (which I don't think is desired), or else it will need to fully specify the two use cases you indicate (and risk the BEP is seen as overly complicated by some).

yet to work out details, but I think it just would be a generalization of "DatasetType": ie we could have a pre-defined mapping for some set of DatasetTypes (e.g. here atlas, and maybe some template for templateflow ones which now do not even use dataset_description.json) or just allow some "custom" specification of hierarchy in some DatasetSpec or alike.

melanieganz commented 9 months ago

Dear @oesteban and @yarikoptic,

thanks a lot for the feedback. I think the point that has come up regarding the necessity of the BEP is an important one to address.

I know @oesteban wasn't present when a lot of the atlas BEP was discussed last summer in Copenhagen. Hence, I want to make sure to bring the example ideas up, why I believe we need an atlas BEP.

I think one misunderstanding in the (often only MRI-based way) of thinking is that it ignores the case when we distill new information about the brain and create new atlases. Those new atlases can be derived in multiple ways, e.g. in PET by averaging single subject derivatives such as BPnd maps to create an average map that via a calibration with autoradiography can be turned into an actual quantitative map showing receptor density. These are the type of atlases that not only show regions in space, but also quantitative information about the brain - just like the big brain atlas from Juelich that's single subject based.

The atlas BEP allows us to properly share those atlases, independent of us being allowed to share the subjects they are derived from, since e.g. in the EU right now we cannot share the individual subjects and their derivatives as of yet. The BEP also gives enough context on how the atlases were derived. The BEP also covers what happens when you use those atlases in different case settings.

melanieganz commented 9 months ago

@melanieganz, @CPernet and I decided on atlas but have nothing against atlases. Would you mind outlining why you prefer atlases?
* multiple atlases are possibly stored under that folder

* we have already similar `derivatives/`, `stimuli/` (plural of `stimulus`) folder as a collection for (possibly multiple) derivatives and stimuli

  * `sourcedata/` and `rawdata/` could also be considered plural as word `data` is
  * I think we did miss and might like to generalize to plural also `phenotype/  -> phenotypes/` since per participant IIRC
  * `code/` is IMHO ok to be singular since it is not intended to cover "per entity" code etc.

* we have `participants.tsv`, `samples.tsv`, etc to provide metadata on multiple instances of any of those entity
overall, in the light of thinking about bids-standard/bids-2-devel#54 I now outlined there an attempt at generalizing the organizational structure of BIDS and plural version of an entity name is one of the principles.

The plural of atlas as root directory is totally fine, this was just an inconsistency in the original Google doc and I just decided for singular when typing up the atlas.md last week, but we can also make it plural. That's fine.

oesteban commented 9 months ago

I think one misunderstanding in the (often only MRI-based way) of thinking is that it ignores the case when we distill new information about the brain and create new atlases. Those new atlases can be derived in multiple ways, e.g. in PET by averaging single subject derivatives such as BPnd maps to create an average map that via a calibration with autoradiography can be turned into an actual quantitative map showing receptor density. These are the type of atlases that not only show regions in space, but also quantitative information about the brain - just like the big brain atlas from Juelich that's single subject based.

Sure, maybe I use an invalid definition of the term "atlas". But I honestly don't think it is the case, as we gave this a lot of thought when preparing our TemplateFlow paper (https://doi.org/10.1038/s41592-022-01681-2). In particular, the introduction tries to elaborate on this with precision, so I would strongly recommend reading it (it's a brief communication, so it's a really short read).

In particular, and in the light of what is said in our paper:

Those new atlases can be derived in multiple ways, e.g. in PET by averaging single subject derivatives such as BPnd maps to create an average map that via a calibration with autoradiography can be turned into an actual quantitative map showing receptor density.

To me, this is not an atlas, but rather a template by definition. That averaged feature (PET uptake, calibrated and made quantitative) engenders a coordinate system for that particular subject. Whatever brain knowledge you annotate w.r.t. that stereotaxic space (landmarks, regions, surface areas, etc.) becomes (to me) an actual atlas of the population represented by the template (in this case one single subject). This works the same for a cohort of subjects (i.e., similar age, neurodevelopmental trajectory, etc.) or for species (you create one template for humans, one for pigs and then derive different atlases on each of the two template spaces).

Because how you generate knowledge about the brain (the atlas) doesn't really matter, templateflow allows you to represent probabilistic, deterministic, stratified-by-developmental-time, etc. atlases. They can be defined in the volume or the surface (which is defined by the reference template).

So, I would argue this is not an MRI-centric definition; it's pretty general. Historically, anatomical MRI has allowed us to generate very high-resolution, digital templates that neuroscientists can interpret with ease and have allowed group analyses.

From a neuroscience perspective, I totally agree that atlas is way more relevant a concept than template. However, from a signal processing viewpoint (and particularly how to store this knowledge), the template concept is way more relevant as it will tell you the coordinate system in which things happen.

And this generalizes automatically to, e.g., EEG. I'm totally ignorant of whether there is value in averaging some property per electrode or something, but that would give you the template in which you can atlas EEG signals. But you need to align the data before you get to the template, and only once that space (manifold) is defined (e.g., 128 channels + time), you can start the atlasing.

Current BIDS specs partially address some of these issues here https://bids-specification.readthedocs.io/en/stable/appendices/coordinate-systems.html#coordinate-systems -- but this is quite insufficient IMHO. Unfortunately, we couldn't get a much better outcome as we did not arrive at a solid consensus, and a decision was made to leave it that way for the time being (which I think was a good call; there wasn't any perspective to get anything better at the moment).

TL;DR - In my opinion, the foundational principles and definitions regarding atlases in this BEP are unclear and do not provide a good base to build the BEP from my very engineering viewpoint. But I could be wrong, and I haven't honestly articulated this with so much clarity before (though I tried verbally on several occasions when we had the workshop at Austin).

poldrack commented 9 months ago

Hey Oscar, thanks for your input. I for one am struggling to understand what your goal here is. If it's to scuttle the entire BEP, that's simply not going to happen, as it solves important problems for the connectivity community. Completely rewriting from a different standpoint is also not going to happen. Are there specific changes to the current BEP that you can suggest that might help address some of your particular concerns?

oesteban commented 9 months ago

Hi Russ, I understand your reservations.

If it's to scuttle the entire BEP

It's not. I do think there is a lot of work on this BEP that is already very useful. All the description about metadata fields and values, with the rationale behind all of those fields being the most relevant to me.

While I think BIDS already supports everything needed (in terms of file formats and flexibility to define metadata), I believe the effort everyone involved has put into identifying and describing all those metadata is very remarkable and I didn't mean to diminish its value. Adding those descriptions into BIDS is very valuable.

as it solves important problems for the connectivity community

I'm less optimistic about this reasoning for the reasons above. If the BEP is based on a faulty foundation, it will create more problems than it resolves. It will help a lot when sharing derivatives because all the good work on describing metadata will have the effect that everyone will have the same metadata fields. But that impact is limited compared to the purpose of the BEP.

Completely rewriting from a different standpoint is also not going to happen. Are there specific changes to the current BEP that you can suggest that might help address some of your particular concerns?

I will re-emphasize that all the text about metadata doesn't require a rewriting. However, I do think the atlas-<name> structure does not work well with the formal structure of the data behind it. The specific change is (and the proposal was already verbally formulated when I first proposed a templates' BEP in ~2018) to adopt TemplateFlow's structure and work from there, rather than coming up from somewhere else.

This is also a "more BIDS-y" approach to building the BEP, as it would follow example-driven development. The early versions of BIDS raw followed the existing data structures developed for/from OpenfMRI; in the same way, with TemplateFlow existing as a BIDS-like data structure, why would we not start from it?

I for one am struggling to understand what your goal here is.

I was afraid that, if I went ahead and suggested changes to the BEP considering my views then all those comments would have been directly rejected with little consideration.

Now, after writing my viewpoints, I hope to have achieved two goals:

(i) make it more present that this is my view as a result of having felt my opinion was forgotten in previous instances --this is not a complaint, perhaps I did a poor exposition of my arguments, perhaps I chose the wrong moment or the wrong individuals at times, that is not relevant--.

(ii) raise awareness and caution over a problem that will bite sooner or later, emerging from the inconsistency of the foundations of this BEP with the current specs about standard spaces and coordinates, and with future BEPs such as the transformations, where templates will be required as "first class" citizens of BIDS (a bit like subjects) to identify the origin and destination of the mathematical operations involved.

The goal was definitely not to dismiss anyone's time put into this or simply oppose their views. On the contrary, that time and knowledge would be a waste if the BEP goes in without reflecting upon my objection, and in the end, it becomes impractical for users and abandoned. I really hope it wasn't perceived that way. There is a lot in the BEP that is very worth adding to BIDS. I also understand how the decision-making of BIDS works, so if my arguments are not convincing, obviously, they will not achieve any changes to the trajectory of the BEP.

effigies commented 9 months ago

I'm not sure if we ever laid out definitions of terms in the results of the Copenhagen meeting (someone please link them, if we did). "Atlas", "space" and "template" are pretty overloaded and often used interchangeably, so we tried to tease out these differences in a way that would mostly match people's usages. Here is my reconstruction of these terms:

Space - A common coordinate frame with an origin, axis orientations and spatial units; in practice this is too abstract to be directly usable.
Template - a canonical anatomical reference in a particular space; while two templates may be in the same space, the template matters for the association of coordinates with structures, so when something is said to be resampled into a space, the template used is the more useful name.
Atlas - A reference quantity sampled to a template. In many contexts this is a parcellation, but in some communities (notably PET) it is more common to publish absolute quantities and leave it to end-users to discretize the quantities into segmentations. It is also common to publish reference data sampled to one or more public templates, rather than publishing yet another template.

So we have templates mapping many-to-one onto spaces, and atlases mapping many-to-many onto templates:

flowchart TD
    a[Atlas A] & b["Atlas B"] --> x["Template X"] & y["Template Y"] & z["Template Z"]
    c[Atlas C] --> z
    x & y --> s[Space S]
    z --> t[Space T]
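
The same relationships can be written down as plain lookups, which makes the cardinalities easy to check. This is only a sketch: the names are the placeholders from the flowchart, and the two helper functions are hypothetical.

```python
from typing import Dict, List, Set

# Atlas -> templates it has been sampled to (many-to-many), copied from the flowchart.
ATLAS_TO_TEMPLATES: Dict[str, Set[str]] = {
    "Atlas A": {"Template X", "Template Y", "Template Z"},
    "Atlas B": {"Template X", "Template Y", "Template Z"},
    "Atlas C": {"Template Z"},
}

# Template -> the single space it lives in (many-to-one).
TEMPLATE_TO_SPACE: Dict[str, str] = {
    "Template X": "Space S",
    "Template Y": "Space S",
    "Template Z": "Space T",
}


def spaces_for_atlas(atlas: str) -> Set[str]:
    """Spaces in which an atlas is available, via the templates it is sampled to."""
    return {TEMPLATE_TO_SPACE[t] for t in ATLAS_TO_TEMPLATES[atlas]}


def atlases_on_template(template: str) -> List[str]:
    """Atlases that have been sampled to a given template."""
    return sorted(a for a, ts in ATLAS_TO_TEMPLATES.items() if template in ts)


if __name__ == "__main__":
    print(sorted(spaces_for_atlas("Atlas A")))  # ['Space S', 'Space T']
    print(sorted(spaces_for_atlas("Atlas C")))  # ['Space T']
    print(atlases_on_template("Template Z"))    # ['Atlas A', 'Atlas B', 'Atlas C']
```

Because a template maps to exactly one space, a plain dictionary is enough there, while the atlas side needs a set per key.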

I think it is reasonable to say that the collection of files that are distributed as a template also qualify as atlases, but separating out the atlas concept permits a collection of values to be abstracted from any one particular space. A reference collection of binding potentials is useful to have sampled onto a collection of commonly-used volumes and surfaces.

So this BEP narrowly targets atlases in this sense, and punts on the question of templates, leaving the possibility of them being a separate BIDS concept open. I do not think we can just rename atlas- to tpl- and get as close a correspondence to people's actual usage of atlases and templates. The PS13 PET atlas was used as a testing ground for iterations of these ideas. Looking at the evolution of that dataset, I think the file naming has become clearer as this effort (and PET-derivatives) has evolved, but the structure of TemplateFlow would not really work here.

Nor do I think atlas- covers the same ground as tpl- in TemplateFlow, so the acceptance of this BEP would not mean that TemplateFlow needs to adopt atlas- and live with an awkward-fitting term. Looking in templateflow/tpl-MNI152NLin2009cAsym we have DiFuMo and Schaefer2018 atlases sampled onto MNI152NLin2009cAsym, which again seems to demonstrate the usefulness of splitting these two concepts as we have.

(ii) raise awareness and caution over a problem that will bite sooner or later, emerging from the inconsistency of the foundations of this BEP with the current specs about standard spaces and coordinates, and with future BEPs such as the transformations, where templates will be required as "first class" citizens of BIDS (a bit like subjects) to identify the origin and destination of the mathematical operations involved.

The way I see things moving is that we have DatasetTypes of "raw" and "derivative" for individual studies. This BEP introduces "atlas", where atlas- replaces sub- and the central concept is the quantities, while the reference subject is secondary. And a template BEP (possibly BEP14, IDK) would introduce "template", where tpl- would replace sub- and the central concept is the geometry of the reference subject, while the quantities used to represent it are secondary.
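
A minimal Python sketch of that idea, assuming a fixed mapping from DatasetType to the entity that names the top-level directories. Only "raw" and "derivative" exist in the current specification; "atlas" is what this BEP proposes and "template" is purely hypothetical here, as are the names LEADING_ENTITY and top_level_dir.

```python
from typing import Dict

# DatasetType -> entity that names the top-level directories.
# "atlas" and "template" are assumptions taken from the discussion, not released BIDS.
LEADING_ENTITY: Dict[str, str] = {
    "raw": "sub",
    "derivative": "sub",
    "atlas": "atlas",
    "template": "tpl",
}


def top_level_dir(dataset_type: str, label: str) -> str:
    """Name of a top-level directory for a given DatasetType and label."""
    try:
        entity = LEADING_ENTITY[dataset_type]
    except KeyError:
        raise ValueError(f"unknown DatasetType: {dataset_type!r}") from None
    return f"{entity}-{label}"


if __name__ == "__main__":
    print(top_level_dir("raw", "01"))                         # sub-01
    print(top_level_dir("atlas", "AAL"))                      # atlas-AAL
    print(top_level_dir("template", "MNI152NLin2009cAsym"))   # tpl-MNI152NLin2009cAsym
```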

CPernet commented 9 months ago

this is great @effigies indeed templateflow/tpl-MNI152NLin2009cAsym is a template -- we have the change of definition for atlas in the fork yml file

oesteban commented 9 months ago

I'm not sure if we ever laid out definitions of terms in the results of the Copenhagen meeting (someone please link them, if we did). "Atlas", "space" and "template" are pretty overloaded and often used interchangeably, so we tried to tease out these differences in a way that would mostly match people's usages. Here is my reconstruction of these terms:

* **Space** - A common coordinate frame with an origin, axis orientations and spatial units; in practice this is too abstract to be directly usable.

* **Template** - a canonical anatomical reference in a particular space; while two templates may be in the same space, the template matters for the association of coordinates with structures, so when something is said to be resampled into a space, the template used is the more useful name.

* **Atlas** - A reference quantity sampled to a template. In many contexts this is a parcellation, but in some communities (notably PET) it is more common to publish absolute quantities and leave it to end-users to discretize the quantities into segmentations. It is also common to publish reference data sampled to one or more public templates, rather than publishing yet another template.

Thanks, this clarifies quite a lot. The only "but" I can add is bringing back the argument responding to @melanieganz that in my opinion, this:

I think one misunderstanding in the (often only MRI-based way) of thinking is that it ignores the case when we distill new information about the brain and create new atlases. Those new atlases can be derived in multiple ways, e.g., in PET by averaging single subject derivatives such as BPnd maps to create an average map that via a calibration with autoradiography can be turned into an actual quantitative map showing receptor density. These are the type of atlases that not only show regions in space, but also quantitative information about the brain - just like the big brain atlas from Juelich that's single subject based.

IMHO, this matches the definition of template as per Chris above. I probably am not part of the "PET community" anymore, but back in the day, I would call that a template in the lab I was in Barcelona *.

With reference to the PS13 atlas, I think those PET maps would be split into two types of files**:

Feature maps: these do not need the atlas-<label> identifier unless there are name collisions. The specific type of map is given by the suffix, and perhaps some entities.
Atlas annotations: these are dsegs, probsegs, tables, statistical maps, etc. that do require an atlas-<label> to be defined.

This is important because templateflow gives a solution to the real problem here, which is giving unique identifiers with a controlled vocabulary to the spaces to which things can be referenced. This is critical.

I do not think we can just rename atlas- to tpl- and get as close a correspondence to people's actual usage of atlases and templates.

I think the reverse way, and do not think the structure of PS13 on openneuro reflects how templates and atlases have traditionally been organized and redistributed.

Looking in templateflow/tpl-MNI152NLin2009cAsym we have DiFuMo and Schaefer2018 atlases sampled onto MNI152NLin2009cAsym, which again seems to demonstrate the usefulness of splitting these two concepts as we have.

This demonstrates that atlas is always contingent upon having defined the space in which it is defined. And these spaces have traditionally been generated by averaging some feature on a manifold (grid, mesh, graph, etc.)

As such, I don't see how templateflow can accommodate atlases with the extra directory. I do agree templateflow is lacking much of the work in this BEP to describe atlases, and we haven't worked more on that because of a lack of resources. I'm sure the metadata descriptions will be very helpful to implement atlases querying in it.

My point can be summarized as follows: I can't understand an atlas if I don't understand the space where it is defined. Since space is such an abstract construct, I use a template as the source of stereotaxy to understand the space and hence be able to interpret the knowledge encapsulated by the atlas.

All the metadata in this BEP - big fan of it, it should go into BIDS Derivatives in one way or another.
Giving atlases first-class structure like subjects - I don't get it, and the way of understanding it for me would be to find something that this BEP wants to implement that cannot be encoded within templateflow's infrastructure today.

EDIT: Added footnotes:

* (working with PET and SPECT) ** (within templateflow)

effigies commented 9 months ago

Giving atlases first-class structure like subjects - I don't get it, and the way of understanding it for me would be to find something that this BEP wants to implement that cannot be encoded within templateflow's infrastructure today.

I think it's the modularity that's tricky with templateflow. This will be a little long, and I apologize, but here goes:

In templateflow, everything is tpl-XYZ, both the first party template and any third-party quantitative maps or segmentations that are sampled onto the stereotaxic space for use with that template.

Suppose I'm distributing an atlas I want to call PS13, and it's just a mean/stddev map that has been warped to several templates for wide reuse. I believe the templateflow approach would be to write:

tpl-MNI152NLin2009cAsym/
    tpl-MNI152NLin2009cAsym_res-2_atlas-ps13_stat-mean_mimap.nii.gz
    tpl-MNI152NLin2009cAsym_res-2_atlas-ps13_stat-std_mimap.nii.gz
tpl-fsaverage/
    tpl-fsaverage_hemi-L_den-164k_atlas-ps13_stat-mean_mimap.func.gii
    tpl-fsaverage_hemi-L_den-164k_atlas-ps13_stat-std_mimap.func.gii
    tpl-fsaverage_hemi-R_den-164k_atlas-ps13_stat-mean_mimap.func.gii
    tpl-fsaverage_hemi-R_den-164k_atlas-ps13_stat-std_mimap.func.gii

However, what's in these directories is not a template, it's a spatial map, and the contents here are different from what would be in canonical tpl-* directories containing templates. I wouldn't want to make a full copy of the volume/surface files; that would invite tpl-X_T1w.nii.gz somehow going out of sync (which I believe happened with one of the major tools that redistributed templates).

Further, a user might want to select a particular template to work with, but then mix-and-match from several independently distributed atlases. Do they combine them into one directory?

If instead I distribute my atlas with the name of the quantities contained in that atlas as the first-class object:

atlas-ps13/
    atlas-ps13_space-MNI152NLin2009cAsym_res-2_stat-mean_mimap.nii.gz
    atlas-ps13_space-MNI152NLin2009cAsym_res-2_stat-std_mimap.nii.gz
    atlas-ps13_space-fsaverage_hemi-L_den-164k_stat-mean_mimap.func.gii
    atlas-ps13_space-fsaverage_hemi-L_den-164k_stat-std_mimap.func.gii
    atlas-ps13_space-fsaverage_hemi-R_den-164k_stat-mean_mimap.func.gii
    atlas-ps13_space-fsaverage_hemi-R_den-164k_stat-std_mimap.func.gii

Now I have a collection of maps that are conceptually linked in the same directory, and I can instantly see that they differ by the space they're sampled in, the hemisphere/density/resolution and the statistic.
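
A minimal sketch of that observation, assuming plain `key-value` entities separated by underscores as in the listing above. `parse_entities` is a hypothetical helper written for illustration, not a function from any BIDS or templateflow library:

```python
def parse_entities(filename):
    """Split a BIDS-style name into (entities dict, suffix, extension)."""
    stem, _, extension = filename.partition(".")
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix, "." + extension


files = [
    "atlas-ps13_space-MNI152NLin2009cAsym_res-2_stat-mean_mimap.nii.gz",
    "atlas-ps13_space-MNI152NLin2009cAsym_res-2_stat-std_mimap.nii.gz",
    "atlas-ps13_space-fsaverage_hemi-L_den-164k_stat-mean_mimap.func.gii",
    "atlas-ps13_space-fsaverage_hemi-L_den-164k_stat-std_mimap.func.gii",
    "atlas-ps13_space-fsaverage_hemi-R_den-164k_stat-mean_mimap.func.gii",
    "atlas-ps13_space-fsaverage_hemi-R_den-164k_stat-std_mimap.func.gii",
]

parsed = [parse_entities(name)[0] for name in files]
all_keys = {key for entities in parsed for key in entities}

# An entity "differs" if it takes more than one value across the files,
# counting absence (None) as a value of its own.
differing = {
    key: {entities.get(key) for entities in parsed}
    for key in all_keys
    if len({entities.get(key) for entities in parsed}) > 1
}

for key in sorted(differing):
    print(key, differing[key])
```

Here `atlas` is the only entity that stays constant, which is what makes it a natural directory name in this layout.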

As a user, I would combine my atlases with my templates by placing them alongside one another and my study data as subdatasets:

analysis/
    atlases/
        atlas-ps13/
        atlas-Schaefer2018/
    code/
    figures/
    templates/
        tpl-MNI152NLin2009cAsym/  # Template dataset
    sourcedata/
        ds00WXYZ-fmriprep/  # Derivative dataset
    ...

Each externally sourced dataset is able to be pulled in as an independent module, without requiring the pieces to be combined.
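
One practical consequence of keeping the modules separate is that a tool would have to check that every space an atlas file refers to has a template dataset available. A rough sketch of such a check, working on names only; `space_of` and `missing_templates` are hypothetical helpers and the inputs mirror the example layout above:

```python
def space_of(filename):
    """Return the value of the space- entity in a BIDS-style name, or None."""
    stem = filename.split(".", 1)[0]
    for part in stem.split("_"):
        if part.startswith("space-"):
            return part[len("space-"):]
    return None


def missing_templates(atlas_files, template_dirs):
    """Map each space that has no tpl-<space> directory to the files needing it."""
    available = {
        d.rstrip("/")[len("tpl-"):] for d in template_dirs if d.startswith("tpl-")
    }
    missing = {}
    for name in atlas_files:
        space = space_of(name)
        if space is not None and space not in available:
            missing.setdefault(space, []).append(name)
    return missing


atlas_files = [
    "atlas-ps13_space-MNI152NLin2009cAsym_res-2_stat-mean_mimap.nii.gz",
    "atlas-ps13_space-fsaverage_hemi-L_den-164k_stat-mean_mimap.func.gii",
]
template_dirs = ["tpl-MNI152NLin2009cAsym/"]  # only template in the layout above

missing = missing_templates(atlas_files, template_dirs)
print(missing)
```

With the layout as drawn, the fsaverage maps would be flagged because only the MNI152NLin2009cAsym template dataset has been pulled in.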

Now suppose we want to extend this to templateflow. We could write:

tpl-MNI152NLin2009cAsym/
    atlases/
        atlas-DiFuMo/
            atlas-DiFuMo_space-MNI152NLin2009cAsym_res-02_desc-1024dim_probseg.nii.gz
            ...
        atlas-Schaefer2018/
            atlas-Schaefer2018_space-MNI152NLin2009cAsym_res-01_desc-1000Parcels17Networks_dseg.nii.gz
        ...
    tpl-MNI152NLin2009cAsym_res-01_T1w.nii.gz
    ...

Alternately, templateflow could have both templates and atlases sampled to those templates:

atlases/
    atlas-DiFuMo/
        atlas-DiFuMo_space-MNI152NLin2009cAsym_res-02_desc-1024dim_probseg.nii.gz
        ...
    atlas-Schaefer2018/
        atlas-Schaefer2018_space-MNI152NLin2009cAsym_res-01_desc-1000Parcels17Networks_dseg.nii.gz
        ...
templates/
    tpl-MNI152NLin2009cAsym/
        tpl-MNI152NLin2009cAsym_res-01_T1w.nii.gz
        ...

In addition to making selection of pieces simpler, I think the modularity here would match how these files are actually produced: MNI generated their template, and then the atlases were sampled (either by the original authors or by you) onto that template.
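
To illustrate that the two naming schemes carry the same information, here is a sketch that rewrites a template-first name into the atlas-first form used in the second layout. `to_atlas_first` is a hypothetical helper, the `atlases/` prefix is taken from the layout above, and the input names are illustrative rather than a claim about what templateflow currently ships:

```python
def to_atlas_first(filename):
    """Rewrite tpl-<T>_..._atlas-<A>_... as atlases/atlas-<A>/atlas-<A>_space-<T>_..."""
    stem, dot, extension = filename.partition(".")
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)

    template = entities.pop("tpl")
    atlas = entities.pop("atlas")

    # atlas- leads, the template becomes space-, remaining entities keep their order.
    ordered = [("atlas", atlas), ("space", template), *entities.items()]
    name = "_".join(f"{key}-{value}" for key, value in ordered)
    return f"atlases/atlas-{atlas}/{name}_{suffix}{dot}{extension}"


print(to_atlas_first("tpl-MNI152NLin2009cAsym_res-02_atlas-DiFuMo_desc-1024dim_probseg.nii.gz"))
print(to_atlas_first("tpl-fsaverage_hemi-L_den-164k_atlas-ps13_stat-mean_mimap.func.gii"))
```

Files without an `atlas-` entity (for example the `_T1w` template image) would raise a `KeyError` here, which matches the intent: they stay in the template dataset.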

I agree that an atlas cannot be interpreted without some template (so it would be reasonable for space- to be a mandatory entity) and it's possible that some atlases cannot be validly resampled to alternative templates, but some can be.

oesteban commented 9 months ago

Suppose I'm distributing an atlas I want to call PS13, and it's just a mean/stddev map that has been warped to several templates for wide reuse. I believe the templateflow approach would be to write:
tpl-MNI152NLin2009cAsym/
    tpl-MNI152NLin2009cAsym_res-2_atlas-ps13_stat-mean_mimap.nii.gz
    tpl-MNI152NLin2009cAsym_res-2_atlas-ps13_stat-std_mimap.nii.gz
tpl-fsaverage/
    tpl-fsaverage_hemi-L_den-164k_atlas-ps13_stat-mean_mimap.func.gii
    tpl-fsaverage_hemi-L_den-164k_atlas-ps13_stat-std_mimap.func.gii
    tpl-fsaverage_hemi-R_den-164k_atlas-ps13_stat-mean_mimap.func.gii
    tpl-fsaverage_hemi-R_den-164k_atlas-ps13_stat-std_mimap.func.gii

It depends, because it looks like tpl-MNI152NLin2009cAsym_res-2_atlas-ps13_stat-mean_mimap.nii.gz is a feature map, the same way _T1w is a feature map. As a feature map, it looks like a PET scan and can be used to map individuals' PET scans of the same radiotracer. If that is the case, this would be TemplateFlow's representation:

tpl-MNI152NLin2009cAsym/
    tpl-MNI152NLin2009cAsym_res-2_PET.nii.gz
    tpl-MNI152NLin2009cAsym_res-2_atlas-ps13_stat-std_mimap.nii.gz
tpl-fsaverage/
    tpl-fsaverage_hemi-L_den-164k_PET.func.gii
    tpl-fsaverage_hemi-L_den-164k_atlas-ps13_stat-std_mimap.func.gii
    tpl-fsaverage_hemi-R_den-164k_PET.func.gii
    tpl-fsaverage_hemi-R_den-164k_atlas-ps13_stat-std_mimap.func.gii

The stat-std remains an atlas as a voxel/vertex-wise annotation of a property, but that property does not tell you much alone without the _PET map or the _T1w map for spatial reference.

I will be honest and say that there's a problem here for TemplateFlow. At the moment, you cannot tell from the structure above that the _T1w template was done by MNI and the _PET template was done by the PS13 authors. You can write it in the metadata, but then it is not obvious. I'm not trying to sell TemplateFlow as the solution as it is.

As a user, I would combine my atlases with my templates by placing them alongside one another and my study data as subdatasets:
analysis/
    atlases/
        atlas-ps13/
        atlas-Schaefer2018/
    code/
    figures/
    templates/
        tpl-MNI152NLin2009cAsym/  # Template dataset
    sourcedata/
        ds00WXYZ-fmriprep/  # Derivative dataset
    ...

Sure, but by the same reasoning, we could change fMRIPrep to generate:

sourcedata/
    ds00WXYZ-fmriprep/  # Derivative dataset
        segmentations/
            sub-01_<...>_space-1_<...>
            sub-01_<...>_space-2_<...>
            sub-02_<...>_space-1_<...>
            sub-02_<...>_space-2_<...>
        transforms/
            sub-01_<...>_space-1_<...>
            sub-01_<...>_space-2_<...>
            sub-02_<...>_space-1_<...>
            sub-02_<...>_space-2_<...>
        timeseries/
            ....

This structure would feel way more natural than our current organization to many researchers because it better reflects their interests. I think this problem goes beyond this particular BEP, as it seems BIDS is trying to accommodate all possible derivatives by allowing a lot of flexibility. IMHO that goes against the original principles of BIDS, and there exists NIDM which (IMHO again) is a much better tool for this particular problem.

That said, I also think templates are special: sure, they are derived data, but the way we integrate them into analyses and visualizations corresponds, in reality, to raw data. Under this view, I think it is positive that templates (and the atlases defined in correspondence) are as close as possible to BIDS-raw as opposed to BIDS-Derivatives. This also would justify why I think NIDM could be an overkill for templates, but it is definitely not for derivatives.

Also, as an argument in favor of approximating this to BIDS-raw as opposed to BIDS-Derivatives, templates are not generated so often (perhaps with the exception of researchers who employ study-wise or custom in-house templates and from there map to a more widely used space), and atlases even much less often.

Finally, and to end on a positive note again:

IMHO the value of the BEP is in the new metadata specifications, which are outstanding and I'm looking forward to adopting them within TemplateFlow.
My concern is the file structure, which I anticipate will be misused.

oesteban commented 9 months ago

A few additional thoughts, without particular order:

It confused me that the Google Doc is the main thing found in the first message. Today, someone has pointed me to https://github.com/PESTILLILAB/bids-specification/blob/bep038/src/atlas.md, which apparently is the current working document. AFAICT, this is not the prescribed procedure for BEPs, and precludes access from new contributors. In my case, much of what I got from the Google Doc was totally out of date.
In light of the tortuous fate and termination by the BIDS Steering Group of the model-<label>/ PR (and later refinements such as result-<label>/), I would also avoid adding depth to BIDS folder structures or else I will need to come to think that those proposals are evaluated more on the proponents than they are on the content.
Finally, one thing this BEP is not properly capturing is the goal. Reflecting further on my previous comment that templates (and associated atlases) could be considered as "raw" specs, I've realized it is not exactly that. The question is that there are two major use-cases for templates (and atlases):
1. The template and/or atlas generation IS the analysis. When the goal of the researcher is to create a new template/atlas, then that becomes the endpoint. Accordingly the objective/challenge for reproducibility is - given the same original data, I should be able to generate the same files myself. If the template is created out of a single subject, I think BIDS-Derivatives already captures most of what is needed. If the template is generated from several subjects, then we have a "group level" analysis type, which would require some notion of group within BIDS-Derivatives (unsure if it already exists). This would also be the case for "custom" or "study-wise" templates/atlases without the intent of being used beyond the study.
2. Template/atlas is an input to a certain analysis: the traditional use of MNI as a standardized space of reference. Here the goal is very different and it is what TemplateFlow tries to cover. Often, this use case starts as item 1 and the template/atlas becomes popular and transitions to item 2 - and exemplifying this, templateflow transformed all the original names of the original MNI templates in this adaptation. However, the original papers and work the MNI authors did to generate the templates/atlases would have been served by the BIDS-Derivatives specs as opposed to specific specs.

IMHO - 1 does not require a special new folder structure, and for 2 I have already spilled too much ink. Both use-cases have the necessary and recommended metadata in common.

PeerHerholz commented 9 months ago

Hello @oesteban and everyone,

I'm very sorry for joining the discussion so late, I was AFK. Instead of replying/adding something to all messages before, I'll only address the last comment and try to summarize things.

It confused me that the Google Doc is the main thing found in the first message. Today, someone has pointed me to https://github.com/PESTILLILAB/bids-specification/blob/bep038/src/atlas.md, which apparently is the current working document. AFAICT, this is not the prescribed procedure for BEPs, and precludes access from new contributors. In my case, much of what I got from the Google Doc was totally out of date.

Just to clarify the respective points:

The first message only contains the GoogleDoc as we (the core contributor team) didn't want to edit a post that's not from us. However, we added notes to the draft and the planned approach on multiple occasions (e.g., here and here).
Furthermore, the plan was to write the draft based on the GoogleDoc and discussions during the meetings (here and here), go through a couple of feedback rounds between us (ie the core contributors) and then align the draft and GoogleDoc and only then, after having a complete initial draft (and alignment), open it up for wider feedback rounds. At this point we would also have added a note in the GoogleDoc "freezing" it for the time being and stating that the development will continue on GitHub with respective links. The wider feedback round unfortunately happened too soon, before we had the chance to finish the requirements.
As far as we know and can tell, we always followed the BEP development guidelines strictly since the beginning, as we wanted to make sure the process aligns with the guidelines and suggested workflows. We openly asked everyone if the requirements to advance to the next steps/stages were fulfilled (e.g., here and here). We also had an open meeting with several people from multiple stakeholder groups to discuss if we can advance to the GitHub stage. Thus, it was our understanding that we followed the BEP development guide.
We, as any other BEP I think, were/are afraid of losing folks when transitioning to GitHub and preventing new ones from joining this endeavor. However, interested folks can still add comments to the GoogleDoc draft and here as comments. I think, at some stage, ie when things get more technical, there will always be a drop-out. I'm not a fan of this by any means and think there should be way fewer hurdles and more efficient ways in place for (new) folks to join.

In light of the tortuous fate and termination by the BIDS Steering Group of the model-<label>/ PR (and later refinements such as result-<label>/), I would also avoid adding depth to BIDS folder structures or else I will need to come to think that those proposals are evaluated more on the proponents than they are on the content.

Sorry, I'm not sure if I understand what you are referring to. Could I maybe ask you to explain this further? Do you mean that the atlas- use cases outlined so far would add too much depth? Sorry again.

Regarding your last point and maybe the entire discussion so far (which, while definitely very important and informative, might already be too complex, detailed and hard to follow for a lot of folks): what would be the way forward? Evaluating if/how the different use cases could be implemented via the different options (e.g. our BEP and combining aspects of our BEP and templateflow, sorry if I misunderstood these options)?

Thank you all very much for all your feedback and input, that's highly appreciated and tremendously helpful to further develop this effort.

oesteban commented 9 months ago

Thanks @PeerHerholz

Just to clarify the respective points:

Thanks for the clarification, I definitely needed a walk-through. I definitely didn't mean there's a lack of interest or effort to keep the discussion accessible to everyone - I was just flagging that the conversation is not easy to follow. Perhaps a few simple steps would close the gap:

Mark the Google Doc read-only and add a warning in the first message saying that it has been abandoned. Having both routes open is unsustainable, as the effort to keep them in sync is excessive. If we are not ready to transition, then the conversation should entirely happen within the Google Doc. Successful BEPs have all transitioned from GDocs to a PR on GitHub without intermediate steps. Here, the draft document is an intermediate step that does not conform to the typical lifecycle of a BEP. A related problem is that it is hosted outside this repo, and therefore we lose all the GitHub niceties of displaying sections of code, suggestions, etc.
Copy your points into the original message.
Add an admonition indicating where the latest proposal is maintained and how suggestions can be made.

I'm sure @jdkent will not oppose updating his original message so that it is current and we avoid confusion. Alternatively, perhaps it would be best to open a Draft PR and move all the conversations there if editing the first message is not a possibility.

Sorry, I'm not sure if I understand what you are referring to.

I'm referring to #1280. Much of the criticism concentrated on the fact that the proposal added a results-<label>/ (originally it was model-<label> but that hit even more criticism for the keyword choice, fair enough) folder. Apparently, that was going to complicate the validator and add more depth to the overall structure. In the end, the steering committee decided to find flatter solutions.

what would be the way forward?

To answer that question I will need to read the draft carefully. I started from the Google Doc and the draft is way more advanced. I also think @effigies' description of the draft was very helpful.

By quickly looking at the spec:

Case 1 - has two problems:

Files stored at the BIDS root level
- These files do not belong here; they are the outputs of a previous analysis step (by yourself or by someone else) that feed in as input to this one, rather than this analysis's outputs.

First, a given atlas underwent modifications before its utilization, specifically spatial transformations to a template space

When we transform stuff to an individual subject's space, we write it under the appropriate <datatype>/ (dwi, func, etc., not necessarily raw) and store it under that subject's folder. Here, when we transform to a template space, we store it under an atlas folder (??).

Therefore, this part:

<dataset>/atlas/atlas-<label>/
atlas-<label>_desc-<label>_[dseg|probseg|mask].[nii|dscalar.nii|dlabel.nii|label.gii][.gz]
atlas-<label>_desc-<label>_[dseg|probseg|mask].tsv
atlas-<label>_desc-<label>_[dseg|probseg|mask].json

talks about a pre-existing dataset - should fall out of this case (and potentially, the BEP). It is equivalent to saying that the original BIDS folder will be copied somewhere in the derivatives.

The second part is the result of a pipeline, where some atlas inputs are derived into a different space:

<dataset>/derivatives/<pipeline>/
atlas-<label>_space-<label>_res-<label>_desc-<label>_[dseg|probseg|mask].[nii|dscalar.nii|dlabel.nii|label.gii|.tsv][.gz]
atlas-<label>_space-<label>_desc-<label>_[dseg|probseg|mask].tsv
atlas-<label>_space-<label>_desc-<label>_[dseg|probseg|mask].json

While for individual/native space we have the proxy concept of subject, here template for a group space is overshadowed by atlas. Instead, I think this should be (IMHO):

<dataset>/derivatives/<pipeline>/
    atlas-<label>_desc-<label>_[dseg|probseg|mask].tsv   # These will generally be shared across templates/spaces
    atlas-<label>_desc-<label>_[dseg|probseg|mask].json
    tpl-<spacelabel>/
        tpl-<spacelabel>_atlas-<label>_res-<label>_desc-<label>_[dseg|probseg|mask].[nii|dscalar.nii|dlabel.nii|label.gii|.tsv][.gz]
        tpl-<spacelabel>_atlas-<label>_desc-<label>_[dseg|probseg|mask].json # This one requires specific meta
    tpl-<space2label>/
        tpl-<space2label>_atlas-<label>_res-<label>_desc-<label>_[dseg|probseg|mask].[nii|dscalar.nii|dlabel.nii|label.gii|.tsv][.gz]
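
A small sketch of how a pipeline might generate paths under this template-first proposal, assuming one reading of the tree above: shared .tsv/.json at the pipeline root and per-space files under tpl-<spacelabel>/. `template_first_paths` and all labels used in the call are hypothetical:

```python
from pathlib import PurePosixPath


def template_first_paths(pipeline, atlas, desc, suffix, spaces, res="1", ext=".nii.gz"):
    """List output paths: shared sidecars at the root, images under tpl-<space>/."""
    root = PurePosixPath("derivatives") / pipeline
    shared = f"atlas-{atlas}_desc-{desc}_{suffix}"

    # Lookup table and metadata that are generally shared across spaces.
    paths = [root / f"{shared}.tsv", root / f"{shared}.json"]

    for space in spaces:
        folder = root / f"tpl-{space}"
        prefix = f"tpl-{space}_atlas-{atlas}"
        paths.append(folder / f"{prefix}_res-{res}_desc-{desc}_{suffix}{ext}")
        # Space-specific metadata sidecar.
        paths.append(folder / f"{prefix}_desc-{desc}_{suffix}.json")

    return [str(path) for path in paths]


paths = template_first_paths(
    pipeline="mypipeline",
    atlas="Schaefer2018",
    desc="400Parcels",
    suffix="dseg",
    spaces=["MNI152NLin2009cAsym", "MNI152NLin6Asym"],
)
for path in paths:
    print(path)
```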

Case 2 - has problem 1 above and a different issue: for some reason, atlas disappears while we have a space-<label> (which is native?):

<dataset>/derivatives/
    sub-01/
        func/
            sub-01_space-<label>_seg-<label>_[dseg|probseg|mask].[nii|dscalar.nii|dlabel.nii|label.gii|tsv][.gz]
            sub-01_space-<label>_seg-<label>_desc-<label>_[dseg|probseg|mask].tsv
            sub-01_space-<label>_seg-<label>_desc-<label>_[dseg|probseg|mask].json

IMHO, this should be (adding two examples - one for the native space and one for a different space):

<dataset>/derivatives/
    atlas-<label>_desc-<label>_[dseg|probseg|mask].tsv   # These will generally be shared across subjects
    atlas-<label>_desc-<label>_[dseg|probseg|mask].json
    sub-01/
        func/
            sub-01_atlas-<label>_[dseg|probseg|mask].[nii|dscalar.nii|dlabel.nii|label.gii|tsv][.gz]
            sub-01_atlas-<label>_desc-<label>_[dseg|probseg|mask].tsv
            sub-01_atlas-<label>_desc-<label>_[dseg|probseg|mask].json
            sub-01_space-<label>_atlas-<label>_[dseg|probseg|mask].[nii|dscalar.nii|dlabel.nii|label.gii|tsv][.gz]
            sub-01_space-<label>_atlas-<label>_desc-<label>_[dseg|probseg|mask].tsv
            sub-01_space-<label>_atlas-<label>_desc-<label>_[dseg|probseg|mask].json

Case 3, IMHO, is just standard derivatives. fMRIPrep generates a T1w template (atlas?) with all the T1w images found in the dataset.

Rather than add an atlas-<label>/ structure level, I believe BIDS really needs some equivalent to subject (sub-) for groups (for group-level analyses) and for spaces (arguably, could be done with groups). In templateflow, we chose template (tpl-) because having space- was too abstract and would in a way conflict with the concept of individual/native space of subjects. So it seemed more tangible and understandable than space.

CPernet commented 9 months ago

feels like you are totally missing the point of case 1 - a derivative dataset type, these can live on their own for a while now

oesteban commented 9 months ago

feels like you are totally missing the point of case 1 - a derivative dataset type, these can live on their own for a while now

No, I'm not missing anything. Perhaps my argument wasn't clear.

If the modified template (atlas) was calculated along the pipeline processing, the same way fMRIPrep runs FreeSurfer and stores FS' derivatives outside its own directory, the pipeline of this analysis would store the derived template (atlas) outside its own folder structure. In this situation, the derived template (atlas) is just an ordinary derivative set, so imposing a structure different from other derivatives feels forced and unnecessary.
This modified template (atlas) can eventually become worth sharing as input for future or someone else's analyses. In this case, the data becomes "raw"-like (and BIDS-raw does not accommodate these at the moment). I proposed to Chris G. a spec on templates veeery long ago and he rejected it. At the moment, it stung, but over time, I've finally understood Chris and think similarly. BIDS-raw tells you how to organize experimental data -- organizing other sources of information is not within scope (and, honestly, TemplateFlow or AFNI's template infrastructure both do a really good job at redistributing templates and atlases).

CPernet commented 9 months ago

'This modified template (atlas) can eventually become worth sharing as input for future or someone else's analyses. In this case, the data becomes "raw"-like (and BIDS-raw does not accommodate these at the moment)'

but that's my point - the spec currently does already accommodate this

bids-standard / bids-specification

BEP Proposal: Atlas specification #1281
