bids-standard / bids-specification

Brain Imaging Data Structure (BIDS) Specification
https://bids-specification.readthedocs.io/
Creative Commons Attribution 4.0 International

BIDS-like organization for Atlas/Library #1697

Open TheChymera opened 5 months ago

TheChymera commented 5 months ago

The ABI mouse brain maps for gene expression and connectivity (well, projection) are a particularly useful resource for neuroimaging, where they can help correlate whole-brain maps with cellular/molecular characteristics. Sadly, they're published via an API which takes a bit of time to understand, and they come as NRRD files without spatial information and without being registered to a brain imaging template.

A while ago I wrote a bunch of scripts to handle download, NIfTI-fication, registration, etc. The archive I have produced and used so far looks like this: https://gin.g-node.org/TheChymera/ABI-connectivity-data_generator/src/master/procdata

It's very bare-bones and requires reading in a custom XML to properly interpret. I was thinking a BIDS-like style would perhaps make it easier for others in neuroimaging to leverage this resource. This is what I have so far: https://gin.g-node.org/TheChymera/ABI-connectivity-data_generator/src/master/bids

I was wondering if anybody else is interested in chiming in. I know there is BEP038, which deals with “atlases”, and to me this library is very much an atlas, but I'm not sure this was the vision for the BEP; as far as I can tell, it seems to assume that an atlas is one file. Sure, all the maps can be concatenated along a fourth dimension, but that would just end up being a gigantic file, with a gigantic JSON needed to make sense of what each position in the fourth dimension represents.
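To make the size concern concrete, here is a minimal sketch of the kind of index such a JSON would need if every map were stacked along the fourth dimension. The `VolumeIndex` key and the `exprA`/`exprB` labels are hypothetical placeholders, not existing BIDS fields or real ABI labels.

```python
import json


def build_volume_index(maps):
    """Map each position along the 4th dimension to the (seed, expression) pair it holds."""
    return {
        # "VolumeIndex" is a made-up key used only for illustration.
        "VolumeIndex": [
            {"index": i, "seed": seed, "expression": expression}
            for i, (seed, expression) in enumerate(maps)
        ]
    }


# Hypothetical (seed, expression) pairs; the real library contains far more maps.
maps = [("VTA", "exprA"), ("VTA", "exprB"), ("ACAd", "exprA")]

sidecar = build_volume_index(maps)
sidecar_text = json.dumps(sidecar, indent=2)
print(sidecar_text)
```

The index grows by one entry per map, so a single concatenated file would need one very long sidecar, whereas one file per map keeps each sidecar small.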

Just for context, if you're wondering, this specific atlas is a collection of histological projection maps based on an injection site and an expression pattern, so you get e.g. a map of projections of different cell types from the VTA here → https://gin.g-node.org/TheChymera/ABI-connectivity-data_generator/src/master/bids/seed-VTA

@yarikoptic

Also, @dyf, what do you think about this? I think I mentioned it to you at the ODIN meeting. I think it would make your data more accessible, though I'm wondering what you think is really important from the XML and should get a field in the JSON sidecar. Maybe this exercise could be relevant to some of the points raised here.
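As a starting point for that discussion, here is a rough sketch of how selected XML elements could be carried over into a JSON sidecar. Everything specific in it is an assumption: the element names in `EXAMPLE_XML`, the values, and the sidecar keys in `FIELD_MAP` are hypothetical and do not reflect the actual ABI XML schema or any agreed BIDS metadata.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML snippet; the real ABI metadata uses its own element names.
EXAMPLE_XML = """
<experiment>
  <id>12345</id>
  <structure-abbreviation>VTA</structure-abbreviation>
  <transgenic-line>exprA</transgenic-line>
  <injection-volume>0.1</injection-volume>
</experiment>
"""

# Hypothetical mapping from XML element name to sidecar key.
# Elements not listed here (e.g. injection-volume) are dropped.
FIELD_MAP = {
    "id": "ExperimentID",
    "structure-abbreviation": "Seed",
    "transgenic-line": "Expression",
}


def xml_to_sidecar(xml_text, field_map=FIELD_MAP):
    """Copy the mapped XML elements into a flat dict suitable for a JSON sidecar."""
    root = ET.fromstring(xml_text.strip())
    sidecar = {}
    for tag, key in field_map.items():
        element = root.find(tag)
        if element is not None and element.text is not None:
            sidecar[key] = element.text.strip()
    return sidecar


sidecar = xml_to_sidecar(EXAMPLE_XML)
print(json.dumps(sidecar, indent=2))
```

Deciding which elements belong in `FIELD_MAP` is exactly the open question here; the code only shows that the translation itself is mechanical once that is settled.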

effigies commented 5 months ago

I know there is BEP038, which deals with “atlases”, and to me this library is very much an atlas, but I'm not sure this was the vision for the BEP; as far as I can tell, it seems to assume that an atlas is one file.

An atlas is not necessarily a single file, but a collection of related files. What are the actual contents of these files? They say FLUO, are they microscopy images, or masks/probabilistic segmentations derived from microscopy images?

TheChymera commented 5 months ago

They are fluorescent microscopy data reconstructed from brain slices. That's already the first snag, because FLUO doesn't currently support .nii.gz, but in this case it's FLUO for use in neuroimaging.... so I think it makes sense.

There's no segmentation, one file is one feature, i.e. the projections of one cell type from one brain area.

effigies commented 5 months ago

That's already the first snag, because FLUO doesn't currently support .nii.gz, but in this case it's FLUO for use in neuroimaging.... so I think it makes sense.

Okay, so there would need to be some agreement that volumes reconstructed from microscopy are valid for NIfTI. Or possibly a more general statement that data may be converted among any BIDS-supported file formats as a derivative, in order to facilitate inter-modality analyses.

In the current framework, it might be reasonable to reconstruct these as .ome.zarr files, from which NIfTI would be a pretty simple conversion for a pipeline that needed the data in NIfTI.

Your seed-<label> and expression-<label> entities would also need to be proposed or mapped onto existing concepts. It's possible that label-<seedlabel> would work, but it's a kind of awkward fit.

If we allowed that all of these exist, then the BEP038 addition would just be an explicit atlas name:

ds/
  atlas-atlasName/
    atlas-atlasName_seed-ACAd_expression-<label>_FLUO.nii.gz
    atlas-atlasName_seed-ACAd_expression-<label>_FLUO.json
    ...
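
The layout above can be sketched as a small helper that builds and parses such names. This is only an illustration under assumptions: `seed` and `expression` are the proposed entities, not accepted ones, the entity order is simply copied from the example, and `exprA` is a placeholder label.

```python
import re

# Entity order copied from the example layout; seed/expression are proposals only.
ENTITY_ORDER = ("atlas", "seed", "expression")
LABEL_PATTERN = re.compile(r"[A-Za-z0-9]+")  # labels are alphanumeric


def build_filename(entities, suffix, extension):
    """Join key-value entities, suffix and extension into a BIDS-like filename."""
    parts = []
    for key in ENTITY_ORDER:
        if key in entities:
            value = entities[key]
            if not LABEL_PATTERN.fullmatch(value):
                raise ValueError(f"label for {key!r} must be alphanumeric: {value!r}")
            parts.append(f"{key}-{value}")
    return "_".join(parts) + f"_{suffix}{extension}"


def parse_filename(filename):
    """Split a BIDS-like filename back into (entities, suffix, extension)."""
    stem, _, extension = filename.partition(".")
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix, "." + extension


name = build_filename(
    {"atlas": "atlasName", "seed": "ACAd", "expression": "exprA"}, "FLUO", ".nii.gz"
)
print(name)
print(parse_filename(name))
```

Restricting labels to alphanumeric characters keeps `_` and `-` unambiguous as separators, which is what makes the round trip through `parse_filename` possible.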

TheChymera commented 5 months ago

@effigies ok, I can adapt it more to BEP038 if you think it could fit the concept. What's the status on the BEP, is it almost finalized/dead? The last comment seems to have gotten no response in a while.

effigies commented 5 months ago

It's quite active and nearly finalized: https://github.com/bids-standard/bids-specification/issues/1281

TheChymera commented 5 months ago

From #1281:


1

I don't think BEP038 supports custom fields (and maybe it shouldn't), like the seed- and expression- fields relevant for this data.

_expression is probably not related to the atlas BEP at all and rather worth checking in a separate issue with clear description on the use case.

_seed -- that relates to connectivity https://bids.neuroimaging.io/bep017 -- please check how it would be expressed there.

As a workaround, as you mention, _desc is more for "derivative data", so not a good match. Maybe something like _acq-, which people typically abuse for such purposes to provide additional detail on MRI acquisition, would be the better one?


2

@yarikoptic

Maybe something like _acq- [for the seed]

I think that's an even worse idea than desc-, since it really has nothing to do with the acquisition. If anything, acq- could describe some protocol shorthand for the actual fluorescent imaging method, but I'm not sure that's relevant enough to occupy a filename field.

bep017

Well, that doesn't introduce a seed- key-value pair; in fact, it only uses the term “seed” to explain in the BEP text, in more understandable terms, what is explained more cryptically in the JSON sidecars. Perhaps it makes sense for that BEP specifically (I actually don't think so and commented on the doc to that effect), but it certainly wouldn't help us here because:

The other thing from BEP017 that I could leverage is the _relmat suffix. In a sense it might be more informative than _FLUO, since the modality is a poorer descriptor of the data than the fact that it represents relationships between all brain voxels and a purported specific structure. The other issue with that is that in BEP017 seed-based connectivity data is described as a derivative, which it is. This atlas data, however, does not need to be a derivative in that sense; it's primarily a reformatting of NRRD/XML data to NIfTI/JSON.

@effigies do you think this would be a good addition to the BEP, i.e. determining what the extension would look like in light of connectivity atlases from different modalities, and deciding whether to keep the modality suffix or use something more generic like _relmat? If so, I think a seed- key-value pair would be a good introduction, and it might help both this BEP and BEP017.