bids-standard / bids-specification

Brain Imaging Data Structure (BIDS) Specification
https://bids-specification.readthedocs.io/
Creative Commons Attribution 4.0 International

Conversion Implementations Require Concrete Examples #1350

Open neurolabusc opened 2 years ago

neurolabusc commented 2 years ago

Many extensions to the BIDS format create a wish list of useful sequence details without clear instructions on how to implement them. It would really help to have sample DICOM files from the major manufacturers as well as an example of the desired BIDS conversion. This would help automate conversion support, identify manufacturer-specific terminology and also help identify missing details that can be addressed by the relevant DICOM working groups.

While this is a general concern, my issue was elicited by dcm2niix issue 646. @agahkarakuzu you have supplied a desired BIDS conversion, but it does not contain a source DICOM. The current specification seems underspecified. It is unclear how to determine stimulated echo (STE) for various manufacturers (GE, Philips) as well as different systems (Siemens VE11 Classic DICOM vs Siemens XA30 enhanced DICOM). I would suggest a model like the dcm_qa repositories, which provide reference datasets that can aid the developers of all the popular conversion tools.

Ideally, new specification generation would happen as a virtuous cycle with conversion software development. This would allow the specification to be refined to capture edge cases. I would be happy to have BIDS extension designers submit pull requests to dcm2niix (C). Alternatively, specification creators could contribute to dicm2nii (Matlab) or dicom2nifti (Python) if that is their preferred language. A reference dataset and software implementation can help avoid a lot of unintended consequences.
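As a rough illustration of how a reference dataset can serve as a regression check for any conversion tool, the sketch below compares the JSON sidecar a converter produced against a reference sidecar shipped with the dataset. The function name `compare_sidecars`, the tolerance, and the example values are assumptions made for illustration; this is not how the dcm_qa repositories are actually implemented.

```python
import math


def _same(a, b, rel_tol):
    """Compare two sidecar values, allowing a small tolerance for numbers."""
    numeric = (int, float)
    if isinstance(a, numeric) and isinstance(b, numeric):
        return math.isclose(a, b, rel_tol=rel_tol)
    return a == b


def compare_sidecars(produced, reference, keys, rel_tol=1e-6):
    """Return (key, produced_value, reference_value) for every mismatch.

    A key that is missing from either sidecar is reported with None
    in place of the missing value.
    """
    mismatches = []
    for key in keys:
        got = produced.get(key)
        want = reference.get(key)
        if not _same(got, want, rel_tol):
            mismatches.append((key, got, want))
    return mismatches


# Illustrative values only, not taken from a real scan.
reference = {"RepetitionTime": 2.0, "EchoTime": 0.03, "Manufacturer": "Siemens"}
produced = {"RepetitionTime": 2.0, "EchoTime": 0.003, "Manufacturer": "Siemens"}

for key, got, want in compare_sidecars(produced, reference, list(reference)):
    print(f"{key}: converter wrote {got!r}, reference expects {want!r}")
```

Because the check only needs the two sidecars, the same reference dataset can be reused by every conversion tool regardless of its implementation language.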

The reference datasets also have other benefits beyond tool conversion. For example, as long as the user has the correct licenses, most Siemens users can Phoenix a DICOM sequence from another scanner onto their system to easily clone a sequence.

neurolabusc commented 2 years ago

@Remi-Gau I do not agree with the more restricted title. For my specific case of developing DICOM support for BIDS my concerns apply to all medical images (MRI, PET, CT, SPECT, etc). However, I think this applies to other domains like MEG, ERP and physiological recordings where the manufacturers have their own data formats that must be converted to standards.

Creating data standards that include a description and an example of the desired structured data, but not the original source data, places a heavy burden on users, has knock-on effects for developers, and does not benefit from the insight gained by explicitly mapping the desired terminology to that used by the manufacturer (e.g. different manufacturers use the DICOM tag Repetition Time differently for some sequences).

Remi-Gau commented 2 years ago

Ah yes sorry. I was too hasty in the renaming.

I tried to give your title a slightly less broad scope but I swung a bit too far the other way.

In an ideal world I would definitely prefer most BIDS-related implementations to come with examples, but for the sake of this issue I would at least restrict it to conversion implementations.

Remi-Gau commented 2 years ago

FWIW I do agree with your feeling on this and I think we should try to push BIDS extensions (when they are about raw data) to come with conversion examples.

neurolabusc commented 2 years ago

@Remi-Gau I do appreciate your feedback as it helped me fully describe my concerns.

agahkarakuzu commented 2 years ago

@neurolabusc I am aware of this problem and I definitely agree on the necessity of providing source files and conversion mappings from the get go.

I also agree that for certain qMRI acquisitions, BEP001 is a wish list of specifications in its current state. But it was the best first thing we could do to start somewhere. I am trying to support users one request at a time to solve the issue for their specific setup. When developing sequences, I conform to the tags in the wish list and we suggest that WIP sequence developers follow it as well. For a while, it is somewhat of a cold start problem, but I hope we'll get there.

Each qMRI-focused WIP sequence has shortcomings in adding important tags to metadata, e.g., RepetitionTimeExcitation for MP2RAGE can be found only in the protocol pdf, not in the file header. Then XA30 came and things got even more complicated for most of the sequences. Not to mention that the complexity is multiplied with each vendor.

For the TB1EPI-specific issue, I tagged people from our team (in the respective issue) who may have the source data.

neurolabusc commented 2 years ago

@agahkarakuzu perhaps you can send me some V* MP2RAGE DICOMs where the RepetitionTimeExcitation is known. I assume that these values are stored in the alFree and adFree variables of the CSA header. These sequence specific values allow users to phoenix custom sequences, but we need to know the mapping.
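To make the "we need to know the mapping" point concrete, here is a minimal sketch that reads `alFree`/`adFree` entries from the ASCII protocol text of a Siemens CSA header and applies a mapping table to produce BIDS fields. It assumes the entries appear as lines of the form `sWipMemBlock.adFree[2] = 7.1`. The index, the scale factor, and the sample protocol text are purely hypothetical; the real values depend on the sequence and have to come from the sequence developer.

```python
import re

# Assumed line format in the ASCII protocol, for example:
#   sWipMemBlock.alFree[3]  =  5000
_FREE_RE = re.compile(r"sWipMemBlock\.(alFree|adFree)\[(\d+)\]\s*=\s*(\S+)")


def parse_free_params(protocol_text):
    """Collect alFree (integer) and adFree (float) entries by array index."""
    params = {"alFree": {}, "adFree": {}}
    for name, index, raw in _FREE_RE.findall(protocol_text):
        value = int(raw) if name == "alFree" else float(raw)
        params[name][int(index)] = value
    return params


def to_bids(params, mapping):
    """Apply a {BIDS field: (array, index, scale)} mapping to parsed values."""
    out = {}
    for field, (array, index, scale) in mapping.items():
        if index in params[array]:
            out[field] = params[array][index] * scale
    return out


# HYPOTHETICAL mapping: the array, index and unit scaling are placeholders,
# not the real location of RepetitionTimeExcitation in any sequence.
HYPOTHETICAL_MAP = {"RepetitionTimeExcitation": ("adFree", 2, 0.001)}

# Made-up protocol fragment for illustration.
sample_protocol = """
sWipMemBlock.alFree[3]\t = \t5000
sWipMemBlock.adFree[2]\t = \t7.1
"""

print(to_bids(parse_free_params(sample_protocol), HYPOTHETICAL_MAP))
```

Keeping the mapping in a table separate from the parser means a sequence developer could supply only the table, alongside a sample DICOM where the true value is known.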

For Siemens, you can work with your center's Siemens Research Collaboration Manager to understand these details and make sure they have some equivalent in a future XA release. Likewise, Philips users can work with the Philips Clinical Scientist associated with their center to understand these details (albeit Philips DICOM scans do not store many sequence-specific parameters). For GE, @mr-jaemin has done a terrific job of helping us understand how GE encodes various details. I do think it really helps if the BIDS BEPs are developed in consultation with the manufacturers. Scanner software releases have a long lag time, and DICOM working groups have an even longer lag. Including those engineers early in the cycle allows them to share their insight and to provide the rich sequence details we want for reproducible science.

arokem commented 2 years ago

I am trying to wrap my head around the scope of this issue: what data types and modalities are affected by this? It sounds like at least qMRI measurements are affected by this, but is that the extent of the issue? Or does this affect other MRI measurements (e.g., fMRI/dMRI?) and other modalities? Sounds like the scope of this issue could get quite complicated with the wealth of different human electrophysiology vendors and file formats. Does it hit us there as well?

neurolabusc commented 2 years ago

@arokem I think this has been a pervasive issue with modality-specific BEPs. When the original BIDS was drafted, @chrisgorgo spent a lot of time coordinating with people like me to work out both what was required for thorough analysis and what was available from the existing data. However, it seems that subsequent BEPs have been drafted from wish lists of what is required for thorough analysis, with no concrete images provided and no clear consideration of how these fields can be derived.

For many of the BEPs, I have tried to make comments during the development to try to correlate proposed BIDS fields with DICOM tags. However, I think implementation considerations need to be integrated with the development of these specifications.

CPernet commented 2 years ago

@neurolabusc this is not completely true, I gave you access to a library of many PET phantoms (GE, Philips and Siemens) -- we just haven't gotten around to making it public yet. It is partially true though because this is not real validation for DICOM (for ecat we make one from scratch)

neurolabusc commented 2 years ago

@CPernet my request is that the BIDS Specification Extensions be done in parallel and in collaboration with implementation considerations. I have consistently requested sample datasets from teams that have access to hardware and licenses that I do not have direct access to. Where possible, I have worked to make publicly accessible validation datasets that allow all conversion tools to have concrete exemplars.

CPernet commented 2 years ago

sure I get your point -- kinda weird having spec describing stuff no one has ever seen!

robertoostenveld commented 2 years ago

I share the concerns expressed here, but think that they may be more specific to imaging modalities than to others. Let me share some background regarding MEG/EEG/iEEG for which I co-authored the BEPs, and also with me being a "conversion" tool developer with FieldTrip data2bids.

For MEG we identified that there was no uniform file format that would easily accommodate all existing data. Reformatting complicated MEG formats into other complicated formats was not considered attractive. We therefore decided to specify that the (at that time) existing commercial/native file formats be used in BIDS. Now that OPM-based MEG systems are gaining wider adoption, this may need to be reconsidered, although fif (one of the currently BIDS supported file formats) is a good candidate for OPM systems. But for now MEG does not require conversion.

For EEG we identified that there were few file formats commonly used in research settings that all data could be converted to, mainly EDF (16 bit) and BrainVision (16 or 32 bit integer and 32 bit float). EEG files usually have little metadata, and converting to them is not hard. There are many input file formats for the converters though, and those are hard to deal with - mostly because those are proprietary and require some reverse engineering.

For iEEG it is similar to EEG, except that native file formats (from clinical systems) are often even more closed. In those cases, in practice I often see that people come with data exported from the acquisition system in some intermediary file format that subsequently needs to be converted to something that is BIDS compatible. So iEEG people are more accustomed to doing conversions themselves. It is not that I am specifically happy with this state of affairs, but since everyone appears familiar with the problem, the problem seems to have less practical impact.

In general, MEG/EEG/iEEG acquisition systems are simpler than scanners and there are fewer settings that affect acquisition. I think that with these three modalities it is more the recoding of events into the events.tsv that is challenging for researchers than the MEG/EEG/iEEG timeseries and acquisition details. Although OPM-based MEG will cause specific challenges, and I anticipate that at a certain point an OPM BEP will emerge.
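A minimal sketch of the event recoding step mentioned above: it turns a list of (sample index, trigger code) pairs into the tab-separated text of an events.tsv file with `onset` and `duration` columns in seconds. The function name, the trigger codes, the `trial_type` labels and the zero default duration are assumptions for illustration; real recordings need a study-specific mapping from trigger codes to conditions.

```python
import csv
import io


def triggers_to_events_tsv(triggers, sfreq, code_names, duration=0.0):
    """Build events.tsv text from (sample_index, trigger_code) pairs.

    sfreq is the sampling frequency in Hz; onsets are written in seconds.
    Trigger codes without a known label are written as "n/a".
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["onset", "duration", "trial_type"])
    for sample, code in triggers:
        onset = sample / sfreq
        writer.writerow(
            [f"{onset:.3f}", f"{duration:.3f}", code_names.get(code, "n/a")]
        )
    return buf.getvalue()


# Made-up triggers recorded at 1000 Hz; the code-to-label table is hypothetical.
example = triggers_to_events_tsv(
    triggers=[(500, 1), (1500, 2), (2500, 9)],
    sfreq=1000.0,
    code_names={1: "face", 2: "house"},
)
print(example)
```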

I don't see the direct need for collecting "example native MEG/EEG/iEEG files", although I might be biased as over the years I have collected many examples for FieldTrip (which I might be able to share if anyone is interested). The examples might be of interest for people that want to implement new converters in other programming environments than we now use (FieldTrip is MATLAB-based, MNE-Python also deals with a good number of formats in Python). Compiled (command line) C applications and pipelines appear not to be common for MEG/EEG/iEEG.

For fNIRS we settled on the SNIRF format; the situation here might be more comparable to NIfTI. At this moment it is not clear to me how converter tools will develop and whether shared data will help.

CPernet commented 2 years ago

+1 EEG native EDF from commercial software doesn't exist, does it?

I can however relate to this as in PET some information is hidden in private tags that we do not necessarily know how to convert. It is thus not as much a spec issue as a conversion tool issue; on the other hand, if the spec says use nii and we don't have tools like Chris's dcm2niix then the spec is useless.

Maybe a more general point is that some BEPs are asking for things but do not check where/if they are available for different data types/formats - and that may reflect a difference in who leads which BEP. Back to MEEG, the leaders were also software developers like you who have read and converted native data for years, so it was well thought out. As a first step, we could update the recommendations for BEPs to mention this issue.

bendhouseart commented 3 months ago

@neurolabusc updated this on the new website/BEP documentation with PR 475.