Closed mateuszpawlik closed 7 months ago
All matching works in the same way: attributes and properties from a data source are compared with attributes and properties from bidsmap entries in the following order (see bids.get_matching_run():
unknowndatatypes: list = bidsmap['Options']['bidscoin'].get('unknowntypes',[])
ignoredatatypes: list = bidsmap['Options']['bidscoin'].get('ignoretypes',[])
bidsdatatypes: list = [dtype for dtype in bidsmap.get(datasource.dataformat) if dtype not in unknowndatatypes + ignoredatatypes + ['subject', 'session']]
dataformat = Dataformat(bidsmap.get(datasource.dataformat, {}))
# Loop through all datatypes and runs; all info goes cleanly into run_ (to avoid formatting problem of the CommentedMap)
if 'fmap' in bidsdatatypes:
bidsdatatypes.insert(0, bidsdatatypes.pop(bidsdatatypes.index('fmap'))) # Put fmap at the front (to catch inverted polarity scans first
run_ = create_run(datasource, bidsmap)
for datatype in ignoredatatypes + bidsdatatypes + unknowndatatypes: # The ordered datatypes in which a matching run is searched for
So, as you can see from the for-loop, if you are an getting unknowndatatype for seriesnr 8, than that is (or at least should be) because no match could be found for seriesnr 8 in the func section of your bidsmap (i.e. there is something different about this series).
p.s. another way to exclude specific data sources is to make the series folder hidden by prepending a dot to the foldernames (e.g. 07_func..
-> .07_func
)
I tried to investigate a bit more. Shouldn't all DICOM directories show up in bidsmapper output? The 8 is skipped. Could it be that somehow 8 is treated as 7 and this if statement is false?
INFO | Discovered 'exclude' DICOM sample: dataset/sourcedata/sub-s039/ses-3/001-AAHScout_32ch/MR000001
INFO | Discovered 'exclude' DICOM sample: dataset/sourcedata/sub-s039/ses-3/002-AAHScout_32ch_MPR/MR000129
INFO | Discovered 'exclude' DICOM sample: dataset/sourcedata/sub-s039/ses-3/003-t2_tse_tra_p2/MR000133
INFO | Discovered 'fmap' DICOM sample: dataset/sourcedata/sub-s039/ses-3/004-gefieldmap_martinnew_nw/MR000158
INFO | Discovered 'fmap' DICOM sample: dataset/sourcedata/sub-s039/ses-3/005-gefieldmap_martinnew_nw/MR000254
INFO | Discovered 'func' DICOM sample: dataset/sourcedata/sub-s039/ses-3/006-fMRI_REST/MR000302
INFO | Discovered 'exclude' DICOM sample: dataset/sourcedata/sub-s039/ses-3/007-FMRI_dopa6/MR000552
INFO | Discovered 'anat' DICOM sample: dataset/sourcedata/sub-s039/ses-3/009-MPRAGE_p2/MR001025
INFO | Discovered 'func' DICOM sample: dataset/sourcedata/sub-s039/ses-3/010-FMRI_dopa2/MR001185
INFO | Discovered 'func' DICOM sample: dataset/sourcedata/sub-s039/ses-3/011-FMRI_dopa3/MR001309
INFO | Discovered 'func' DICOM sample: dataset/sourcedata/sub-s039/ses-3/012-FMRI_dopa4/MR001433
I think run-8 was already discovered in another subject (as run-1 or so)? Did bidscoiner also skip run-8? Remember that bidsmapper only show logs of new source datatypes, so it's a shortlist of datatypes, not a longlist of data sources.
In other words, for bidsmapper the output is the same if you have 1 subject or a 100 that are collected with the same protocol
I execute bidscoin in isolation for only one session of one subject :thinking:
Ok, I had another closer look, and I now see there is also 006-fMRI_REST
in the discovery logs. If that scan protocol is the same as your series 8 (008-fMRI_REST
) then all is fine and bidscoin already knows how to convert series 8 (and will do so when you run bidscoiner)
So for bidscoin 006, 007 and 008 are normally the same (so they are discovered only once), except that you now added a run-item for 007 to the exclude datatypes (which is hence also discovered once). But 006 and 008 are still the same datatype, i.e. 008 was already discovered when 006 was encountered
You should also realize that datatype are matched only for non-empty attributes and properties. So 006 and 008 are identical because SeriesNumer
is left empty for the func run-items. Also not that your run-item:
exclude:
- provenance:
attributes:
SeriesNumber: 7
Will exclude all 007 series for all subjects
If you want it to be more specific you should add more filters / non-empty properties and attributes
There are also other fmri sequences later (FMRI_dopa2
, FMRI_dopa3
, FMRI_dopa4
) and they are discovered. Each fmri sequence has a separate rule. But I may need to understand it more.
bidscoiner skips the 8
as unknown:
2024-03-18 10:38:11 - INFO | --> Coining: dataset/sourcedata/sub-s039/ses-3/006-fMRI_REST
2024-03-18 10:38:11 - INFO | Command:
dcm2niix -b y -z n -i n -f "sub-s039_ses-3_task-rest_acq-norm_bold" -o "dataset/sub-s039/ses-3/func" "dataset/sourcedata/sub-s039/ses-3/006-fMRI_REST"
2024-03-18 10:38:12 - INFO | Output:
Chris Rorden's dcm2niiX version v1.0.20230411 GCC12.2.0 x86-64 (64-bit Linux)
Found 250 DICOM file(s)
Convert 250 DICOM as dataset/sub-s039/ses-3/func/sub-s039_ses-3_task-rest_acq-norm_bold (64x64x36x250)
Conversion required 0.470132 seconds (0.412409 for core code).
2024-03-18 10:38:12 - VERBOSE | Adding 'TaskName: rest' to: dataset/sub-s039/ses-3/func/sub-s039_ses-3_task-rest_acq-norm_bold.json
2024-03-18 10:38:12 - INFO | --> Leaving out: dataset/sourcedata/sub-s039/ses-3/007-FMRI_dopa6
2024-03-18 10:38:12 - ERROR | --> Skipping unknown 'extra_data' run: dataset/sourcedata/sub-s039/ses-3/008-FMRI_dopa6/MR000554
-> Re-run the bidsmapper and delete dataset/sub-s039/ses-3 to solve this warning
If they have different SeriesDescription, then they are (rightfully so) discovered as different data types (because SeriesDescription was used as a filter). But your error about the unknown series 8 is very strange indeed
Interesting to see how you are using templates. You are relying heavily on my code to fix all the missing bids-entities, fine (makes life easy I guess). But you also removed a lot of attributes, that do have a function if left empty, namely they discover for instance if two protocols differ in the details (e.g. two scans with a different resolution). Question, do you pass this bidsmap to bidsoiner directly? I guess not, and that you pass it to bidsmapper and the output of bidsmapper, that you pass to bidscoiner, right?
Yes, my use case is particular :see_no_evil: But I like BIDScoin most out of the BIDS conversion tools :grin: For more context, I always convert one session at a time. Therefore, I only need to distinguish different sequences in that one session. I know that BIDScoin is intended to operate on entire datasets and has more features. We delegate the validation of differences between sessions/subjects to BIDS validator.
I pass my template bidsmap to bidsmapper only. It generates then a study bidsmap that is used by bidscoiner. I checked it, and the 8
sequence is not there.
Thanks. This would extract the digit from the protocol name, right? Unfortunately, we must renumber the runs and rename the tasks. These are old data, which require much curation, but we want to use the pipelines for new data with them.
I'm confused, your mapping file doesn't have SeriesNumber: 7
?
No, the 7
is there but the 8
is not.
Thanks. This would extract the digit from the protocol name, right? Unfortunately, we must renumber the runs and rename the tasks. These are old data, which require much curation, but we want to use the pipelines for new data with them.
Yes, but I removed my post because your scheme is more complicated
In a previous issue, someone was dealing with anonymized data who had something similar. And because you removed so many attributes, you could perhaps be dealing with a similar problem. I solved (or at least remedied) the issue in v4.3.0, but you are using 4.2.1
Thanks, it's quite a bit of reading, but I start understanding things better. I'll get to reading it.
In the meantime. Could my problem be the following?
7
matches the exclude pattern.func
rule, I would like the sequence 8
to match, looks at ProtocolName
only.FMRI_dopa6
was already excluded (sequence 7
).8
as already discovered because the func
rule doesn't specify series number.Adding SeriesNumber: 8
to the func
rule fixes the issue, because the excluded sequence had series number 7
and not 8
, so the 8
th is a different sequence.
Yes, a lot of reading, but the relevant part (I updated the link) is where I realize what happens if you underspecify data sources.
You need a matching func item (the protocolName, but without SeriesNumber), next to the SeriesNumber: 7
exclude item
If I understood it right, I added empty SeriesNumber:
to my generic func
item like
func:
- provenance:
attributes: &func_dicomattr
ImageType:
ProtocolName:
SeriesDescription:
SeriesNumber:
bids: &func_bidsattr
suffix: bold
run: <<>>
and this worked.
And the fix, you linked to, makes this implicit?
No, just leave SeriesNumber out altogether (you don't want to distinguish between them) for func items
func:
- provenance:
attributes: &func_dicomattr
ImageType:
ProtocolName:
SeriesDescription:
bids: &func_bidsattr
suffix: bold
run: <<>>
The fix was for items that no values for any of the attributes
Back to beginning then :thinking: If I exclude sequence 7
with SeriesNumber
, sequence 8
is skipped as unknown https://github.com/Donders-Institute/bidscoin/issues/229#issuecomment-2003891065
This was for me unexpected and I'm trying to understand why it's happening. I thought I did understand but now I am confused.
No, just leave SeriesNumber out altogether (you don't want to distinguish between them) for func items
Ah, yes, I get that. This I don't want. It's a tough one.
Making your own template is indeed more advanced usage as it requires a good understanding of how run-item matching works. But how you describe it, it could possibly be a bug. Are you using a stable release version or pulled that commit you showed above? I can have a look if you could share your dataset (just the headers will do)
I haven't tried a newer version yet. I will do that. I will need some time though. And I can prepare a minimal example.
This is a very special case for me too. It shouldn't actually happen, because run: <<>>
handles it nicely. However, the resulting nifti file was corrupted too much, and the rest of the pipeline failed. Therefore, I tried to exclude it.
That would be great, because I'd like to know if you encountered a bug or if this issue is only about creating a proper template bidsmap
I managed to continue the investigation. I used my DICOM generator to prepare a minimal example.
I worked with master branch pip install --force-reinstall git+https://github.com/Donders-Institute/bidscoin
, should be https://github.com/Donders-Institute/bidscoin/commit/2aceaa4ebd4a2894c875f0e3944ff1b3b7f71a20 I commented out the problematic code from #230
Clone https://gitlab.com/ccns/neurocog/neurodataops/anc/software/dicom-generator
cd dicom-generator
Save the following json as many_func.json
[
{
"ImageType" : "fmri",
"SeriesDescription" : "func_task-rest"
},
{
"ImageType" : "fmri",
"SeriesDescription" : "func_task-rest"
},
{
"ImageType" : "fmri",
"SeriesDescription" : "func_task-rest"
}
]
python3 -m venv .venv
source .venv/bin/activate
pip3 install -U -r requirements.txt
python3 generate_dicom.py many_func.json dataset/sourcedata/sub-001/ses-1 -t 'MR00000{id}'
Sort the DICOMs using dicomsort
(from BIDScoin)
dicomsort dataset/sourcedata --subprefix sub --sesprefix ses --pattern '.*\/[A-Z]{2}\d{6}'
The following bidsmap reproduces my problem. Sequence with SeriesNumber=1
is excluded, consecutive ones are identified as unknown.
Options:
bidscoin:
version: 4.3.1
bidsignore: [extra_data/]
subprefix: sub-
sesprefix: ses-
unknowntypes: [extra_data] # A list of datatypes that are converted to BIDS-like datatype folders
ignoretypes: [exclude] # A list of datatypes that are excluded / not converted to BIDS
unzip:
plugins:
dcm2niix2bids:
command: dcm2niix
args: -b y -z n -i n
anon: n # Not to shift acquisition dates.
meta: [.json, .tsv, .tsv.gz]
DICOM:
subject: <<filepath:/sub-(.*?)/>> #<<SourceFilePath>>
session: <<filepath:/sub-.*?/ses-(.*?)/>> #<<SourceFilePath>>
func:
- provenance:
attributes: &func_dicomattr
ImageType:
ProtocolName:
SeriesDescription:
bids: &func_bidsattr
suffix: bold
run: <<>>
- provenance:
attributes:
<<: *func_dicomattr
ProtocolName: (?i)^func.*
bids:
<<: *func_bidsattr
task: rest
meta:
TaskName: rest
exclude:
- provenance:
attributes: &exclude_dicomattr
ProtocolName:
SeriesDescription:
ImageType:
bids: &exclude_bidsattr
suffix: <ProtocolName>
- provenance:
attributes:
<<: *exclude_dicomattr
SeriesNumber: 1
bids:
<<: *exclude_bidsattr
Substituting the exclude
section with the following fixes the issue.
exclude:
- provenance:
attributes:
SeriesNumber: 1
bids:
suffix: <ProtocolName>
Let's use a simplified bidsmap, without the yaml anchors. I copy here only the func
and exclude
sections. The beginning is the same. This bidsmap reproduces the problem.
func:
- provenance:
attributes:
ImageType:
SeriesDescription:
ProtocolName: (?i)^func.*
bids:
suffix: bold
run: <<>>
task: rest
meta:
TaskName: rest
exclude:
- provenance:
attributes:
ImageType:
SeriesDescription:
ProtocolName:
SeriesNumber: 1
bids:
suffix: <ProtocolName>
Removing ImageType
from attributes
in exclude
solves the issue. Moreover, removing any other attribute, doesn't solve the problem.
This bidsmap also reproduces the problem.
func:
- provenance:
attributes:
ProtocolName: (?i)^func.*
bids:
suffix: bold
run: <<>>
task: rest
meta:
TaskName: rest
exclude:
- provenance:
attributes:
ImageType:
SeriesNumber: 1
bids:
suffix: <ProtocolName>
I can only narrow it down to ImageType
attribute. It would help me to understand better the purpose of the empty attributes (without a value). I thought I can use them with an anchor to reuse in run items, and overwrite when needed. And that I need them to differentiate different types of sequences. But it seems to me that they do more than that.
I will do a test my self, but first to address your question
I can only narrow it down to ImageType attribute. It would help me to understand better the purpose of the empty attributes (without a value). I thought I can use them with an anchor to reuse in run items, and overwrite when needed. And that I need them to differentiate different types of sequences. But it seems to me that they do more than that.
The way run-item matching works is as follows. All properties and attributes in the template bidsmap are compared to the attributes and properties of the data source. The properties and attributes can contain regular expressions, or can even be empty, acting as a broad filter or even as no filter at all. However, and that is the somewhat hidden power of bidscoin, when a match is found, it is stored in the study bidsmap with all template attribute values replaced by the data source attribute values. In this way, the run-item becomes a very narrow filter for the discovered data source. You should also know that run-matching is always performed first in the study template, and then in the template bidsmap. In this way, the study bidsmap becomes a precise reflection of the dataset (i.e. a collection of discovered datatypes). And the more attributes you add, the more precise it becomes. I hope this helps to understand why adding empty attributes in the template bidsmap will not change the template run-item matching itself (because empty attributes are ignored when doing template matching), but it will make the study bidsmap more specific/selective. (I do all this because users can then freely edit the study bidsmap to their liking, but that is something you don't use, so this procedure may perhaps seem overcomplicated to you. Another way of looking at this is this that the template bidsmap contains the prior knowledge, and that the study bidsmap contains the user/posterior knowledge)
As a simple example, suppose you have an extensive acquisition protocol with a high-resolution + a low-resolution T1 scan with the same ProtcoolName. If you only add {ProtocolName: .*T1w.*}
to the template, then your study bidsmap will contain one T1 run-item, and bidscoiner will produce 2 T1 images (run-1
and run-2
). If you add an empty SliceThickness attribute, i.e. {ProtocolName: .*T1w.*, SliceThickness: }
then you will get two run-items in the study bidsmap (with different SliceThickness values) and bidscoiner will produce two different run-1 T1 images (if you want so). So rather than having to specify every kind of run-item in the template bidsmap, you can broadly specify them, and still discriminate between subtle protocol variations in your acquisition -- which is what you want (although to be useful you need to add the posterior knowledge, i.e. use the bidseditor).
Oh yeah, and to add to that, in the template bidsmap I often add an entirely empty run-item as a first item, which I then dereference in the following run-items. That is for convenience and innocent (because it never matches anything), but it is also useful for the bidseditor. Because in the bidseditor, when a user changes e.g. the datatype, then the first run-item of that new datatype is taken. In this way, I can control the default suffix that is then taken
Ok, this was a rare case that I fixed by making my bidsmap duplicate entry check more specific. Btw, there is an error in your test case above, namely that instead of using ProtocolName
:
- provenance:
attributes:
<<: *func_dicomattr
ProtocolName: (?i)^func.*
You should have used SeriesDescription
(as this was used by your DICOM generator)
Are you making the transition to v4.3.1 now? If so, and if your done with testing and it all works, then please let me know and I'll release 4.3.1 on pypi
Hi @marcelzwiers, I was busy with training this week. I'm trying to get back to BIDScoin today. There are three issues on my testing list. I will start with this one and let you know.
Btw, there is an error in your test case above, namely that instead of using
ProtocolName
: ... You should have usedSeriesDescription
(as this was used by your DICOM generator)
Thanks. That would explain many things :see_no_evil: But I can swear I checked it and both of the fields were there. This could be the reason of the confusion too.
The generated study map was also wrong then. It has empty strings for ProtocolName
, and this was the only attribute to match. Then of course the sequences didn't match anything and were categorized as unknown.
P.S. I know where I've seen both of the fields - in the sidecar json files. dcm2niix must add ProtocolName
there.
The version I started with (pip install --force-reinstall git+https://github.com/Donders-Institute/bidscoin@2aceaa4ebd4a2894c875f0e3944ff1b3b7f71a20
):
ProtocolName
to SeriesDescription
) - things work as expectedBIDScoin version pip install --force-reinstall git+https://github.com/Donders-Institute/bidscoin
(should fetch the newest commit on the master branch):
ProtocolName
to SeriesDescription
) - things work as expectedThanks a lot! I like your explanation.
Is there anything I can provide for this issue? I don't know how to trigger the debugging comments you added in your commit.
No, if your happy then you can close this issue :-)
p.s. If you run bidscoin -h
(or just bidscoin
), you will see how you can activate the debugging mode
The debug logs look good. Thanks. Closing as resolved.
Mhhh, my commit (df7032f6e3a43fafa727a5a9d1e2142c6f50ad34) solved your issue but creates false positives for bidsmaps in which users have (legitimately) changed the data type. I will have to revert/fix the fix :-(
Bummer, the more I think about it, the more I get convinced you have run to a fundamental limit of BIDScoin's design. Basically what you are doing when excluding a particular SeriesNumber is adding SeriesNumber
to your bidsmap exclude section, but you don't add (rightfully so) SeriesNumber
to the func section of your bidsmap. Before my fix, when doing data source discovery, if a non-excluded is already added to your bidsmap, then your excluded run was also matching it (because the func item doesn't contain SeriesNumber
) and BIDScoin hence did not add your exclude item as a new item to the bidsmap (i.e. the reason you posted this issue). My fix works fine if you don't change datatype after running the bidsmapper (i.e. with the bidseditor), but it leads to double entries otherwise. This is really not desirable, and I don't see a way around it, so I'm reverting my commit for now. If I find a new solution I'll let you know, but until then you have to find another way of excluding your Series
One way of solving your problem (in 4.3.2) is to include a regular expression for your func item, in which you require that SeriesNumber is unequal to 7, i.e. the opposite of your exclude item
If that works, then this issue can be closed
I will read carefully your comments next week. I appreciate a lot all your help :blush: But, since my use case is different than the default, I am happy with workarounds too :slightly_smiling_face:
I'm testing 14c74ade8fbdd0e862b7e01614e41afe0c6a719a from master branch. The regex doesn't seem to work. I've tried SeriesNumber: ^[0-68-9]
, which should much everything not starting with 7
. Despite no match found, both sequences are converted. And both sequences are in the study template.
Logs of bidsmapper for my original two problematic sequences (I deleted path prefixes for readability):
BCDEBUG - bidscoin.bids | Found no bidsmap match for: dataset/sourcedata/sub-s056/ses-1/007-FMRI_dopa6/MR000395
BCDEBUG - bidscoin.plugin.dcm2niix2bids | No match found in the study bidsmap, now trying the template bidsmap
BCDEBUG - bidscoin.bids | Found no bidsmap match for: dataset/sourcedata/sub-s056/ses-1/007-FMRI_dopa6/MR000395
INFO - bidscoin.plugin.dcm2niix2bids | Discovered 'func' DICOM sample: dataset/sourcedata/sub-s056/ses-1/007-FMRI_dopa6/MR000395
BCDEBUG - bidscoin.bids | Found no bidsmap match for: dataset/sourcedata/sub-s056/ses-1/008-FMRI_dopa6/MR000401
BCDEBUG - bidscoin.plugin.dcm2niix2bids | No match found in the study bidsmap, now trying the template bidsmap
BCDEBUG - bidscoin.bids | Found bidsmap match: sub--unknown/ses--unknown/DICOM_func_id010 -> dataset/sourcedata/sub-s056/ses-1/008-FMRI_dopa6/MR000401
INFO - bidscoin.plugin.dcm2niix2bids | Discovered 'func' DICOM sample: dataset/sourcedata/sub-s056/ses-1/008-FMRI_dopa6/MR000401
With test data it also doesn't work - all sequences are converted. There are three functional sequences with SeriesNumber
from 1
to 3
. The rule I used for test data (which should not match the second sequence):
func:
- provenance:
attributes:
SeriesDescription: ^func.*
SeriesNumber: ^[13]
bids:
suffix: bold
run: <<>>
task: rest
meta:
TaskName: rest
I understood that I should not have the sequence to exclude in the exclude section anymore. Did I misunderstand?
Let me know if I can do more.
You have to add a matching Series 7 in your exclude section, and a non-matching Series 7 (^[0-68-9]
) in your func section. Otherwise your Series 7 data source will have two or zero matches in your bidsmap, and then things go wrong
Think of it like this. If bidscoin comes across a data source it looks in the bidsmap for a mapping to a BIDS filename (and metadata). So if the data source doesn't match anything you get an error/warning, if you have two or more matches, bidscoin faces an ambiguity it cannot resolve (so it takes the first one it finds). To make building bidsmaps a little easier, I've made the search order such that exclude items are searched first, and extra_data last (you can sometimes make use of this if you understand how it works)
You have to add a matching Series 7 in your exclude section, and a non-matching Series 7 (
^[0-68-9]
) in your func section. Otherwise your Series 7 data source will have two or zero matches in your bidsmap, and then things go wrong
So I did misunderstand. I added the matching pattern to the exclude section and it works.
Example for test data:
func:
- provenance:
attributes:
SeriesDescription: ^func.*
SeriesNumber: ^[13]
bids:
suffix: bold
run: <<>>
task: rest
meta:
TaskName: rest
exclude:
- provenance:
attributes:
SeriesNumber: 2
Think of it like this. If bidscoin comes across a data source it looks in the bidsmap for a mapping to a BIDS filename (and metadata). So if the data source doesn't match anything you get an error/warning, if you have two or more matches, bidscoin faces an ambiguity it cannot resolve (so it takes the first one it finds). To make building bidsmaps a little easier, I've made the search order such that exclude items are searched first, and extra_data last (you can sometimes make use of this if you understand how it works)
This I understand.
However, I was confused by the logs. Although there was no match for 007-FMRI_dopa6
, and I expected a warning, it was discovered as a func sequence.
I tried to exclude a specific sequence, one of two duplicates with
ProtocolName: FMRI_dopa6
, using:It's only temporary for one subject.
In the
func
section I have:BIDScoin excluded sequence with
SeriesNumber: 7
but the one withSeriesNumber: 8
ended up inextra_data
.These could be relevant log entries:
Adding
SeriesNumber: 8
in thefunc
section above solved the problem.I am wandering how excluding really works, and if the described behavior is intended.