Exclude sections behaves weirdly

mateuszpawlik commented 8 months ago

I tried to exclude a specific sequence, one of two duplicates with ProtocolName: FMRI_dopa6, using:

exclude:
  - provenance:
    attributes:
      SeriesNumber: 7

It's only temporary for one subject.

In the func section I have:

  - provenance:
    attributes:
      ProtocolName: (?i).*FMRI_dopa6*

BIDScoin excluded sequence with SeriesNumber: 7 but the one with SeriesNumber: 8 ended up in extra_data.

These could be relevant log entries:

...
INFO | Discovered 'exclude' DICOM sample: dataset/sourcedata/sub-s039/ses-3/007-FMRI_dopa6/MR000552
...
INFO | False:   exclude/sub-DICOM_FMRIdopa6.*
...

Adding SeriesNumber: 8 in the func section above solved the problem.

I am wandering how excluding really works, and if the described behavior is intended.

marcelzwiers commented 8 months ago

All matching works in the same way: attributes and properties from a data source are compared with attributes and properties from bidsmap entries in the following order (see bids.get_matching_run():

    unknowndatatypes: list = bidsmap['Options']['bidscoin'].get('unknowntypes',[])
    ignoredatatypes:  list = bidsmap['Options']['bidscoin'].get('ignoretypes',[])
    bidsdatatypes:    list = [dtype for dtype in bidsmap.get(datasource.dataformat) if dtype not in unknowndatatypes + ignoredatatypes + ['subject', 'session']]
    dataformat             = Dataformat(bidsmap.get(datasource.dataformat, {}))

    # Loop through all datatypes and runs; all info goes cleanly into run_ (to avoid formatting problem of the CommentedMap)
    if 'fmap' in bidsdatatypes:
        bidsdatatypes.insert(0, bidsdatatypes.pop(bidsdatatypes.index('fmap'))) # Put fmap at the front (to catch inverted polarity scans first
    run_ = create_run(datasource, bidsmap)
    for datatype in ignoredatatypes + bidsdatatypes + unknowndatatypes:         # The ordered datatypes in which a matching run is searched for

So, as you can see from the for-loop, if you are an getting unknowndatatype for seriesnr 8, than that is (or at least should be) because no match could be found for seriesnr 8 in the func section of your bidsmap (i.e. there is something different about this series).

p.s. another way to exclude specific data sources is to make the series folder hidden by prepending a dot to the foldernames (e.g. 07_func.. -> .07_func)

mateuszpawlik commented 8 months ago

I tried to investigate a bit more. Shouldn't all DICOM directories show up in bidsmapper output? The 8 is skipped. Could it be that somehow 8 is treated as 7 and this if statement is false?

INFO | Discovered 'exclude' DICOM sample: dataset/sourcedata/sub-s039/ses-3/001-AAHScout_32ch/MR000001
INFO | Discovered 'exclude' DICOM sample: dataset/sourcedata/sub-s039/ses-3/002-AAHScout_32ch_MPR/MR000129
INFO | Discovered 'exclude' DICOM sample: dataset/sourcedata/sub-s039/ses-3/003-t2_tse_tra_p2/MR000133
INFO | Discovered 'fmap' DICOM sample: dataset/sourcedata/sub-s039/ses-3/004-gefieldmap_martinnew_nw/MR000158
INFO | Discovered 'fmap' DICOM sample: dataset/sourcedata/sub-s039/ses-3/005-gefieldmap_martinnew_nw/MR000254
INFO | Discovered 'func' DICOM sample: dataset/sourcedata/sub-s039/ses-3/006-fMRI_REST/MR000302
INFO | Discovered 'exclude' DICOM sample: dataset/sourcedata/sub-s039/ses-3/007-FMRI_dopa6/MR000552
INFO | Discovered 'anat' DICOM sample: dataset/sourcedata/sub-s039/ses-3/009-MPRAGE_p2/MR001025
INFO | Discovered 'func' DICOM sample: dataset/sourcedata/sub-s039/ses-3/010-FMRI_dopa2/MR001185
INFO | Discovered 'func' DICOM sample: dataset/sourcedata/sub-s039/ses-3/011-FMRI_dopa3/MR001309
INFO | Discovered 'func' DICOM sample: dataset/sourcedata/sub-s039/ses-3/012-FMRI_dopa4/MR001433

marcelzwiers commented 8 months ago

I think run-8 was already discovered in another subject (as run-1 or so)? Did bidscoiner also skip run-8? Remember that bidsmapper only show logs of new source datatypes, so it's a shortlist of datatypes, not a longlist of data sources.

marcelzwiers commented 8 months ago

In other words, for bidsmapper the output is the same if you have 1 subject or a 100 that are collected with the same protocol

mateuszpawlik commented 8 months ago

I execute bidscoin in isolation for only one session of one subject :thinking:

marcelzwiers commented 8 months ago

Ok, I had another closer look, and I now see there is also 006-fMRI_REST in the discovery logs. If that scan protocol is the same as your series 8 (008-fMRI_REST) then all is fine and bidscoin already knows how to convert series 8 (and will do so when you run bidscoiner)

marcelzwiers commented 8 months ago

So for bidscoin 006, 007 and 008 are normally the same (so they are discovered only once), except that you now added a run-item for 007 to the exclude datatypes (which is hence also discovered once). But 006 and 008 are still the same datatype, i.e. 008 was already discovered when 006 was encountered

marcelzwiers commented 8 months ago

You should also realize that datatype are matched only for non-empty attributes and properties. So 006 and 008 are identical because SeriesNumer is left empty for the func run-items. Also not that your run-item:

exclude:
  - provenance:
    attributes:
      SeriesNumber: 7

Will exclude all 007 series for all subjects

marcelzwiers commented 8 months ago

If you want it to be more specific you should add more filters / non-empty properties and attributes

mateuszpawlik commented 8 months ago

There are also other fmri sequences later (FMRI_dopa2, FMRI_dopa3, FMRI_dopa4) and they are discovered. Each fmri sequence has a separate rule. But I may need to understand it more.

mapping file

``` Options: bidscoin: version: 4.2.1 bidsignore: [extra_data/] subprefix: sub- sesprefix: ses- unknowntypes: [extra_data] # A list of datatypes that are converted to BIDS-like datatype folders ignoretypes: [exclude] # A list of datatypes that are excluded / not converted to BIDS plugins: dcm2niix2bids: command: dcm2niix args: -b y -z n -i n anon: n # Not to shift acquisition dates. meta: [.json, .tsv, .tsv.gz] DICOM: subject: <> #<> session: <> #<> anat: - provenance: attributes: ProtocolName: (?i).*mprage.* bids: suffix: T1w func: - provenance: attributes: &func_dicomattr ImageType: ProtocolName: SeriesDescription: bids: &func_bidsattr suffix: bold run: <<>> - provenance: #rest normalized attributes: <<: *func_dicomattr ProtocolName: (?i).*fMRI_REST.* ImageType: .*'NORM'.* bids: <<: *func_bidsattr task: rest acq: norm meta: TaskName: rest - provenance: #rest attributes: <<: *func_dicomattr ProtocolName: (?i).*fMRI_REST.* bids: <<: *func_bidsattr task: rest meta: TaskName: rest - provenance: #SST run1 normalized attributes: <<: *func_dicomattr ProtocolName: (?i).*FMRI_dopa2.* ImageType: .*'NORM'.* bids: <<: *func_bidsattr task: sst run: '1' acq: norm meta: TaskName: sst - provenance: #SST run1 attributes: <<: *func_dicomattr ProtocolName: (?i).*FMRI_dopa2.* bids: <<: *func_bidsattr task: sst run: '1' meta: TaskName: sst - provenance: #SST run2 normalized attributes: <<: *func_dicomattr ProtocolName: (?i).*FMRI_dopa3.* ImageType: .*'NORM'.* bids: <<: *func_bidsattr task: sst run: '2' acq: norm meta: TaskName: sst - provenance: #SST run2 attributes: <<: *func_dicomattr ProtocolName: (?i).*FMRI_dopa3.* bids: <<: *func_bidsattr task: sst run: '2' meta: TaskName: sst - provenance: #SST run3 normalized attributes: <<: *func_dicomattr ProtocolName: (?i).*FMRI_dopa4.* ImageType: .*'NORM'.* bids: <<: *func_bidsattr task: sst run: '3' acq: norm meta: TaskName: sst - provenance: #SST run3 attributes: <<: *func_dicomattr ProtocolName: (?i).*FMRI_dopa4.* bids: <<: *func_bidsattr task: sst run: '3' meta: TaskName: sst - provenance: #nback normalized attributes: <<: *func_dicomattr ProtocolName: (?i).*FMRI_dopa6* ImageType: .*'NORM'.* bids: <<: *func_bidsattr task: nback acq: norm meta: TaskName: nback - provenance: #nback attributes: <<: *func_dicomattr ProtocolName: (?i).*FMRI_dopa6* bids: <<: *func_bidsattr task: nback meta: TaskName: nback fmap: - provenance: attributes: &fmap_dicomattr ImageType: ProtocolName: SeriesDescription: EchoNumbers: bids: # Dummy if not referenced. suffix: magnitude1 meta: &fmap_metaattr IntendedFor: <> - provenance: attributes: <<: *fmap_dicomattr ProtocolName: (?i).*fieldmap.* ImageType: .*'M'.* bids: suffix: magnitude2 # Due to magnitude1 and magnitude2 images in the same series (Siemens). dcm2niix will deal with it correctly. meta: <<: *fmap_metaattr - provenance: attributes: <<: *fmap_dicomattr ProtocolName: (?i).*fieldmap.* ImageType: '[''ORIGINAL'', ''PRIMARY'', ''P'', ''ND'']' bids: suffix: phasediff meta: <<: *fmap_metaattr exclude: - provenance: attributes: &exclude_dicomattr ProtocolName: SeriesDescription: ImageType: bids: &exclude_bidsattr suffix: - provenance: attributes: <<: *exclude_dicomattr ProtocolName: (?i)localizer bids: <<: *exclude_bidsattr - provenance: attributes: <<: *exclude_dicomattr ProtocolName: (?i).*dopa1.* bids: <<: *exclude_bidsattr - provenance: attributes: <<: *exclude_dicomattr ProtocolName: (?i).*scout.* bids: <<: *exclude_bidsattr - provenance: attributes: <<: *exclude_dicomattr ProtocolName: (?i).*phoenix.* bids: <<: *exclude_bidsattr - provenance: attributes: <<: *exclude_dicomattr ProtocolName: (?i).*t2.* bids: <<: *exclude_bidsattr ```

bidscoiner skips the 8 as unknown:

2024-03-18 10:38:11 - INFO | --> Coining: dataset/sourcedata/sub-s039/ses-3/006-fMRI_REST
2024-03-18 10:38:11 - INFO | Command:
dcm2niix -b y -z n -i n -f "sub-s039_ses-3_task-rest_acq-norm_bold" -o "dataset/sub-s039/ses-3/func" "dataset/sourcedata/sub-s039/ses-3/006-fMRI_REST"
2024-03-18 10:38:12 - INFO | Output:
Chris Rorden's dcm2niiX version v1.0.20230411  GCC12.2.0 x86-64 (64-bit Linux)
Found 250 DICOM file(s)
Convert 250 DICOM as dataset/sub-s039/ses-3/func/sub-s039_ses-3_task-rest_acq-norm_bold (64x64x36x250)
Conversion required 0.470132 seconds (0.412409 for core code).

2024-03-18 10:38:12 - VERBOSE | Adding 'TaskName: rest' to: dataset/sub-s039/ses-3/func/sub-s039_ses-3_task-rest_acq-norm_bold.json
2024-03-18 10:38:12 - INFO | --> Leaving out: dataset/sourcedata/sub-s039/ses-3/007-FMRI_dopa6
2024-03-18 10:38:12 - ERROR | --> Skipping unknown 'extra_data' run: dataset/sourcedata/sub-s039/ses-3/008-FMRI_dopa6/MR000554
-> Re-run the bidsmapper and delete dataset/sub-s039/ses-3 to solve this warning

marcelzwiers commented 8 months ago

If they have different SeriesDescription, then they are (rightfully so) discovered as different data types (because SeriesDescription was used as a filter). But your error about the unknown series 8 is very strange indeed

marcelzwiers commented 8 months ago

Interesting to see how you are using templates. You are relying heavily on my code to fix all the missing bids-entities, fine (makes life easy I guess). But you also removed a lot of attributes, that do have a function if left empty, namely they discover for instance if two protocols differ in the details (e.g. two scans with a different resolution). Question, do you pass this bidsmap to bidsoiner directly? I guess not, and that you pass it to bidsmapper and the output of bidsmapper, that you pass to bidscoiner, right?

mateuszpawlik commented 8 months ago

Yes, my use case is particular :see_no_evil: But I like BIDScoin most out of the BIDS conversion tools :grin: For more context, I always convert one session at a time. Therefore, I only need to distinguish different sequences in that one session. I know that BIDScoin is intended to operate on entire datasets and has more features. We delegate the validation of differences between sessions/subjects to BIDS validator.

I pass my template bidsmap to bidsmapper only. It generates then a study bidsmap that is used by bidscoiner. I checked it, and the 8 sequence is not there.

mateuszpawlik commented 8 months ago

Thanks. This would extract the digit from the protocol name, right? Unfortunately, we must renumber the runs and rename the tasks. These are old data, which require much curation, but we want to use the pipelines for new data with them.

marcelzwiers commented 8 months ago

I'm confused, your mapping file doesn't have SeriesNumber: 7?

mateuszpawlik commented 8 months ago

No, the 7 is there but the 8 is not.

study bindsmap

``` Options: bidscoin: version: 4.2.1 bidsignore: [extra_data/] subprefix: sub- sesprefix: ses- unknowntypes: [extra_data] # A list of datatypes that are converted to BIDS-like datatype folders ignoretypes: [exclude] # A list of datatypes that are excluded / not converted to BIDS plugins: dcm2niix2bids: command: dcm2niix args: -b y -z n -i n anon: n # Not to shift acquisition dates. meta: [.json, .tsv, .tsv.gz] DICOM: subject: <> #<> session: <> #<> anat: - provenance: /home/mpawlik/Remote/anc/bids-conversion/work/3c/5da5a04fb384ab88e439a826725608/dataset/sourcedata/sub-s039/ses-3/009-MPRAGE_p2/MR001025 properties: filepath: '' filename: '' filesize: '' nrfiles: '' attributes: ProtocolName: MPRAGE_p2 bids: suffix: T1w acq: '' ce: '' rec: '' run: '' part: '' meta: {} func: - provenance: /home/mpawlik/Remote/anc/bids-conversion/work/3c/5da5a04fb384ab88e439a826725608/dataset/sourcedata/sub-s039/ses-3/006-fMRI_REST/MR000302 properties: filepath: '' filename: '' filesize: '' nrfiles: '' attributes: ProtocolName: fMRI_REST ImageType: "['ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM', 'MOSAIC']" SeriesDescription: fMRI_REST bids: task: rest acq: norm suffix: bold run: <<>> ce: '' rec: '' dir: '' echo: '' part: '' meta: TaskName: rest - provenance: /home/mpawlik/Remote/anc/bids-conversion/work/3c/5da5a04fb384ab88e439a826725608/dataset/sourcedata/sub-s039/ses-3/010-FMRI_dopa2/MR001185 properties: filepath: '' filename: '' filesize: '' nrfiles: '' attributes: ProtocolName: FMRI_dopa2 ImageType: "['ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM', 'MOSAIC']" SeriesDescription: FMRI_dopa2 bids: task: sst run: '1' acq: norm suffix: bold ce: '' rec: '' dir: '' echo: '' part: '' meta: TaskName: sst - provenance: /home/mpawlik/Remote/anc/bids-conversion/work/3c/5da5a04fb384ab88e439a826725608/dataset/sourcedata/sub-s039/ses-3/011-FMRI_dopa3/MR001309 properties: filepath: '' filename: '' filesize: '' nrfiles: '' attributes: ProtocolName: FMRI_dopa3 ImageType: "['ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM', 'MOSAIC']" SeriesDescription: FMRI_dopa3 bids: task: sst run: '2' acq: norm suffix: bold ce: '' rec: '' dir: '' echo: '' part: '' meta: TaskName: sst - provenance: /home/mpawlik/Remote/anc/bids-conversion/work/3c/5da5a04fb384ab88e439a826725608/dataset/sourcedata/sub-s039/ses-3/012-FMRI_dopa4/MR001433 properties: filepath: '' filename: '' filesize: '' nrfiles: '' attributes: ProtocolName: FMRI_dopa4 ImageType: "['ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM', 'MOSAIC']" SeriesDescription: FMRI_dopa4 bids: task: sst run: '3' acq: norm suffix: bold ce: '' rec: '' dir: '' echo: '' part: '' meta: TaskName: sst fmap: - provenance: /home/mpawlik/Remote/anc/bids-conversion/work/3c/5da5a04fb384ab88e439a826725608/dataset/sourcedata/sub-s039/ses-3/004-gefieldmap_martinnew_nw/MR000158 properties: filepath: '' filename: '' filesize: '' nrfiles: '' attributes: ProtocolName: gefieldmap_martinnew_nw ImageType: "['ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM']" SeriesDescription: gefieldmap_martinnew_nw EchoNumbers: '2' bids: suffix: magnitude2 acq: '' run: '' meta: IntendedFor: <> - provenance: /home/mpawlik/Remote/anc/bids-conversion/work/3c/5da5a04fb384ab88e439a826725608/dataset/sourcedata/sub-s039/ses-3/005-gefieldmap_martinnew_nw/MR000254 properties: filepath: '' filename: '' filesize: '' nrfiles: '' attributes: ProtocolName: gefieldmap_martinnew_nw ImageType: "['ORIGINAL', 'PRIMARY', 'P', 'ND']" SeriesDescription: gefieldmap_martinnew_nw EchoNumbers: '2' bids: suffix: phasediff acq: '' run: '' meta: IntendedFor: <> exclude: - provenance: /home/mpawlik/Remote/anc/bids-conversion/work/3c/5da5a04fb384ab88e439a826725608/dataset/sourcedata/sub-s039/ses-3/001-AAHScout_32ch/MR000001 properties: filepath: '' filename: '' filesize: '' nrfiles: '' attributes: ProtocolName: AAHScout_32ch SeriesDescription: AAHScout_32ch ImageType: "['ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM']" bids: suffix: AAHScout32ch meta: {} - provenance: /home/mpawlik/Remote/anc/bids-conversion/work/3c/5da5a04fb384ab88e439a826725608/dataset/sourcedata/sub-s039/ses-3/002-AAHScout_32ch_MPR/MR000129 properties: filepath: '' filename: '' filesize: '' nrfiles: '' attributes: ProtocolName: AAHScout_32ch SeriesDescription: AAHScout_32ch_MPR ImageType: "['ORIGINAL', 'PRIMARY', 'MPR', 'ND', 'NORM']" bids: suffix: AAHScout32ch meta: {} - provenance: /home/mpawlik/Remote/anc/bids-conversion/work/3c/5da5a04fb384ab88e439a826725608/dataset/sourcedata/sub-s039/ses-3/003-t2_tse_tra_p2/MR000133 properties: filepath: '' filename: '' filesize: '' nrfiles: '' attributes: ProtocolName: t2_tse_tra_p2 SeriesDescription: t2_tse_tra_p2 ImageType: "['ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM']" bids: suffix: t2tsetrap2 meta: {} - provenance: /home/mpawlik/Remote/anc/bids-conversion/work/3c/5da5a04fb384ab88e439a826725608/dataset/sourcedata/sub-s039/ses-3/007-FMRI_dopa6/MR000552 properties: filepath: '' filename: '' filesize: '' nrfiles: '' attributes: SeriesNumber: '7' ProtocolName: FMRI_dopa6 SeriesDescription: FMRI_dopa6 ImageType: "['ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM', 'MOSAIC']" bids: suffix: FMRIdopa6 meta: {} ```

marcelzwiers commented 8 months ago

Thanks. This would extract the digit from the protocol name, right? Unfortunately, we must renumber the runs and rename the tasks. These are old data, which require much curation, but we want to use the pipelines for new data with them.

Yes, but I removed my post because your scheme is more complicated

marcelzwiers commented 8 months ago

In a previous issue, someone was dealing with anonymized data who had something similar. And because you removed so many attributes, you could perhaps be dealing with a similar problem. I solved (or at least remedied) the issue in v4.3.0, but you are using 4.2.1

marcelzwiers commented 8 months ago

https://github.com/Donders-Institute/bidscoin/issues/221#issuecomment-1919207617

marcelzwiers commented 8 months ago

https://github.com/Donders-Institute/bidscoin/commit/85eb1157c550a868034c355225362051f1310878

mateuszpawlik commented 8 months ago

Thanks, it's quite a bit of reading, but I start understanding things better. I'll get to reading it.

In the meantime. Could my problem be the following?

Sequence 7 matches the exclude pattern.
func rule, I would like the sequence 8 to match, looks at ProtocolName only.
A sequence with protocol name FMRI_dopa6 was already excluded (sequence 7).
BIDScoin treats sequence 8 as already discovered because the func rule doesn't specify series number.

Adding SeriesNumber: 8 to the func rule fixes the issue, because the excluded sequence had series number 7 and not 8, so the 8th is a different sequence.

marcelzwiers commented 8 months ago

Yes, a lot of reading, but the relevant part (I updated the link) is where I realize what happens if you underspecify data sources.

You need a matching func item (the protocolName, but without SeriesNumber), next to the SeriesNumber: 7 exclude item

mateuszpawlik commented 8 months ago

If I understood it right, I added empty SeriesNumber: to my generic func item like

func:
  - provenance:
    attributes: &func_dicomattr
      ImageType:
      ProtocolName:
      SeriesDescription:
      SeriesNumber:
    bids: &func_bidsattr
      suffix: bold
      run: <<>>

and this worked.

And the fix, you linked to, makes this implicit?

marcelzwiers commented 8 months ago

No, just leave SeriesNumber out altogether (you don't want to distinguish between them) for func items

func:
  - provenance:
    attributes: &func_dicomattr
      ImageType:
      ProtocolName:
      SeriesDescription:
bids: &func_bidsattr
      suffix: bold
      run: <<>>

marcelzwiers commented 8 months ago

The fix was for items that no values for any of the attributes

mateuszpawlik commented 8 months ago

Back to beginning then :thinking: If I exclude sequence 7 with SeriesNumber, sequence 8 is skipped as unknown https://github.com/Donders-Institute/bidscoin/issues/229#issuecomment-2003891065

This was for me unexpected and I'm trying to understand why it's happening. I thought I did understand but now I am confused.

mateuszpawlik commented 8 months ago

No, just leave SeriesNumber out altogether (you don't want to distinguish between them) for func items

Ah, yes, I get that. This I don't want. It's a tough one.

marcelzwiers commented 8 months ago

Making your own template is indeed more advanced usage as it requires a good understanding of how run-item matching works. But how you describe it, it could possibly be a bug. Are you using a stable release version or pulled that commit you showed above? I can have a look if you could share your dataset (just the headers will do)

mateuszpawlik commented 8 months ago

I haven't tried a newer version yet. I will do that. I will need some time though. And I can prepare a minimal example.

This is a very special case for me too. It shouldn't actually happen, because run: <<>> handles it nicely. However, the resulting nifti file was corrupted too much, and the rest of the pipeline failed. Therefore, I tried to exclude it.

marcelzwiers commented 8 months ago

That would be great, because I'd like to know if you encountered a bug or if this issue is only about creating a proper template bidsmap

mateuszpawlik commented 8 months ago

I managed to continue the investigation. I used my DICOM generator to prepare a minimal example.

I worked with master branch pip install --force-reinstall git+https://github.com/Donders-Institute/bidscoin, should be https://github.com/Donders-Institute/bidscoin/commit/2aceaa4ebd4a2894c875f0e3944ff1b3b7f71a20 I commented out the problematic code from #230

Clone https://gitlab.com/ccns/neurocog/neurodataops/anc/software/dicom-generator

cd dicom-generator

Save the following json as many_func.json

[
    {
        "ImageType" : "fmri",
        "SeriesDescription" : "func_task-rest"
    },
    {
        "ImageType" : "fmri",
        "SeriesDescription" : "func_task-rest"
    },
    {
        "ImageType" : "fmri",
        "SeriesDescription" : "func_task-rest"
    }
]

python3 -m venv .venv
source .venv/bin/activate
pip3 install -U -r requirements.txt
python3 generate_dicom.py many_func.json dataset/sourcedata/sub-001/ses-1 -t 'MR00000{id}'

Sort the DICOMs using dicomsort (from BIDScoin)

dicomsort dataset/sourcedata --subprefix sub --sesprefix ses --pattern '.*\/[A-Z]{2}\d{6}'

The following bidsmap reproduces my problem. Sequence with SeriesNumber=1 is excluded, consecutive ones are identified as unknown.

Options: 

  bidscoin:
    version: 4.3.1
    bidsignore: [extra_data/]
    subprefix: sub-
    sesprefix: ses-
    unknowntypes: [extra_data]    # A list of datatypes that are converted to BIDS-like datatype folders
    ignoretypes: [exclude]        # A list of datatypes that are excluded / not converted to BIDS
    unzip:
  plugins:
    dcm2niix2bids: 
      command: dcm2niix
      args: -b y -z n -i n
      anon: n # Not to shift acquisition dates.
      meta: [.json, .tsv, .tsv.gz]

DICOM:

  subject: <<filepath:/sub-(.*?)/>> #<<SourceFilePath>>
  session: <<filepath:/sub-.*?/ses-(.*?)/>> #<<SourceFilePath>>

  func:
  - provenance:
    attributes: &func_dicomattr
      ImageType:
      ProtocolName:
      SeriesDescription:
    bids: &func_bidsattr
      suffix: bold
      run: <<>>
  - provenance:
    attributes:
      <<: *func_dicomattr
      ProtocolName: (?i)^func.*
    bids:
      <<: *func_bidsattr
      task: rest
    meta:
      TaskName: rest

  exclude:
  - provenance:
    attributes: &exclude_dicomattr 
      ProtocolName:
      SeriesDescription:
      ImageType:
    bids: &exclude_bidsattr
      suffix: <ProtocolName>
  - provenance:
    attributes:
      <<: *exclude_dicomattr
      SeriesNumber: 1
    bids:
      <<: *exclude_bidsattr

Substituting the exclude section with the following fixes the issue.

  exclude:
  - provenance:
    attributes:
      SeriesNumber: 1
    bids:
      suffix: <ProtocolName>

Let's use a simplified bidsmap, without the yaml anchors. I copy here only the func and exclude sections. The beginning is the same. This bidsmap reproduces the problem.

  func:
  - provenance:
    attributes:
      ImageType:
      SeriesDescription:
      ProtocolName: (?i)^func.*
    bids:
      suffix: bold
      run: <<>>
      task: rest
    meta:
      TaskName: rest

  exclude:
  - provenance:
    attributes:
      ImageType:
      SeriesDescription:
      ProtocolName:
      SeriesNumber: 1
    bids:
      suffix: <ProtocolName>

Removing ImageType from attributes in exclude solves the issue. Moreover, removing any other attribute, doesn't solve the problem.

This bidsmap also reproduces the problem.

  func:
  - provenance:
    attributes:
      ProtocolName: (?i)^func.*
    bids:
      suffix: bold
      run: <<>>
      task: rest
    meta:
      TaskName: rest

  exclude:
  - provenance:
    attributes:
      ImageType:
      SeriesNumber: 1
    bids:
      suffix: <ProtocolName>

I can only narrow it down to ImageType attribute. It would help me to understand better the purpose of the empty attributes (without a value). I thought I can use them with an anchor to reuse in run items, and overwrite when needed. And that I need them to differentiate different types of sequences. But it seems to me that they do more than that.

marcelzwiers commented 8 months ago

I will do a test my self, but first to address your question

I can only narrow it down to ImageType attribute. It would help me to understand better the purpose of the empty attributes (without a value). I thought I can use them with an anchor to reuse in run items, and overwrite when needed. And that I need them to differentiate different types of sequences. But it seems to me that they do more than that.

The way run-item matching works is as follows. All properties and attributes in the template bidsmap are compared to the attributes and properties of the data source. The properties and attributes can contain regular expressions, or can even be empty, acting as a broad filter or even as no filter at all. However, and that is the somewhat hidden power of bidscoin, when a match is found, it is stored in the study bidsmap with all template attribute values replaced by the data source attribute values. In this way, the run-item becomes a very narrow filter for the discovered data source. You should also know that run-matching is always performed first in the study template, and then in the template bidsmap. In this way, the study bidsmap becomes a precise reflection of the dataset (i.e. a collection of discovered datatypes). And the more attributes you add, the more precise it becomes. I hope this helps to understand why adding empty attributes in the template bidsmap will not change the template run-item matching itself (because empty attributes are ignored when doing template matching), but it will make the study bidsmap more specific/selective. (I do all this because users can then freely edit the study bidsmap to their liking, but that is something you don't use, so this procedure may perhaps seem overcomplicated to you. Another way of looking at this is this that the template bidsmap contains the prior knowledge, and that the study bidsmap contains the user/posterior knowledge)

As a simple example, suppose you have an extensive acquisition protocol with a high-resolution + a low-resolution T1 scan with the same ProtcoolName. If you only add {ProtocolName: .*T1w.*} to the template, then your study bidsmap will contain one T1 run-item, and bidscoiner will produce 2 T1 images (run-1 and run-2). If you add an empty SliceThickness attribute, i.e. {ProtocolName: .*T1w.*, SliceThickness: } then you will get two run-items in the study bidsmap (with different SliceThickness values) and bidscoiner will produce two different run-1 T1 images (if you want so). So rather than having to specify every kind of run-item in the template bidsmap, you can broadly specify them, and still discriminate between subtle protocol variations in your acquisition -- which is what you want (although to be useful you need to add the posterior knowledge, i.e. use the bidseditor).

marcelzwiers commented 8 months ago

Oh yeah, and to add to that, in the template bidsmap I often add an entirely empty run-item as a first item, which I then dereference in the following run-items. That is for convenience and innocent (because it never matches anything), but it is also useful for the bidseditor. Because in the bidseditor, when a user changes e.g. the datatype, then the first run-item of that new datatype is taken. In this way, I can control the default suffix that is then taken

marcelzwiers commented 8 months ago

Ok, this was a rare case that I fixed by making my bidsmap duplicate entry check more specific. Btw, there is an error in your test case above, namely that instead of using ProtocolName:

- provenance:
    attributes:
      <<: *func_dicomattr
      ProtocolName: (?i)^func.*

You should have used SeriesDescription (as this was used by your DICOM generator)

marcelzwiers commented 8 months ago

Are you making the transition to v4.3.1 now? If so, and if your done with testing and it all works, then please let me know and I'll release 4.3.1 on pypi

mateuszpawlik commented 8 months ago

Hi @marcelzwiers, I was busy with training this week. I'm trying to get back to BIDScoin today. There are three issues on my testing list. I will start with this one and let you know.

mateuszpawlik commented 8 months ago

Btw, there is an error in your test case above, namely that instead of using ProtocolName: ... You should have used SeriesDescription (as this was used by your DICOM generator)

Thanks. That would explain many things :see_no_evil: But I can swear I checked it and both of the fields were there. This could be the reason of the confusion too.

The generated study map was also wrong then. It has empty strings for ProtocolName, and this was the only attribute to match. Then of course the sequences didn't match anything and were categorized as unknown.

Here's the study map for the last template map in https://github.com/Donders-Institute/bidscoin/issues/229#issuecomment-2016535956

``` Options: bidscoin: version: 4.3.1 bidsignore: - extra_data/ subprefix: sub- sesprefix: ses- unknowntypes: # A list of datatypes that are converted to BIDS-like datatype folders - extra_data ignoretypes: # A list of datatypes that are excluded / not converted to BIDS - exclude unzip: null notderivative: [] plugins: dcm2niix2bids: command: dcm2niix args: -b y -z n -i n anon: n # Not to shift acquisition dates. meta: [.json, .tsv, .tsv.gz] DICOM: subject: <> #<> session: <> #<> func: - provenance: /home/mpawlik/Remote/anc/anc-data-acquisition/work/a8/474242d076521da2a50d3f84645baf/dataset/sourcedata/sub-001/ses-1/002-func_task-rest/MR000002 properties: filepath: '' filename: '' filesize: '' nrfiles: null attributes: ProtocolName: '' bids: suffix: bold run: <<>> task: rest acq: '' ce: '' rec: '' dir: '' echo: '' part: - '' - mag - phase - real - imag - 0 chunk: '' meta: TaskName: rest - provenance: /home/mpawlik/Remote/anc/anc-data-acquisition/work/a8/474242d076521da2a50d3f84645baf/dataset/sourcedata/sub-001/ses-1/003-func_task-rest/MR000003 properties: filepath: '' filename: '' filesize: '' nrfiles: null attributes: ProtocolName: '' bids: suffix: bold run: <<>> task: rest acq: '' ce: '' rec: '' dir: '' echo: '' part: - '' - mag - phase - real - imag - 0 chunk: '' meta: TaskName: rest exclude: - provenance: /home/mpawlik/Remote/anc/anc-data-acquisition/work/a8/474242d076521da2a50d3f84645baf/dataset/sourcedata/sub-001/ses-1/001-func_task-rest/MR000001 properties: filepath: '' filename: '' filesize: '' nrfiles: null attributes: ImageType: "['ORIGINAL', 'PRIMARY', 'M', 'ND', 'MOSAIC']" SeriesNumber: '1' bids: suffix: '' meta: {} ```

P.S. I know where I've seen both of the fields - in the sidecar json files. dcm2niix must add ProtocolName there.

mateuszpawlik commented 8 months ago

The version I started with (pip install --force-reinstall git+https://github.com/Donders-Institute/bidscoin@2aceaa4ebd4a2894c875f0e3944ff1b3b7f71a20):

my minimal example (after fixing ProtocolName to SeriesDescription) - things work as expected
real data - the original problem (unknown series 8) is there - this series doesn't have a match in the generated study map but should find a match in the template map.

BIDScoin version pip install --force-reinstall git+https://github.com/Donders-Institute/bidscoin (should fetch the newest commit on the master branch):

my minimal example (after fixing ProtocolName to SeriesDescription) - things work as expected
real data - things work as expected

Thanks a lot! I like your explanation.

Is there anything I can provide for this issue? I don't know how to trigger the debugging comments you added in your commit.

marcelzwiers commented 8 months ago

No, if your happy then you can close this issue :-)

p.s. If you run bidscoin -h (or just bidscoin), you will see how you can activate the debugging mode

mateuszpawlik commented 8 months ago

The debug logs look good. Thanks. Closing as resolved.

marcelzwiers commented 8 months ago

Mhhh, my commit (df7032f6e3a43fafa727a5a9d1e2142c6f50ad34) solved your issue but creates false positives for bidsmaps in which users have (legitimately) changed the data type. I will have to revert/fix the fix :-(

marcelzwiers commented 8 months ago

Bummer, the more I think about it, the more I get convinced you have run to a fundamental limit of BIDScoin's design. Basically what you are doing when excluding a particular SeriesNumber is adding SeriesNumber to your bidsmap exclude section, but you don't add (rightfully so) SeriesNumber to the func section of your bidsmap. Before my fix, when doing data source discovery, if a non-excluded is already added to your bidsmap, then your excluded run was also matching it (because the func item doesn't contain SeriesNumber) and BIDScoin hence did not add your exclude item as a new item to the bidsmap (i.e. the reason you posted this issue). My fix works fine if you don't change datatype after running the bidsmapper (i.e. with the bidseditor), but it leads to double entries otherwise. This is really not desirable, and I don't see a way around it, so I'm reverting my commit for now. If I find a new solution I'll let you know, but until then you have to find another way of excluding your Series

marcelzwiers commented 8 months ago

One way of solving your problem (in 4.3.2) is to include a regular expression for your func item, in which you require that SeriesNumber is unequal to 7, i.e. the opposite of your exclude item

marcelzwiers commented 8 months ago

If that works, then this issue can be closed

mateuszpawlik commented 8 months ago

I will read carefully your comments next week. I appreciate a lot all your help :blush: But, since my use case is different than the default, I am happy with workarounds too :slightly_smiling_face:

mateuszpawlik commented 7 months ago

I'm testing 14c74ade8fbdd0e862b7e01614e41afe0c6a719a from master branch. The regex doesn't seem to work. I've tried SeriesNumber: ^[0-68-9], which should much everything not starting with 7. Despite no match found, both sequences are converted. And both sequences are in the study template.

Logs of bidsmapper for my original two problematic sequences (I deleted path prefixes for readability):

BCDEBUG - bidscoin.bids | Found no bidsmap match for: dataset/sourcedata/sub-s056/ses-1/007-FMRI_dopa6/MR000395
BCDEBUG - bidscoin.plugin.dcm2niix2bids | No match found in the study bidsmap, now trying the template bidsmap
BCDEBUG - bidscoin.bids | Found no bidsmap match for: dataset/sourcedata/sub-s056/ses-1/007-FMRI_dopa6/MR000395
INFO - bidscoin.plugin.dcm2niix2bids | Discovered 'func' DICOM sample: dataset/sourcedata/sub-s056/ses-1/007-FMRI_dopa6/MR000395
BCDEBUG - bidscoin.bids | Found no bidsmap match for: dataset/sourcedata/sub-s056/ses-1/008-FMRI_dopa6/MR000401
BCDEBUG - bidscoin.plugin.dcm2niix2bids | No match found in the study bidsmap, now trying the template bidsmap
BCDEBUG - bidscoin.bids | Found bidsmap match: sub--unknown/ses--unknown/DICOM_func_id010 -> dataset/sourcedata/sub-s056/ses-1/008-FMRI_dopa6/MR000401
INFO - bidscoin.plugin.dcm2niix2bids | Discovered 'func' DICOM sample: dataset/sourcedata/sub-s056/ses-1/008-FMRI_dopa6/MR000401

With test data it also doesn't work - all sequences are converted. There are three functional sequences with SeriesNumber from 1 to 3. The rule I used for test data (which should not match the second sequence):

  func:
  - provenance:
    attributes:
      SeriesDescription: ^func.*
      SeriesNumber: ^[13]
    bids:
      suffix: bold
      run: <<>>
      task: rest
    meta:
      TaskName: rest

I understood that I should not have the sequence to exclude in the exclude section anymore. Did I misunderstand?

My test DICOM

Let me know if I can do more.

marcelzwiers commented 7 months ago

You have to add a matching Series 7 in your exclude section, and a non-matching Series 7 (^[0-68-9]) in your func section. Otherwise your Series 7 data source will have two or zero matches in your bidsmap, and then things go wrong

marcelzwiers commented 7 months ago

Think of it like this. If bidscoin comes across a data source it looks in the bidsmap for a mapping to a BIDS filename (and metadata). So if the data source doesn't match anything you get an error/warning, if you have two or more matches, bidscoin faces an ambiguity it cannot resolve (so it takes the first one it finds). To make building bidsmaps a little easier, I've made the search order such that exclude items are searched first, and extra_data last (you can sometimes make use of this if you understand how it works)

mateuszpawlik commented 7 months ago

You have to add a matching Series 7 in your exclude section, and a non-matching Series 7 (^[0-68-9]) in your func section. Otherwise your Series 7 data source will have two or zero matches in your bidsmap, and then things go wrong

So I did misunderstand. I added the matching pattern to the exclude section and it works.

Example for test data:

func:
  - provenance:
    attributes:
      SeriesDescription: ^func.*
      SeriesNumber: ^[13]
    bids:
      suffix: bold
      run: <<>>
      task: rest
    meta:
      TaskName: rest

  exclude:
  - provenance:
    attributes:
      SeriesNumber: 2

mateuszpawlik commented 7 months ago

Think of it like this. If bidscoin comes across a data source it looks in the bidsmap for a mapping to a BIDS filename (and metadata). So if the data source doesn't match anything you get an error/warning, if you have two or more matches, bidscoin faces an ambiguity it cannot resolve (so it takes the first one it finds). To make building bidsmaps a little easier, I've made the search order such that exclude items are searched first, and extra_data last (you can sometimes make use of this if you understand how it works)

This I understand.

However, I was confused by the logs. Although there was no match for 007-FMRI_dopa6, and I expected a warning, it was discovered as a func sequence.

Donders-Institute / bidscoin

Exclude sections behaves weirdly #229