Donders-Institute / bidscoin

BIDScoin converts your source-level neuroimaging data to BIDS
https://bidscoin.readthedocs.io
GNU General Public License v3.0

Add easy to check output with information on what source was converted to what BIDS files #224

Closed dzemanov closed 7 months ago

dzemanov commented 7 months ago

Relates to #213

It would be really useful to output, as json, information about which source files were converted to which output files by bidscoiner, and which bidsmap entry definition was used for it. It would be amazing if skipped files could also be part of it. This helps the user to quickly check that the conversion went well, or to look up the outputs for specific files.

Maybe like this:

{
    "tags": ["skipped"],
    "bidsmap-provenance": "myproject/raw/sub-01/std1_01_AAHead_Scout/00001.dcm",
    "inputfiles": ["/home/smith/myproject/raw/sub-01/std1_01_AAHead_Scout/00001.dcm", ...],
    "outputfiles": []
},
{
    "tags": ["skipped"],
    "bidsmap-provenance": "myproject/raw/sub-01/std1_02_localizer/00001.dcm",
    "inputfiles": ["/home/smith/myproject/raw/sub-01/std1_02_localizer/00001.dcm", ...],
    "outputfiles": []
},
{
    "bidsmap-provenance": "myproject/raw/sub-01/std1_03_t1_mprage_sag_p2_1isow/00001.dcm",
    "command": "dcm2niix -b y -z y -i n -l n -f 'sub-01_T1w' -o 'myproject/bids/sub-01/anat' 'myproject/raw/sub-01/std1_03_t1_mprage_sag_p2_1isow/'",
    "inputfiles": ["/home/smith/myproject/raw/sub-01/std1_03_t1_mprage_sag/00001.dcm", ...],
    "outputfiles": ["/home/smith/myproject/bids/sub-01/anat/sub-01_T1w.nii.gz"]
},
{
    "bidsmap-provenance": "myproject/raw/sub-01/std1_04_bold_25iso_tr980_rest_SBRef/00001.dcm",
    "command": "dcm2niix -b y -z y -i n -l n -f 'sub-01_task-bold25isotr980rest_sbref' -o 'myproject/bids/sub-01/func' 'myproject/raw/sub-01/std1_04_bold_25iso_tr980_rest_SBRef'",
    "inputfiles": ["/home/smith/myproject/raw/sub-01/std1_04_bold_25iso_tr980_rest_SBRef/00001.dcm", "/home/smith/myproject/raw/sub-01/std1_04_bold_25iso_tr980_rest_SBRef/00001_ph.dcm", ...],
    "outputfiles": ["/home/smith/myproject/bids/sub-01/func/sub-01_task-bold25isotr980rest_echo-1_sbref.nii.gz", "/home/smith/myproject/bids/sub-01/func/sub-01_task-bold25isotr980rest_echo-1_part-phase_sbref.nii.gz", echo2...]
}
etc...
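
To illustrate how a report with this shape could be checked, here is a minimal Python sketch. It assumes the records are stored together as one JSON list and that the field names are exactly the ones proposed above ("tags", "bidsmap-provenance", "outputfiles"); the `summarize` helper and the shortened paths are hypothetical and not part of BIDScoin.

```python
import json

def summarize(records):
    """Split proposed provenance records into skipped sources and an output -> provenance map."""
    skipped = []
    converted = {}
    for record in records:
        if "skipped" in record.get("tags", []):
            skipped.append(record["bidsmap-provenance"])
        for output in record.get("outputfiles", []):
            converted[output] = record["bidsmap-provenance"]
    return skipped, converted

# Hypothetical, shortened records that follow the proposed layout
report = json.loads("""
[
    {"tags": ["skipped"],
     "bidsmap-provenance": "raw/sub-01/std1_02_localizer/00001.dcm",
     "inputfiles": ["raw/sub-01/std1_02_localizer/00001.dcm"],
     "outputfiles": []},
    {"bidsmap-provenance": "raw/sub-01/std1_03_t1_mprage_sag_p2_1isow/00001.dcm",
     "inputfiles": ["raw/sub-01/std1_03_t1_mprage_sag_p2_1isow/00001.dcm"],
     "outputfiles": ["bids/sub-01/anat/sub-01_T1w.nii.gz"]}
]
""")

skipped, converted = summarize(report)
print("Skipped:", skipped)
for output, provenance in converted.items():
    print(output, "<-", provenance)
```

With a flat list like this, both questions from the request (what was skipped, and which source produced a given output) reduce to a single pass over the records.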

Or via BEP028, maybe something similar to this:

{
    "@context": "https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
    "BIDSProvVersion": "dev",
    "Records": {
        "Software": [
            {
                "@id": "urn:363748e0-c76d-4cc6-b8a6-4269bf81fab9",
                "RRID": "RRID:SCR_022839",
                "@type": "prov:SoftwareAgent",
                "Label": "bidscoin",
                "Version": "4.3.0"
            },
            {
                "@id": "urn:8e9053c3-3a35-4c02-8ae0-6de38336fcd7",
                "RRID": "RRID:SCR_023517",
                "@type": "prov:SoftwareAgent",
                "Label": "dcm2niix",
                "Version": "v1.0.20220505"
            }
        ],
        "Activities": [
            {
                "@id": "urn:1bafeb8d-264b-4dd9-9be4-ce2a5f1c9df5",
                "Label": "bidsmapper",
                "AssociatedWith": "urn:363748e0-c76d-4cc6-b8a6-4269bf81fab9",
                "Command": "bidsmapper myproject/raw myproject/bids -t bidsmap_custom",
                "Parameters": {},
                "Used": []
            },
            {
                "@id": "urn:c8651006-ceba-42bb-8f8f-64167d9f026d",
                "Label": "bidscoiner",
                "AssociatedWith": "urn:363748e0-c76d-4cc6-b8a6-4269bf81fab9",
                "Command": "bidscoiner myproject/raw myproject/bids",
                "Parameters": {},
                "Used": []
            },
            {
                "@id": "urn:3b18fdf9-45dc-4536-a632-1c4463a7717a",
                "Label": "dcm2niix",
                "AssociatedWith": "urn:8e9053c3-3a35-4c02-8ae0-6de38336fcd7",
                "Command": "dcm2niix -b y -z y -i n -l n -f 'sub-01_T1w' -o 'myproject/bids/sub-01/anat' 'myproject/raw/sub-01/std1_03_t1_mprage_sag_p2_1isow/'",
                "Parameters": {
                    "bidsmap_provenance": "myproject/raw/sub-01/std1_03_t1_mprage_sag_p2_1isow/00001.dcm"
                },
                "Used": ["urn:4362208f-ea8b-4b4f-8f7c-f2d7b21d342b"]
            },
            {
                "@id": "urn:4d3a9181-2aad-4589-93cc-ccc93f0a0e4e",
                "Label": "dcm2niix",
                "AssociatedWith": "urn:8e9053c3-3a35-4c02-8ae0-6de38336fcd7",
                "Command": "dcm2niix -b y -z y -i n -l n -f 'sub-01_task-bold25isotr980rest_sbref' -o 'myproject/bids/sub-01/func' 'myproject/raw/sub-01/std1_04_bold_25iso_tr980_rest_SBRef'",
                "Parameters": {
                    "bidsmap_provenance": "myproject/raw/sub-01/std1_04_bold_25iso_tr980_rest_SBRef/00001.dcm"
                },
                "Used": ["urn:9b584ee0-b95e-4942-aece-23c6d7eac565"]
            }
        ],
        "Entities": [
            {
                "@id": "urn:86648548-ef86-4c65-acb2-f66f919cd602",
                "Label": "bidsmap.yaml",
                "AtLocation": "/home/smith/myproject/bids/code/bidscoin/bidsmap.yaml",
                "GeneratedBy": "urn:1bafeb8d-264b-4dd9-9be4-ce2a5f1c9df5"
            },
            {
                "@id": "urn:1023f701-3809-4dfa-a5db-95a8bf03b9b4",
                "Label": "dataset_description.json",
                "AtLocation": "/home/smith/myproject/bids/dataset_description.json",
                "GeneratedBy": "urn:c8651006-ceba-42bb-8f8f-64167d9f026d"
            },
            {
                "@id": "urn:12f091f0-d4d2-4e45-80db-433d4bcac9ec",
                "Label": "README.md",
                "AtLocation": "/home/smith/myproject/bids/README.md",
                "GeneratedBy": "urn:c8651006-ceba-42bb-8f8f-64167d9f026d"
            },
            {
                "@id": "urn:c49fe0fe-52a2-4c2d-98ca-3786b8af1015",
                "Label": "skipped",
                "AtLocation": "/home/smith/myproject/raw/sub-01/std1_01_AAHead_Scout/00001.dcm"
            },
            {
                "@id": "urn:c18b49a9-f7c6-4e36-9954-3a562af73d72",
                "Label": "skipped",
                "AtLocation": "/home/smith/myproject/raw/sub-01/std1_02_localizer/00001.dcm"
            },
            {
                "@id": "urn:4362208f-ea8b-4b4f-8f7c-f2d7b21d342b",
                "Label": "std1_03_t1_mprage_sag",
                "AtLocation": "/home/smith/myproject/raw/sub-01/std1_03_t1_mprage_sag/00001.dcm"
            },
            {
                "@id": "bids::sub-01/anat/sub-01_T1w.nii.gz",
                "Label": "sub-01_T1w.nii.gz",
                "AtLocation": "/home/smith/myproject/bids/sub-01/anat/sub-01_T1w.nii.gz",
                "GeneratedBy": "urn:3b18fdf9-45dc-4536-a632-1c4463a7717a"
            },
            {
                "@id": "urn:9b584ee0-b95e-4942-aece-23c6d7eac565",
                "Label": "std1_04_bold_25iso_tr980_rest_SBRef",
                "AtLocation": "/home/smith/myproject/raw/sub-01/std1_04_bold_25iso_tr980_rest_SBRef/00001.dcm"
            },
            {
                "@id": "bids::sub-01/func/sub-01_task-bold25isotr980rest_echo-1_sbref.nii.gz",
                "Label": "sub-01_task-bold25isotr980rest_echo-1_sbref.nii.gz",
                "AtLocation": "/home/smith/myproject/bids/sub-01/func/sub-01_task-bold25isotr980rest_echo-1_sbref.nii.gz",
                "GeneratedBy": "urn:4d3a9181-2aad-4589-93cc-ccc93f0a0e4e"
            },
            {
                "@id": "bids::sub-01/func/sub-01_task-bold25isotr980rest_echo-1_part-phase_sbref.nii.gz",
                "Label": "sub-01_task-bold25isotr980rest_echo-1_part-phase_sbref.nii.gz",
                "AtLocation": "/home/smith/myproject/bids/sub-01/func/sub-01_task-bold25isotr980rest_echo-1_part-phase_sbref.nii.gz",
                "GeneratedBy": "urn:4d3a9181-2aad-4589-93cc-ccc93f0a0e4e"
            }
        ]
    }
}

marcelzwiers commented 7 months ago

The more I read and think about BEP028, the less enthusiastic I get. I fear that it is an over-engineered solution for a problem (tracing back the provenance of the output data of a multi-chain pipeline) that is not exactly the problem we would like to solve (monitoring the correct working of a single-chain data conversion 'pipeline'). I tend to go for a simple solution that meets our needs and to hold off on extending it to BEP028 until that takes off (if it ever does). I think the logger is the easiest option but the least machine-readable. The json is best suited, but less easy to query, so perhaps we should pick pandas/tsv after all. For now I'll introduce a simple bids.bidsprov(tag, runid, source, command, targets) function that will serve as a single entry point for the plugins to store the provenance data (in whatever form). I'm a little hesitant about what to do with targets: we can store them as a comma-separated list in a single tsv cell, or use a new row for every output file (which is a little redundant and less readable, but more searchable).
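
As a rough illustration of the "single entry point" idea, the sketch below keeps one row per source in a tsv file and stores the targets as a comma-separated list in a single cell (the first of the two options mentioned). It is not the BIDScoin implementation: the `record_provenance` name, the column set and the file name are assumptions, and it uses the standard-library `csv` module rather than pandas to stay self-contained.

```python
import csv
import tempfile
from pathlib import Path

FIELDS = ["source", "runid", "datatype", "targets"]   # Hypothetical column layout

def record_provenance(tsvfile, source, runid, datatype, targets):
    """Add or replace the row for `source` and rewrite the tsv file sorted by source."""
    tsvfile = Path(tsvfile)
    rows = {}
    if tsvfile.is_file():
        with tsvfile.open(newline="") as fid:
            rows = {row["source"]: row for row in csv.DictReader(fid, delimiter="\t")}

    # Store the source as a plain string so that all keys have the same type
    rows[Path(source).as_posix()] = {
        "source": Path(source).as_posix(),
        "runid": runid,
        "datatype": datatype,
        "targets": ", ".join(Path(target).name for target in targets),
    }

    with tsvfile.open("w", newline="") as fid:
        writer = csv.DictWriter(fid, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows[key] for key in sorted(rows))

# Usage with made-up paths and run IDs
with tempfile.TemporaryDirectory() as tmpdir:
    tsv = Path(tmpdir) / "provenance.tsv"
    record_provenance(tsv, "raw/sub-01/std1_04_bold", "func_rest", "func",
                      [Path("bids/sub-01/func/sub-01_task-rest_bold.nii.gz")])
    record_provenance(tsv, "raw/sub-01/std1_03_t1", "anat_T1w", "anat",
                      [Path("bids/sub-01/anat/sub-01_T1w.nii.gz")])
    print(tsv.read_text())
```

The one-row-per-target alternative would only change how the rows are built: one dictionary per target instead of a joined string, at the cost of repeating the source, run ID and data type on every row.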

marcelzwiers commented 7 months ago

I added some code to write a simple tsv provenance file (for now). Would it work for you like this or are you missing something?

dzemanov commented 7 months ago

Hi, amazing! It works perfectly! Thank you so much. :)

I would maybe suggest sorting targets before writing them. Including subject/session column could be useful as well, but it can be extracted from other columns. If the target is in derivatives, this information is not visible from tsv, but I think that it is ok.

marcelzwiers commented 7 months ago

> If the target is in derivatives, this information is not visible from tsv, but I think that it is ok.

??? That was not intended, I think it should be in targets and hence in the tsv file.

marcelzwiers commented 7 months ago

Ah, you mean that the target path is lost; that's true (I cut it off to make things more readable). I'll see what I can do.

dzemanov commented 7 months ago

For sorting targets I meant something like this:

provdata.loc[source] = [runid, datatype, ', '.join([f"{'derivatives/' if 'derivatives' in target.parts else ''}{target.name}" for target in sorted(targets)])]

So, for example, multiple echoes will be written as echo_1, echo_2, echo_3 instead of in random order. But this is of course not necessary; it works perfectly without it. The sort index is useful as well :). (I am assuming it sorts based on source; I did not check how it works yet.)
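
The effect of sorting can be seen in isolation with the same expression wrapped in a small function. This is only a sketch: the `format_targets` name and the file names are made up, and `PurePosixPath` is used so the result does not depend on the operating system.

```python
from pathlib import PurePosixPath

def format_targets(targets):
    """Join the sorted target names, marking files that live under a derivatives folder."""
    return ", ".join(
        f"{'derivatives/' if 'derivatives' in target.parts else ''}{target.name}"
        for target in sorted(targets)
    )

# Targets in the arbitrary order in which a converter might report them
targets = [
    PurePosixPath("bids/sub-01/func/sub-01_task-rest_echo-3_bold.nii.gz"),
    PurePosixPath("bids/sub-01/func/sub-01_task-rest_echo-1_bold.nii.gz"),
    PurePosixPath("bids/sub-01/func/sub-01_task-rest_echo-2_bold.nii.gz"),
]
print(format_targets(targets))
```

Note that the ordering is lexicographic, so with ten or more echoes echo-10 would sort before echo-2; a natural-sort key would be needed if that matters.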

marcelzwiers commented 7 months ago

Ah, I see, good point

marcelzwiers commented 7 months ago

Fixed it now

dzemanov commented 7 months ago

sort_index will error on:

'<' not supported between instances of 'PosixPath' and 'str'

when provdata is loaded from an existing csv.

So maybe provdata.loc[source] = ... could become provdata.loc[str(source)] = ... or provdata.loc[source.as_posix()] = ...
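
The quoted message is the error Python raises when a path object and a string are compared with `<`. The sketch below reproduces that comparison with the built-in `sorted()` instead of pandas, on the assumption that `sort_index` ends up comparing the index labels in the same way; the `can_sort` helper and the paths are hypothetical.

```python
from pathlib import PurePosixPath

def can_sort(labels):
    """Return True if the labels are mutually comparable, i.e. sortable."""
    try:
        sorted(labels)
        return True
    except TypeError:
        return False

# Labels read back from a file are strings, whereas a newly added label is a path object
mixed = ["raw/sub-01/001_localizer/00001.dcm",
         PurePosixPath("raw/sub-01/002_t1/00001.dcm")]

# Converting every label to a posix string makes them comparable again
normalized = [PurePosixPath(label).as_posix() for label in mixed]

print(can_sort(mixed))        # Mixed str and path labels cannot be ordered
print(can_sort(normalized))   # Plain strings can
```

Normalizing the label at the point where the row is written, as suggested above, keeps the index homogeneous regardless of whether the table was created in memory or loaded from disk.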

dzemanov commented 7 months ago

You are quicker, I didn't have the updated version.