bids-standard / BEP028_BIDSprov

Organizing and coordinating BIDS extension proposal 28 : BIDS Provenance
https://bids.neuroimaging.io/bep028
Creative Commons Attribution 4.0 International
4 stars 13 forks source link

visualizer.py fails on chained MEG processing pipeline [update: solved] #120

Closed robertoostenveld closed 1 year ago

robertoostenveld commented 1 year ago

Following https://github.com/bids-standard/bep021/issues/8#issuecomment-1604372097 I am trying to construct a .prov.jsonld file for an example chained analysis pipeline of MEG data.

I cannot get the visualizer.py to work on my example, neither on the one that we hand-crafted in Kopenhagen. Both return an empty png figure. @cmaumet can you spot what is wrong?

{
    "BIDSProvVersion":"dev",
    "@context":"https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
    "records":{
        "prov:Agent":{
            "@id":"ed00edd9-eb1a-4f0c-9e84-16aa7e2d63e9",
            "label":"FieldTrip",
            "version":"0962f5dce"
        },
        "prov:Activity":[
            {
                "@id":"d0a40395-0d97-4366-9d1e-30b2d5c1a17a",
                "label":"preproc",
                "used":"bids_raw:///sub-01/meg/sub-01_task-language_meg.ds"
            },
            {
                "@id":"bb67ad88-e56c-4c37-ba0c-b6b9e47daff1",
                "label":"avg",
                "used":"bids_derivative:///sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"
            },
            {
                "@id":"8eadc657-b256-4e76-ba36-6d3069c4d92e",
                "label":"avg",
                "used":"bids_derivative:///sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"
            },
            {
                "@id":"abd1c016-188a-4837-8101-4aa28638dfc3",
                "label":"avg",
                "used":"bids_derivative:///sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"
            },
            {
                "@id":"4af2120c-e484-41b0-8d9d-da63990bf17c",
                "label":"combined",
                "used":"bids_derivative:///sub-01/meg/sub-01_task-languageFC_desc-avg_meg.ds"
            },
            {
                "@id":"103d21fd-7623-4650-8926-72f527d08c05",
                "label":"combined",
                "used":"bids_derivative:///sub-01/meg/sub-01_task-languageIC_desc-avg_meg.ds"
            },
            {
                "@id":"d3730058-518f-4793-bf48-bfba5c7f9d22",
                "label":"combined",
                "used":"bids_derivative:///sub-01/meg/sub-01_task-languageFIC_desc-avg_meg.ds"
            }
        ],
        "prov:Entity":[
            {
                "@id":"bids_derivative/sub-01/meg/sub-01_task-language_desc-preproc_meg.ds",
                "label":"data",
                "generatedBy":"d0a40395-0d97-4366-9d1e-30b2d5c1a17a"
            },
            {
                "@id":"bids_derivative/sub-01/meg/sub-01_task-languageFC_desc-avg_meg.ds",
                "label":"timelockFC",
                "generatedBy":"bb67ad88-e56c-4c37-ba0c-b6b9e47daff1"
            },
            {
                "@id":"bids_derivative/sub-01/meg/sub-01_task-languageIC_desc-avg_meg.ds",
                "label":"timelockIC",
                "generatedBy":"8eadc657-b256-4e76-ba36-6d3069c4d92e"
            },
            {
                "@id":"bids_derivative/sub-01/meg/sub-01_task-languageFIC_desc-avg_meg.ds",
                "label":"timelockFIC",
                "generatedBy":"abd1c016-188a-4837-8101-4aa28638dfc3"
            },
            {
                "@id":"bids_derivative/sub-01/meg/sub-01_task-languageFC_desc-combined_meg.ds",
                "label":"combinedFC",
                "generatedBy":"4af2120c-e484-41b0-8d9d-da63990bf17c"
            },
            {
                "@id":"bids_derivative/sub-01/meg/sub-01_task-languageIC_desc-combined_meg.ds",
                "label":"combinedIC",
                "generatedBy":"103d21fd-7623-4650-8926-72f527d08c05"
            },
            {
                "@id":"bids_derivative/sub-01/meg/sub-01_task-languageFIC_desc-combined_meg.ds",
                "label":"combinedFIC",
                "generatedBy":"d3730058-518f-4793-bf48-bfba5c7f9d22"
            }
        ]
    }
}
robertoostenveld commented 1 year ago

Note that the processing looks something like this

digraph {
  preproc [ color=red, shape=square ]
  avg [ color=red, shape=square ]
  combined [ color=red, shape=square ]
  rawdata -> preproc -> data
  data -> avg -> timelockFC -> combined -> combinedFC
  data -> avg -> timelockIC -> combined -> combinedIC
  data -> avg -> timelockFIC -> combined -> combinedFIC
}

combined

where "preproc", "avg" and "combined" are processing steps, and all others are data.

The "avg" (computing the ERFs) is repeated 3 times for different selection of trials (corresponding to conditions), and consequently the "combined" step is also executed three times (for the three conditions).

cmaumet commented 1 year ago

@robertoostenveld: Thanks for sharing those!

I think the problem is probably related to recent update to the library to better reflect key names defined in the spec document (mainly changes in case):

In case I missed some, all the keys are defined in this context file (now harmonized with what is in the BIDS-Prov spec).

A revised version of the minimal example we had worked on in Copenhagen is here: https://github.com/bids-standard/BEP028_BIDSprov/blob/master/examples/simple_example/simple_example.prov.jsonld

Does this help?

robertoostenveld commented 1 year ago

Also the example does not result in a figure with python -m visualize --input_file ./simple_example.prov.jsonld

cmaumet commented 1 year ago

[Edit: redirect is now in place]

Thanks for spotting that and sorry! There was an issue in the PURL redirection https://purl.org/nidash/bidsprov/context.json that I've just fixed. Now the PURL context link should redirect to https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json and solve the issue.

On my machine python -m visualize --input_file ../examples/simple_example/simple_example.prov.jsonld --output_file here.png returns:

here

cmaumet commented 1 year ago

Updating your example to (update key names and paths in bids URIs):

{
    "BIDSProvVersion":"dev",
    "@context":"https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
    "Records":{
        "Software":{
            "@id":"ed00edd9-eb1a-4f0c-9e84-16aa7e2d63e9",
            "Label":"FieldTrip",
            "Version":"0962f5dce"
        },
        "Activities":[
            {
                "@id":"d0a40395-0d97-4366-9d1e-30b2d5c1a17a",
                "Label":"preproc",
                "Used":"bids_raw::sub-01/meg/sub-01_task-language_meg.ds"
            },
            {
                "@id":"bb67ad88-e56c-4c37-ba0c-b6b9e47daff1",
                "Label":"avg",
                "Used":"bids_derivative::sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"
            },
            {
                "@id":"8eadc657-b256-4e76-ba36-6d3069c4d92e",
                "Label":"avg",
                "Used":"bids_derivative::sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"
            },
            {
                "@id":"abd1c016-188a-4837-8101-4aa28638dfc3",
                "Label":"avg",
                "Used":"bids_derivative::sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"
            },
            {
                "@id":"4af2120c-e484-41b0-8d9d-da63990bf17c",
                "Label":"combined",
                "Used":"bids_derivative::sub-01/meg/sub-01_task-languageFC_desc-avg_meg.ds"
            },
            {
                "@id":"103d21fd-7623-4650-8926-72f527d08c05",
                "Label":"combined",
                "Used":"bids_derivative::sub-01/meg/sub-01_task-languageIC_desc-avg_meg.ds"
            },
            {
                "@id":"d3730058-518f-4793-bf48-bfba5c7f9d22",
                "Label":"combined",
                "Used":"bids_derivative::sub-01/meg/sub-01_task-languageFIC_desc-avg_meg.ds"
            }
        ],
        "Entities":[
            {
                "@id":"bids_derivative::sub-01/meg/sub-01_task-language_desc-preproc_meg.ds",
                "Label":"data",
                "GeneratedBy":"d0a40395-0d97-4366-9d1e-30b2d5c1a17a"
            },
            {
                "@id":"bids_derivative::sub-01/meg/sub-01_task-languageFC_desc-avg_meg.ds",
                "Label":"timelockFC",
                "GeneratedBy":"bb67ad88-e56c-4c37-ba0c-b6b9e47daff1"
            },
            {
                "@id":"bids_derivative::sub-01/meg/sub-01_task-languageIC_desc-avg_meg.ds",
                "Label":"timelockIC",
                "GeneratedBy":"8eadc657-b256-4e76-ba36-6d3069c4d92e"
            },
            {
                "@id":"bids_derivative::sub-01/meg/sub-01_task-languageFIC_desc-avg_meg.ds",
                "Label":"timelockFIC",
                "GeneratedBy":"abd1c016-188a-4837-8101-4aa28638dfc3"
            },
            {
                "@id":"bids_derivative::sub-01/meg/sub-01_task-languageFC_desc-combined_meg.ds",
                "Label":"combinedFC",
                "GeneratedBy":"4af2120c-e484-41b0-8d9d-da63990bf17c"
            },
            {
                "@id":"bids_derivative::sub-01/meg/sub-01_task-languageIC_desc-combined_meg.ds",
                "Label":"combinedIC",
                "GeneratedBy":"103d21fd-7623-4650-8926-72f527d08c05"
            },
            {
                "@id":"bids_derivative::sub-01/meg/sub-01_task-languageFIC_desc-combined_meg.ds",
                "Label":"combinedFIC",
                "GeneratedBy":"d3730058-518f-4793-bf48-bfba5c7f9d22"
            }
        ]
    }
}

The vizualize function gives us: here

Compared to your example there are 3 "combine" activities.

cmaumet commented 1 year ago

Removing duplicate activities (this is closer to what we have above in your diagram but maybe not what we want as provenance):

{
    "BIDSProvVersion":"dev",
    "@context":"https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
    "Records":{
        "Software":{
            "@id":"ed00edd9-eb1a-4f0c-9e84-16aa7e2d63e9",
            "Label":"FieldTrip",
            "Version":"0962f5dce"
        },
        "Activities":[
            {
                "@id":"d0a40395-0d97-4366-9d1e-30b2d5c1a17a",
                "Label":"preproc",
                "Used":"bids_raw::sub-01/meg/sub-01_task-language_meg.ds"
            },
            {
                "@id":"bb67ad88-e56c-4c37-ba0c-b6b9e47daff1",
                "Label":"avg",
                "Used":["bids_derivative::sub-01/meg/sub-01_task-language_desc-preproc_meg.ds",
                "bids_derivative::sub-01/meg/sub-01_task-language_desc-preproc_meg.ds",
                "bids_derivative::sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"]
            },
            {
                "@id":"4af2120c-e484-41b0-8d9d-da63990bf17c",
                "Label":"combined",
                "Used": [
                "bids_derivative::sub-01/meg/sub-01_task-languageFC_desc-avg_meg.ds", "bids_derivative::sub-01/meg/sub-01_task-languageIC_desc-avg_meg.ds", "bids_derivative::sub-01/meg/sub-01_task-languageFIC_desc-avg_meg.ds"]
            }
        ],
        "Entities":[
            {
                "@id":"bids_derivative::sub-01/meg/sub-01_task-language_desc-preproc_meg.ds",
                "Label":"data",
                "GeneratedBy":"d0a40395-0d97-4366-9d1e-30b2d5c1a17a"
            },
            {
                "@id":"bids_derivative::sub-01/meg/sub-01_task-languageFC_desc-avg_meg.ds",
                "Label":"timelockFC",
                "GeneratedBy":"bb67ad88-e56c-4c37-ba0c-b6b9e47daff1"
            },
            {
                "@id":"bids_derivative::sub-01/meg/sub-01_task-languageIC_desc-avg_meg.ds",
                "Label":"timelockIC",
                "GeneratedBy":"bb67ad88-e56c-4c37-ba0c-b6b9e47daff1"
            },
            {
                "@id":"bids_derivative::sub-01/meg/sub-01_task-languageFIC_desc-avg_meg.ds",
                "Label":"timelockFIC",
                "GeneratedBy":"bb67ad88-e56c-4c37-ba0c-b6b9e47daff1"
            },
            {
                "@id":"bids_derivative::sub-01/meg/sub-01_task-languageFC_desc-combined_meg.ds",
                "Label":"combinedFC",
                "GeneratedBy":"4af2120c-e484-41b0-8d9d-da63990bf17c"
            },
            {
                "@id":"bids_derivative::sub-01/meg/sub-01_task-languageIC_desc-combined_meg.ds",
                "Label":"combinedIC",
                "GeneratedBy":"4af2120c-e484-41b0-8d9d-da63990bf17c"
            },
            {
                "@id":"bids_derivative::sub-01/meg/sub-01_task-languageFIC_desc-combined_meg.ds",
                "Label":"combinedFIC",
                "GeneratedBy":"4af2120c-e484-41b0-8d9d-da63990bf17c"
            }
        ]
    }
}

here

arnodelorme commented 1 year ago

Following up on our discussion. Is the id, for example, "@id":"bb67ad88-e56c-4c37-ba0c-b6b9e47daff1", supposed to be common across BIDS datasets? Would it be possible to have simpler IDs? How are the IDs generated? For example, could we have "preproc-fieldtrip-0962f5dce." It would make the text file much easier to read by the common mortal.

Also, are the parameters to the functions called timelockFIC defined somewhere (even if it is not in a standard format).

cmaumet commented 1 year ago

Hi @arnodelorme!

Yes you can have simpler ids (but they still have to be URIs), I'm pasting here the BIDS-Prov spec:


@Id: REQUIRED. A unique identifier like a UUID that will be used to associate activities with this software (e.g., urn:1264-1233-11231-12312, "urn:bet-o1ef4rt"


So what you are proposing as an id could work, possibly prepending a prefix urn:

For the parameters I would recommend adding a Command, similar to here.

Pingging: @satra in case he'd like to chime in!

satra commented 1 year ago

@arnodelorme - in addition to what camille said, the Label is what common mortals should use for reading, the id is what computers would use to determine uniqueness of that node in a graph. also Label does not have to be a single word. it is a short name.

robertoostenveld commented 1 year ago

Hi @arnodelorme, the example I constructed and shared above was using random identifiers generated in MATLAB with char(java.util.UUID.randomUUID.toString).

What I understand from the FSL example, for example when searching for fslmaths, is that repeated calls to the same function (or application) should have unique IDs. So my activity label should have been ft_timelockanalysis rather than avg, but the ID would remain as random as it is now.

After iterating a few more times, and also realizing that I want each activity (= function call) to be represented separately, I now have this:

{
    "BIDSProvVersion":"dev",
    "Records":{
        "Software":{
            "@id":"158ce020-01fe-4d8f-9e0d-de8c2cd711d0",
            "Label":"FieldTrip",
            "Version":"0aa4735f6"
        },
        "Activities":[
            {
                "@id":"d57adc29-f990-40db-a6d8-7cd29033f175",
                "Label":"ft_preprocessing",
                "Used":"bids_raw/sub-01/meg/sub-01_task-language_meg.ds"
            },
            {
                "@id":"d5a6423d-4dd0-467f-9b5a-85d1c54feede",
                "Label":"ft_timelockanalysis",
                "Used":"sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"
            },
            {
                "@id":"073b31e3-f7b8-4121-aae8-b0639ae32619",
                "Label":"ft_timelockanalysis",
                "Used":"sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"
            },
            {
                "@id":"8d5bac53-6891-4127-a627-6ab8646c4ce1",
                "Label":"ft_timelockanalysis",
                "Used":"sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"
            },
            {
                "@id":"e1410a6f-4868-4085-9a8b-b44e17d552af",
                "Label":"ft_megplanar+ft_combineplanar",
                "Used":"sub-01/meg/sub-01_task-language_desc-timelockFC_meg.ds"
            },
            {
                "@id":"8adcb54d-f70a-4ff1-91b7-67b0c0cd70d6",
                "Label":"ft_megplanar+ft_combineplanar",
                "Used":"sub-01/meg/sub-01_task-language_desc-timelockIC_meg.ds"
            },
            {
                "@id":"38281c42-8051-4f26-bb31-03f05ff6d716",
                "Label":"ft_megplanar+ft_combineplanar",
                "Used":"sub-01/meg/sub-01_task-language_desc-timelockFIC_meg.ds"
            }
        ],
        "Entities":[
            {
                "@id":"sub-01/meg/sub-01_task-language_desc-preproc_meg.ds",
                "Label":"data",
                "GeneratedBy":"d57adc29-f990-40db-a6d8-7cd29033f175"
            },
            {
                "@id":"sub-01/meg/sub-01_task-language_desc-timelockFC_meg.ds",
                "Label":"timelockFC",
                "GeneratedBy":"d5a6423d-4dd0-467f-9b5a-85d1c54feede"
            },
            {
                "@id":"sub-01/meg/sub-01_task-language_desc-timelockIC_meg.ds",
                "Label":"timelockIC",
                "GeneratedBy":"073b31e3-f7b8-4121-aae8-b0639ae32619"
            },
            {
                "@id":"sub-01/meg/sub-01_task-language_desc-timelockFIC_meg.ds",
                "Label":"timelockFIC",
                "GeneratedBy":"8d5bac53-6891-4127-a627-6ab8646c4ce1"
            },
            {
                "@id":"sub-01/meg/sub-01_task-language_desc-combinedFC_meg.ds",
                "Label":"combinedFC",
                "GeneratedBy":"e1410a6f-4868-4085-9a8b-b44e17d552af"
            },
            {
                "@id":"sub-01/meg/sub-01_task-language_desc-combinedIC_meg.ds",
                "Label":"combinedIC",
                "GeneratedBy":"8adcb54d-f70a-4ff1-91b7-67b0c0cd70d6"
            },
            {
                "@id":"sub-01/meg/sub-01_task-language_desc-combinedFIC_meg.ds",
                "Label":"combinedFIC",
                "GeneratedBy":"38281c42-8051-4f26-bb31-03f05ff6d716"
            }
        ]
    },
    "@context":"https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json"
}

which renders like this

res

The most relevant changes are that I give the activities the FieldTrip function as their label, and the entities get the MATLAB variable name as the label. So the MATLAB data structures that I have in memory after running this for a single subject are data, timelockFC, timelockIC, timelockFIC, combinedFC, combinedIC, and combinedFIC, and each of these is also represented as CTF dataset (.ds) on disk.

What I don't yet have figured out is how to add "Command" details, which in FieldTrip would be the input cfg structure.

What I also have not yet figured out is how to deal with steps that are not represented on disk. In the case above that is the output of ft_megplanar, which is followed by ft_combineplanar. In the case of ICA cleaning it could also be ft_componentanalysis, followed by ft_rejectcomponent. The back-projected clean raw data could be written to disk according to BEP021, but the unmixed data (component topos and timeseries) not. I think this also shows where (i)EEG and MEG diverge from (f)MRI analyses, which mostly takes nifti files as input and spits out other nifti files. The data structures of (i)EEG/MEG analyses are not so consistent.

I'll keep working on it for a bit longer, to see whether I can make further progress in line with this BEP028 and BEP021.

arnodelorme commented 1 year ago

Thank you Robert. Yes, this is a step in the right direction. For the IDs of the activities, could you use something like

            {
                "@id":"code/fieldtrip_reference_functions/ft_timelockanalysis.m",
                "Label":"ft_timelockanalysis",
                "Used":"sub-01/meg/sub-01_task-language_desc-timelockFIC_meg.ds"
            }
robertoostenveld commented 1 year ago

Hi Arno, I understand that IDs have to be unique (i.e., separate nodes in the graph), so two calls to ft_timelockanalysis should have different IDs. It would be possible to do something like this

{
  "@context":"https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
  "BIDSProvVersion":"dev",
  "Records":{
    "Software":[
      {
        "Id":"683ef168-914c-4c2e-a957-885c15cf2636",
        "RRID":"RRID:SCR_004849",
        "Label":"FieldTrip",
        "Type":"Software",
        "Version":"0aa4735f6"
      }
   ],
    "Activities":[
      {
        "Id":"ft_timelockanalysis-7e50855c",
        "Label":"ft_timelockanalysis",
        "Used":"sub-01/meg/sub-01_task-language_desc-preproc_meg.ds",
        "AssociatedWith":"683ef168-914c-4c2e-a957-885c15cf2636"
      },
      {
        "Id":"ft_timelockanalysis-d1da5e67",
        "Label":"ft_timelockanalysis",
        "Used":"sub-01/meg/sub-01_task-language_desc-preproc_meg.ds",
        "AssociatedWith":"683ef168-914c-4c2e-a957-885c15cf2636"
      },
      ...

but I don't see an advantage of this over having fully random UUIDs. I think it is better to fully ignore the IDs when you look at it, unless you use them to link things together (as with the lines in the graph). Oh, and note that the ID also does not have to have a reference to the software package, as that is something you can do with the AssociatedWith reference.

PS I noticed that @id is not needed, it can also be Id. That makes writing the json easier. Have a look at my code_v3 directory, which has a writeprov.m function, and two functions that help in constructing the provenance as MATLAB structure.

arnodelorme commented 1 year ago

Thanks Robert

On Sat, Sep 23, 2023 at 5:32 AM Robert Oostenveld @.***> wrote:

Hi Arno, I understand that IDs have to be unique (i.e., separate nodes in the graph), so two calls to ft_timelockanalysis should have different IDs. It would be possible to do something like this

{ @.***":"https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json", "BIDSProvVersion":"dev", "Records":{ "Software":[ { "Id":"683ef168-914c-4c2e-a957-885c15cf2636", "RRID":"RRID:SCR_004849", "Label":"FieldTrip", "Type":"Software", "Version":"0aa4735f6" } ], "Activities":[ { "Id":"ft_timelockanalysis-7e50855c", "Label":"ft_timelockanalysis", "Used":"sub-01/meg/sub-01_task-language_desc-preproc_meg.ds", "AssociatedWith":"683ef168-914c-4c2e-a957-885c15cf2636" }, { "Id":"ft_timelockanalysis-d1da5e67", "Label":"ft_timelockanalysis", "Used":"sub-01/meg/sub-01_task-language_desc-preproc_meg.ds", "AssociatedWith":"683ef168-914c-4c2e-a957-885c15cf2636" }, ...

but I don't see an advantage of this over having fully random UUIDs. I think it is better to fully ignore the IDs when you look at it, unless you use them to link things together (as with the lines in the graph). Oh, and note that the ID also does not have to have a reference to the software package, as that is something you can do with the AssociatedWith reference.

PS I noticed that @id is not needed, it can also be Id. That makes writing the json easier. Have a look at my code_v3 directory, which has a writeprov.m function, and two functions that help in constructing the provenance as MATLAB structure.

— Reply to this email directly, view it on GitHub https://github.com/bids-standard/BEP028_BIDSprov/issues/120#issuecomment-1732307978, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOJGQNUGZVRLSSUT4XWZ33X33JFHANCNFSM6AAAAAA4X6IP5Q . You are receiving this because you were mentioned.Message ID: @.***>

robertoostenveld commented 1 year ago

Besides the BIDS specific example script whose results are being discussed at https://github.com/bids-standard/bep021/issues/8, I have now also added support for the prov.jsonld output to ft_analysispipeline with https://github.com/fieldtrip/fieldtrip/commit/93d379282151547ec4533254a921b4ab3ec517a1.

I think there is nothing any more to be done on this issue.