effigies opened this issue 1 year ago
Yes, just a couple comments.
> 2. Search of NeuroBagel annotations of OpenNeuro datasets. Either a GraphQL endpoint (if we want it in OpenNeuro) or a web page we can link to. Not sure which is faster/more convenient.
Linking out to the query page is simpler. If we want to add it to OpenNeuro search, we would need to extend our search schema and GraphQL snapshot type with a namespace for this and add UI to set the search parameters in that namespace.
I think we agreed we only wanted to include OpenNeuro dataset results in this listing on the OpenNeuro display side?
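For illustration only, here is a rough sketch of what querying such a namespace could look like. The `neurobagel` field and everything under it are hypothetical, and the endpoint URL and `snapshot` arguments are assumptions about OpenNeuro's existing GraphQL API:

```python
# Purely illustrative: a hypothetical `neurobagel` namespace on the snapshot
# type. The endpoint URL and snapshot(datasetId, tag) signature are assumed
# from OpenNeuro's public GraphQL API; the neurobagel fields do not exist yet.
import requests

QUERY = """
query ($datasetId: ID!, $tag: String!) {
  snapshot(datasetId: $datasetId, tag: $tag) {
    id
    neurobagel {          # hypothetical namespace for NeuroBagel annotations
      annotatedColumns    # hypothetical field
    }
  }
}
"""

resp = requests.post(
    "https://openneuro.org/crn/graphql",
    json={"query": QUERY, "variables": {"datasetId": "ds000001", "tag": "1.0.0"}},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```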
> 3. Direct uploaders to NeuroBagel to annotate their `participants.tsv` and get a `participants.json` to annotate their dataset. Probably easiest with a step-by-step doc or video.
This could be a page on OpenNeuro that NeuroBagel links to with a URL back to NeuroBagel as a parameter, letting the user preview the changes and upload those files with their OpenNeuro session.
Yes, this is what I also wrote down.
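For concreteness, a minimal sketch of what that hand-off could look like; the OpenNeuro path and both query parameter names below are made up for illustration:

```python
# Hypothetical sketch of the hand-off: NeuroBagel sends the user to an
# OpenNeuro page and passes a URL back to NeuroBagel as a query parameter.
# The /import-annotations path and the parameter names are invented here.
from urllib.parse import urlencode

return_url = "https://annotate.neurobagel.org/?session=abc123"  # hypothetical session handle
params = urlencode({"dataset": "ds000001", "return": return_url})
handoff_url = f"https://openneuro.org/import-annotations?{params}"
print(handoff_url)
```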
> Annotate Poldrack-lab owned dataset(s) with NeuroBagel annotations
Thanks, this is very helpful. We're currently thinking that we will `datalad clone` the dataset, but then only read the sidecar files (which are usually among the small files that you don't have to `datalad get` separately). So if the participants.tsv and participants.json were part of the small-file portion of the datalad dataset (that's about as far as my understanding goes), that'd be extra convenient.
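A minimal sketch of that flow, assuming datalad's Python API, that the sidecars are plain git-tracked text files (so readable right after the clone, without a separate `datalad get`), and that the dataset ships a participants.json at all:

```python
# Sketch: clone an OpenNeuro dataset and read only the lightweight sidecars.
# Assumes participants.tsv / participants.json are tracked in git rather than
# git-annex, so no annexed imaging data needs to be fetched.
import csv
import json
from pathlib import Path

import datalad.api as dl

ds_path = Path("ds000001")  # local checkout location (illustrative)
dl.clone(source="https://github.com/OpenNeuroDatasets/ds000001", path=str(ds_path))

with open(ds_path / "participants.tsv", newline="") as f:
    participants = list(csv.DictReader(f, delimiter="\t"))
data_dictionary = json.loads((ds_path / "participants.json").read_text())
```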
> Search of NeuroBagel annotations of OpenNeuro datasets
> I think we agreed we only wanted to include OpenNeuro dataset results
That will work! We still have some minor issues to iron out but this should be something we can do soon. Do you have any preferences for what users should see as a result of their query?
> Direct uploaders to NeuroBagel to annotate their `participants.tsv` and get a `participants.json`
I may be misunderstanding, but the NeuroBagel annotation tool will work like the BIDS validator: everything happens on the user's machine, and the tool itself won't have to talk to any remote backend (for now). So the end of the process is that you can download the data dictionary you created back to your local filesystem. Or do you mean another kind of uploader?
> Begin proposal to make annotations maximally encodable in BIDS
Very happy to. Our docs are a little out of sync at the moment; the most up-to-date form of what we create is here: https://github.com/neurobagel/bagel-cli/blob/f6e22d85a1536b50e815a5f9199f63ca58a8b06f/bagel/dictionary_models.py. This is meant to be a temporary format, so we're happy to replace it with another temporary format that is more likely to be compatible with the BIDS spec while we work on a more formal proposal. I'm assuming that would be done as part of a BEP?
> Add annotation widget when ready.
:rocket:
@effigies @nellh Hi folks, I'm Arman from the Neurobagel dev team. In preparation for providing participant-level cohort search of OpenNeuro datasets in Neurobagel, we've made some changes and updates to our tools, namely the query tool, and we would love to hear your feedback. Please check out the query tool and let us know what you think in the feedback issue.
Hey folks, it's been quite a while again. We have several updates, and I think it'd be good to discuss the next steps.
Our main update is on the search, so I'll cover that first and then go through @effigies' original points in order, proposing some next steps for each:
We're ready for you to link to a participant-level search of OpenNeuro datasets. :tada: Here is the link: https://query.neurobagel.org/?node=OpenNeuro.
Next steps:
Let us know if that looks good to you; getting this connection going is the main point we want to discuss with you.
This is ready to go as well. We have updated our browser annotator (https://annotate.neurobagel.org/); it now produces valid data dictionaries you can use in the graph.
The output data dictionary hasn't yet been updated to make use of https://github.com/bids-standard/bids-specification/pull/1603 for adding TermURLs to the Levels. That's a next step for us, so if you try out the annotation we'd appreciate some feedback on the output format; there is quite a bit in there that goes beyond the Levels TermURL, and we'd like to find a format that can be useful for you as well.
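For reference, a sketch of what the object form of Levels from that PR could look like; the exact shape should be checked against the merged spec text, and the SNOMED CT identifiers are just examples:

```python
# Sketch of a participants.json column using the object form of "Levels"
# from bids-specification PR #1603, with a TermURL per level. Check the
# merged spec text for the exact shape; the SNOMED CT codes are examples.
import json

participants_json = {
    "sex": {
        "Description": "sex of the participant",
        "Levels": {
            "M": {"Description": "male", "TermURL": "https://snomed.info/id/248153007"},
            "F": {"Description": "female", "TermURL": "https://snomed.info/id/248152002"},
        },
    }
}
print(json.dumps(participants_json, indent=2))
```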
Next:
We don't have an update on this, but one option for now would be to tell folks to add annotations for existing datasets directly through https://github.com/OpenNeuroDatasets-JSONLD and then they will become searchable.
Next:
We now have https://github.com/bids-standard/bids-specification/pull/1603 to allow TermURLs in Levels and will start using this in our annotation output. There is still other information outside "Levels" that we need a place for, like "isAbout" and "isPartOf" (see https://neurobagel.org/dictionaries/#assessment-tool); we'll start focusing on these early next year - any ideas appreciated.
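To make the gap concrete, a rough sketch of that non-Levels information, following the linked dictionary docs; the key names and term URLs here should be double-checked against that page:

```python
# Rough sketch of the information that still has no BIDS home: "IsAbout" and
# "IsPartOf" under a NeuroBagel "Annotations" key, per the linked docs.
# Key names and term URLs below are illustrative, not authoritative.
import json

tool_column = {
    "Description": "item 1 of some assessment tool",
    "Annotations": {
        "IsAbout": {"TermURL": "nb:Assessment", "Label": "Assessment tool"},
        "IsPartOf": {"TermURL": "snomed:273249006", "Label": "Assessment scales"},
    },
}
print(json.dumps({"tool_item1": tool_column}, indent=2))
```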
Next:
The https://annotate.neurobagel.org/ tool is ready and up to date with our data model, but doesn't yet output pure BIDS data dictionaries. I don't think it makes sense to use it yet for data that goes on OpenNeuro directly. It could make sense though to provide a link to it and explain that annotations go into https://github.com/OpenNeuroDatasets-JSONLD, so that users can augment those until the annotations are fully BIDS.
Next:
Let me know if these points, and especially the next steps, make sense to you all. Also happy to chat in person if that works better. Talk soon!
Hi @surchs, thanks for the post.
Notes:
1) Setting a minimum age in the query tool produces an "Oops, something went wrong" response.
Great, thanks for the links and the bug report! Will reply directly on #2956 for search / annotate.
> Annotate: Are there any datasets you would like to see annotated first? Otherwise I'll start with ds000001.
We already have ds000001 in https://github.com/orgs/OpenNeuroDatasets-JSONLD/repositories, so anything that isn't in there and that you know a bit would be a good start. We're working on making the "I made an annotation, now what?" process more automatic (see https://github.com/OpenNeuroDatasets-JSONLD/.github/issues/17), but for now our data dictionaries get dumped into https://github.com/neurobagel/openneuro-annotations and then we process them from there. So you could pick any dataset you don't see in there and try out the annotation (https://neurobagel.org/annotation_tool/).
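If it helps, the list of already-covered datasets can be pulled programmatically from the public GitHub REST API, e.g.:

```python
# List which datasets are already in the OpenNeuroDatasets-JSONLD org via
# the public GitHub REST API, to help pick one that isn't annotated yet.
import requests

repos, page = [], 1
while True:
    r = requests.get(
        "https://api.github.com/orgs/OpenNeuroDatasets-JSONLD/repos",
        params={"per_page": 100, "page": page},
        timeout=30,
    )
    r.raise_for_status()
    batch = r.json()
    if not batch:
        break
    repos += [repo["name"] for repo in batch]
    page += 1

print("ds000001" in repos)  # already annotated, per the comment above
```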
> BIDS: I'm already trying to keep up with y'all on this metadata. Happy to chat about ideas, but I'm following your lead...
Alright, fair. We're going to have a chat with Eric and Sam about BEP36, and we can talk about the peaceful coexistence of BIDS.json and neurobagel.json there.
> Annotation widget
Cool, I like the idea of "completeness" feedback! Maybe @rmanaem and @alyssadai have some thoughts on this too.
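As a toy illustration of what a completeness score could be, assuming annotations live under an `Annotations` key per column (as in the sketches above; adjust to the real format):

```python
# Toy illustration of "completeness" feedback for the widget: the share of
# columns in a data dictionary that carry a NeuroBagel "Annotations" block.
def annotation_completeness(data_dictionary: dict) -> float:
    if not data_dictionary:
        return 0.0
    annotated = sum("Annotations" in col for col in data_dictionary.values())
    return annotated / len(data_dictionary)

print(annotation_completeness({"age": {"Annotations": {}}, "group": {}}))  # 0.5
```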
Based on today's meeting, we have the following short-term steps, in order of ease:
1) Annotate Poldrack-lab owned dataset(s) with NeuroBagel annotations and release, for testing.
2) Search of NeuroBagel annotations of OpenNeuro datasets. Either a GraphQL endpoint (if we want it in OpenNeuro) or a web page we can link to. Not sure which is faster/more convenient.
3) Direct uploaders to NeuroBagel to annotate their `participants.tsv` and get a `participants.json` to annotate their dataset. Probably easiest with a step-by-step doc or video.

Long-term steps:

4) Begin proposal to make annotations maximally encodable in BIDS.
5) Add annotation widget when ready.
Does this match everybody's recollection? @surchs @nellh