dandi / dandi-archive

DANDI API server and Web app
https://dandiarchive.org

Dandisets "hints" and automated "tips" #1816

Closed yarikoptic closed 8 months ago

yarikoptic commented 10 months ago

Inspired by https://github.com/dandi/dandi-archive/issues/860#issuecomment-1505416140, where I just arrived at the same idea as @bendichter: link to our notebooks on https://github.com/dandi/example-notebooks/. ATM we require manual annotation of metadata AFTER a notebook is accepted into example-notebooks, and only a few folks would do that.
Also, it might be a chicken-and-egg problem: notebooks might (and ideally should) target a specific released version of a dandiset, but then we could associate them only with the draft or a later version, I guess.

To confirm that nobody annotates, I did a quick check: apparently only 000004 has such a reference, and none of the other dandisets reference their notebooks.

```shell
dandi@drogon:~/proj/dandi/example-notebooks$ for ds in 000*; do meta=$(curl --silent -X 'GET' -H 'accept: application/json' "https://api.dandiarchive.org/api/dandisets/$ds/versions/draft/" | jq .); echo $ds; find $ds -iname *ipynb | while read ipynb; do echo $meta | jq . |grep $ipynb || echo " $ipynb -- missing"; done; done
000004
    "url": "https://hub.dandiarchive.org/hub/user-redirect/lab/tree/dandi-notebooks/000004/RutishauserLab/000004_demo_analysis.ipynb",
000005
 000005/DataJoint/DJ-NWB-Yu-Gutnisky-2016/notebooks/Yu-Gutnisky-2016-examples.ipynb -- missing
000006
 000006/DataJoint/DJ-NWB-Economo-2018/notebooks/Economo-2018-examples.ipynb -- missing
000007
 000007/DataJoint/DJ-NWB-Gao-2018/notebooks/erd.ipynb -- missing
 000007/DataJoint/DJ-NWB-Gao-2018/notebooks/Gao-2018-examples.ipynb -- missing
000009
 000009/DataJoint/DJ-NWB-Guo-Inagaki-2017/notebooks/Guo-Inagaki-2017-examples.ipynb -- missing
 000009/DataJoint/DJ-NWB-Guo-Inagaki-2017/notebooks/Guo-Inagaki-2017-NWB-examples.ipynb -- missing
000010
 000010/DataJoint/DJ-NWB-Li-Daie-2015-2016/notebooks/Li-Daie-2016-examples.ipynb -- missing
 000010/DataJoint/DJ-NWB-Li-Daie-2015-2016/notebooks/Li-2015-examples.ipynb -- missing
 000010/DataJoint/DJ-NWB-Li-2015b/notebooks/Li-2015b-examples.ipynb -- missing
 000010/DataJoint/DJ-NWB-Li-2015b/notebooks/Schemas.ipynb -- missing
000013
 000013/DataJoint/DJ-NWB-Hires-Gutnisky-2015/notebooks/Hires-Gutnisky-2015-examples.ipynb -- missing
000015
 000015/DataJoint/DJ-NWB-Chen-2017/notebooks/Chen-2017-examples.ipynb -- missing
 000015/DataJoint/DJ-NWB-Chen-2017/notebooks/Schemas.ipynb -- missing
000039
 000039/AllenInstitute/Contrast_analysis.ipynb -- missing
 000039/AllenInstitute/Create_manifest.ipynb -- missing
000055
 000055/BruntonLab/peterson21/Table_coarse_labels.ipynb -- missing
 000055/BruntonLab/peterson21/dashboard.ipynb -- missing
 000055/BruntonLab/peterson21/Fig_pow_spectra.ipynb -- missing
 000055/BruntonLab/peterson21/Table_part_characteristics.ipynb -- missing
 000055/BruntonLab/peterson21/Fig_coarse_labels.ipynb -- missing
000108
 000108/chunglab/demo/dashboard.ipynb -- missing
 000108/chunglab/demo/2021-09-27_dandi-demo.ipynb -- missing
 000108/chunglab/demo/validate_lev6.ipynb -- missing
000402
 000402/MICrONS/demo/000402_microns_demo.ipynb -- missing
000409
 000409/IBL/03_analysis_Imbizo_2023.ipynb -- missing
 000409/IBL/01_list_datasets.ipynb -- missing
 000409/IBL/02_behaviour_psychometric_curve.ipynb -- missing
dandi@drogon:~/proj/dandi/example-notebooks$ for ds in 000*; do meta=$(curl --silent -X 'GET' -H 'accept: application/json' "https://api.dandiarchive.org/api/dandisets/$ds/versions/draft/" | jq .); echo $ds; echo $meta | jq . | grep ipynb; done
000004
    "url": "https://hub.dandiarchive.org/hub/user-redirect/lab/tree/dandi-notebooks/000004/RutishauserLab/000004_demo_analysis.ipynb",
000005
000006
000007
000009
000010
000013
000015
000039
000055
000108
000402
000409
```

So, I feel that in addition to "validation errors" we might want to add some "recommendation engine". We could run it alongside validation. Many of the recommendations could come with a specific action to perform, e.g. in this case, automatically adding a reference to the ipynb such as the one in 000004. So could be
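To make the "recommendation with an action" idea concrete, here is a minimal sketch, not a proposal for the actual implementation. The `Tip` type, the `notebook_tips` function, and the `000004/other/new.ipynb` path are hypothetical; the URL prefix is copied from the 000004 entry in the shell output above, and the `relatedResource` key in the toy metadata is only a placeholder. The check itself is the same substring test as the `grep $ipynb` in the shell loop.

```python
import json
from dataclasses import dataclass

# Assumption: prefix taken from the 000004 "url" value in the shell output.
HUB_PREFIX = "https://hub.dandiarchive.org/hub/user-redirect/lab/tree/dandi-notebooks/"


@dataclass
class Tip:
    """A non-blocking recommendation, as opposed to a validation error (hypothetical)."""
    dandiset_id: str
    message: str
    suggested_url: str  # the action: a link the owner could accept into the metadata


def notebook_tips(dandiset_id, metadata, notebook_paths):
    """Return one Tip per notebook path that does not occur in the metadata record."""
    # Plain substring search over the serialised metadata, like `grep $ipynb`.
    blob = json.dumps(metadata)
    tips = []
    for path in notebook_paths:
        if path not in blob:
            tips.append(
                Tip(
                    dandiset_id=dandiset_id,
                    message=f"{path} is not referenced in the metadata",
                    suggested_url=HUB_PREFIX + path,
                )
            )
    return tips


# Toy metadata record; the key name is a placeholder for this sketch.
meta = {
    "relatedResource": [
        {"url": HUB_PREFIX + "000004/RutishauserLab/000004_demo_analysis.ipynb"}
    ]
}
tips = notebook_tips(
    "000004",
    meta,
    [
        "000004/RutishauserLab/000004_demo_analysis.ipynb",  # already referenced
        "000004/other/new.ipynb",  # hypothetical notebook, not referenced
    ],
)
for tip in tips:
    print(tip.message, "->", tip.suggested_url)
```

Keeping tips as data (message plus suggested URL) rather than applying them directly would leave the decision to accept them with the dandiset owner.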

bendichter commented 10 months ago

@yarikoptic in this case, is this something we want to leave up to the user? For example, consider the case when another user creates a notebook associated with a dandiset. In that case, the link to this notebook would only ever be added as metadata if the dandiset author came back to the DLP draft page and accepted this suggestion. IMO, the DLP should just automatically list all associated notebooks. I think that would be a better user experience than asking users to semi-automatically add it as associated metadata.

yarikoptic commented 10 months ago

In general - I love automations. But here I wonder if we should try to maintain the feeling of owners being in charge of their dandisets and not augment some things fully automatically? So maybe if we had a clearly separated "community contributions" or "Found used in" section somewhere on the DLP, outside the metadata record, that would make it cleaner.

bendichter commented 10 months ago

Drawing an analogy to social media websites, if you go to someone's profile on Instagram, Facebook, etc. you see all of their posts, and you often also see something like "recommended profiles", which shows you a list of other users that you may also be interested in. It is communicated through the visual design of the page that this is not controllable by the profile's user, it is just populated automatically. One question here is whether we want these types of features in DANDI or whether we want the page to be entirely controlled by the user.

My preference would be for us to follow the example of these popular networks and draw links between DANDI content automatically, making it clear with visual design that this is automatically populated, but I could see the argument for letting users control their DLPs completely.

waxlamp commented 9 months ago

Question: how do you discover notebooks that reference a given Dandiset? Is it just textual occurrence of a Dandiset/asset URL?

CodyCBakerPhD commented 9 months ago

> Question: how do you discover notebooks that reference a given Dandiset? Is it just textual occurrence of a Dandiset/asset URL?

The easiest connections to find would be from the explicit subfolder nesting found on https://github.com/dandi/example-notebooks/ (one-to-one correspondence from notebooks to the dandisets they reference)

An expanded search could potentially look for occurrences of a `DandiAPIClient` making `get_dandiset` calls in code using a particular `dandiset_id`; or, as you say, any occurrence of an explicit asset URL is also a good sign of a link

A greatly expanded, and probably more error-prone, reference search could scan code and cell comments for potential verbal references to the six-digit codes
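A rough sketch of the two expanded searches, assuming the notebook is available as its raw `.ipynb` JSON text. The function name and all three regexes are illustrative assumptions, not an agreed specification. Patterns are tried from most to least reliable, so each dandiset ID is labelled with the strongest evidence found for it. Asset URLs that do not contain a dandiset ID are not resolved here.

```python
import json
import re

# Ordered from most to least reliable (dicts keep insertion order).
PATTERNS = {
    # e.g. client.get_dandiset("000004") or get_dandiset(dandiset_id="000004")
    "api_call": re.compile(r"get_dandiset\(\s*(?:dandiset_id\s*=\s*)?[\"'](\d{6})[\"']"),
    # e.g. https://dandiarchive.org/dandiset/000004 or .../api/dandisets/000004/...
    "url": re.compile(r"dandiarchive\.org/(?:api/)?dandisets?/(\d{6})"),
    # Any standalone six-digit number: the most error-prone tier.
    "bare_id": re.compile(r"(?<!\d)(\d{6})(?!\d)"),
}


def find_dandiset_refs(notebook_json):
    """Map each dandiset ID mentioned in a notebook to the strongest evidence kind."""
    nb = json.loads(notebook_json)
    texts = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # Cell source may be stored as a list of lines or as a single string.
        texts.append("".join(source) if isinstance(source, list) else source)

    refs = {}
    for kind, pattern in PATTERNS.items():
        for text in texts:
            for dandiset_id in pattern.findall(text):
                # Keep the first (strongest) kind recorded for this ID.
                refs.setdefault(dandiset_id, kind)
    return refs


# Toy notebook with one code cell and one markdown cell.
sample = json.dumps(
    {
        "cells": [
            {"cell_type": "code", "source": ['ds = client.get_dandiset("000004")\n']},
            {
                "cell_type": "markdown",
                "source": "See https://dandiarchive.org/dandiset/000005 and Dandiset 000006",
            },
        ]
    }
)
refs = find_dandiset_refs(sample)
print(refs)
```

The `bare_id` tier will also match unrelated six-digit numbers, so results from it would probably need manual review.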

bendichter commented 9 months ago

> The easiest connections to find would be from the explicit subfolder nesting found on https://github.com/dandi/example-notebooks/ (one-to-one correspondence from notebooks to the dandisets they reference)

For example, everything under example-notebooks/000004 relates to Dandiset 000004, everything under example-notebooks/000005 relates to Dandiset 000005, etc. Let's start with this approach, as this is the official recommended way to associate a notebook with a Dandiset.
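As a sketch of that folder-based association, assuming we already have a list of file paths relative to the root of the example-notebooks repository (the function name is hypothetical, and how the paths are obtained is left open):

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# A top-level directory counts as a dandiset folder if it is exactly six digits.
DANDISET_DIR = re.compile(r"\d{6}")


def group_notebooks(paths):
    """Group .ipynb paths by the six-digit top-level directory they live under."""
    groups = defaultdict(list)
    for raw in paths:
        path = PurePosixPath(raw)
        top = path.parts[0] if path.parts else ""
        if path.suffix.lower() == ".ipynb" and DANDISET_DIR.fullmatch(top):
            groups[top].append(raw)
    return dict(groups)


# Paths taken from the listing earlier in this thread, plus a non-notebook file.
groups = group_notebooks(
    [
        "000004/RutishauserLab/000004_demo_analysis.ipynb",
        "000007/DataJoint/DJ-NWB-Gao-2018/notebooks/erd.ipynb",
        "000007/DataJoint/DJ-NWB-Gao-2018/notebooks/Gao-2018-examples.ipynb",
        "README.md",
    ]
)
for dandiset_id, notebooks in groups.items():
    print(dandiset_id, len(notebooks))
```

Files outside a six-digit top-level folder, and files that are not notebooks, are ignored.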