datalad / datalad-catalog

Create a user-friendly data catalog from structured metadata
https://datalad-catalog.netlify.app
MIT License

ENH: update everything Binder #302

Closed jsheunis closed 1 month ago

jsheunis commented 1 year ago
adswa commented 1 year ago

I've been trying to help here, but I can't find the notebook. :/

jsheunis commented 1 year ago

Sorry, it is quite obscure and I didn't do anything to shed light on the situation.

The parameter-test comes from a branch on the datalad-binder repo: https://github.com/datalad/datalad-binder/tree/parameter-test, which is set up so that it can work with URL parameters. This should ideally just be merged into main, although we need to check what's in main to be sure we keep all the important commits.

The notebook that's used is here: https://github.com/jsheunis/datalad-notebooks/blob/main/download_data_with_datalad_python.ipynb. This one should probably be the bash notebook and not the python one. A bash notebook exists, and I'm not sure why it hasn't been used. It could be that I didn't yet know about the bash kernel when this whole pipeline was set up, or that the jupyter-params extension didn't play nicely with the bash kernel. Need to check. Also, we can probably move the notebook to somewhere in the datalad world, possibly the tutorials repo: https://github.com/datalad/tutorials

This is the code that runs in the browser when selecting "Explore with Binder":

openWithBinder(dataset_url) {
  // Binder environment (repo and branch) that provides the software
  const environment_url =
    "https://mybinder.org/v2/gh/datalad/datalad-binder/parameter-test";
  // Repository that holds the notebook to open
  const content_url = "https://github.com/jsheunis/datalad-notebooks";
  const content_repo_name = "datalad-notebooks";
  const notebook_name = "download_data_with_datalad_python.ipynb";
  // Declared with const so that it does not leak as an implicit global
  const binder_url =
    environment_url +
    "?urlpath=git-pull%3Frepo%3D" +
    content_url +
    "%26urlpath%3Dnotebooks%252F" +
    content_repo_name +
    "%252F" +
    notebook_name +
    "%3Frepourl%3D%22" +
    dataset_url +
    "%22";
  window.open(binder_url);
},

This would need to be updated to point to the correct main branch (once parameter-test is merged) and the correct notebook repo and name (once notebook is moved).
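To make that update a matter of changing a few values, the string concatenation could be pulled into a small helper. The following is only a sketch: the function name buildBinderUrl and its parameter names are made up for illustration, and it has not been tested against mybinder.org. It double-encodes the notebook path so that, for the current values, it yields exactly the same string as openWithBinder.

```javascript
// Hypothetical helper (not part of datalad-catalog): builds the same kind of
// Binder link as openWithBinder, but with every moving part as a parameter.
function buildBinderUrl({
  environmentUrl,
  contentUrl,
  contentRepoName,
  notebookName,
  datasetUrl,
}) {
  // The notebook path is encoded twice ("/" -> "%2F" -> "%252F"),
  // mirroring the hand-written "%252F" separators in openWithBinder.
  const notebookPath = encodeURIComponent(
    encodeURIComponent(`notebooks/${contentRepoName}/${notebookName}`)
  );
  return (
    environmentUrl +
    "?urlpath=git-pull%3Frepo%3D" +
    contentUrl +
    "%26urlpath%3D" +
    notebookPath +
    "%3Frepourl%3D%22" +
    datasetUrl +
    "%22"
  );
}

// Values as currently hard-coded in openWithBinder; the dataset URL is just
// an example input.
const currentBinderUrl = buildBinderUrl({
  environmentUrl:
    "https://mybinder.org/v2/gh/datalad/datalad-binder/parameter-test",
  contentUrl: "https://github.com/jsheunis/datalad-notebooks",
  contentRepoName: "datalad-notebooks",
  notebookName: "download_data_with_datalad_python.ipynb",
  datasetUrl: "https://github.com/psychoinformatics-de/studyforrest-data.git",
});
console.log(currentBinderUrl);
```

Once the branch is merged and the notebook is moved, only the argument values would need to change. Note that a notebook name containing slashes would also get its slashes encoded as %252F here, which is an assumption about what the link needs.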

jsheunis commented 1 year ago

Test for new setup:

Datalad dataset: https://github.com/psychoinformatics-de/studyforrest-data.git

Binder-repo: https://mybinder.org/v2/gh/datalad/datalad-binder/HEAD

Notebook repo: https://github.com/datalad/tutorials

Notebook name: notebooks/binder/download_data_with_datalad.ipynb

Seems to work, just need to edit the notebook a bit.

jsheunis commented 1 year ago

Main problem is that jupyter-notebookparams sets the parameter cell content as:

repo_url = "<dataset-url>"

instead of

repo_url="<dataset-url>"

i.e. there are spaces around the equals sign, where there should be none in order for bash to parse the line as a variable assignment. I'm guessing this happens internally in the extension. The URL (percent) encoding that's used in the catalog to add the parameters is as follows:

"%3Frepourl%3D%22" + dataset_url + "%22";

where: %3F -> ?, %3D -> =, %22 -> "

therefore: ?repourl="<dataset-url>"

i.e. there aren't any explicit spaces encoded.
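This can be checked quickly in a browser console or in Node. The snippet below is only a sketch: the variable names are made up, and the dataset URL is the one from the test setup above. It decodes the suffix once and checks for encoded or literal spaces.

```javascript
// Example dataset URL taken from the test setup earlier in this thread.
const datasetUrl =
  "https://github.com/psychoinformatics-de/studyforrest-data.git";

// The same suffix that the catalog appends to the Binder link.
const encodedSuffix = "%3Frepourl%3D%22" + datasetUrl + "%22";

// Undo one level of percent-encoding.
const decodedSuffix = decodeURIComponent(encodedSuffix);

// Look for encoded spaces ("%20" or "+") or literal whitespace on either side.
const hasSpace =
  /%20|\+|\s/.test(encodedSuffix) || /\s/.test(decodedSuffix);

console.log(decodedSuffix);
// ?repourl="https://github.com/psychoinformatics-de/studyforrest-data.git"
console.log(hasSpace);
// false
```

Since the link itself carries no spaces, the ones around the equals sign must be added after the parameters leave the catalog, which fits the guess that the extension inserts them.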