Using DataLad in Jupyter notebook

stephaniealley commented 4 years ago

@gkiar, I have a question regarding the use of DataLad within a Jupyter notebook. I can use the Python API to track the generation and/or movement of files from one directory location to another, but I cannot track commands executed on the data, as can be done with 'datalad run'. As far as I can tell, that only applies to executing an external script. I really like the concept of sharing my code in a notebook, but I think that tracking the data manipulation is more important for my particular project. Do you happen to have any suggestions?

gkiar commented 4 years ago

Hey @stephaniealley - great question, thanks for reaching out 😄

If the provenance of what you do to your dataset matters most for your project, but you still like the look and feel of notebooks, I think we can come up with a bit of a hybrid approach that gets you both. What I'd propose is this:

1. Write the logic of what you want to do to your dataset in your notebook and make sure it works as expected.
2. Once it's working, move this code to a set of scripts, and make sure that they still work as expected when being called with datalad run.
3. Once you've done that, you can go into your notebook and do all the movement of files or plotting or what-have-you natively in the notebook, but replace the commands-now-belonging-to-scripts with something like this:
```
! datalad run python my_script.py arg1 arg2 --flag1
```
where the args are replaced with those corresponding to your script, if applicable.

This way, you still have everything in a notebook, and you can even walk readers through the commands you're running outside of it (the ! at the start of the line tells Jupyter to run the rest of that line as a shell command instead of as Python code).
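If you'd rather assemble the call in Python than type it after a `!`, here is a minimal sketch of one way to do it. The helper names `build_datalad_run` and `run_tracked` and the `data/raw` and `data/derived` paths are made up for illustration. The `-m`, `--input` and `--output` options and the leading `--` before the command are taken from the `datalad run` command line as I understand it, so double-check them against `datalad run --help` for your version.

```python
import shlex
import subprocess


def build_datalad_run(script, script_args=(), message=None, inputs=(), outputs=()):
    """Assemble the argv list for `datalad run ... -- python <script> <args>`.

    Hypothetical helper, not part of DataLad itself.
    """
    cmd = ["datalad", "run"]
    if message:
        cmd += ["-m", message]
    for path in inputs:
        cmd += ["--input", path]
    for path in outputs:
        cmd += ["--output", path]
    # "--" separates DataLad's own options from the command it should record.
    cmd += ["--", "python", script, *script_args]
    return cmd


def run_tracked(cmd):
    """Execute the assembled command; raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True)


cmd = build_datalad_run(
    "my_script.py",
    ["arg1", "arg2", "--flag1"],
    message="Run my_script on the raw data",
    inputs=["data/raw"],       # hypothetical input path
    outputs=["data/derived"],  # hypothetical output path
)

# Show the shell-quoted command so readers of the notebook can see what runs.
print(shlex.join(cmd))

# Uncomment inside a DataLad dataset (assumes datalad is installed and the
# working tree is clean):
# run_tracked(cmd)
```

Printing the command before running it keeps the notebook readable, and the actual execution stays a single line you can switch on once the script behaves as expected.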

Does that make sense/seem to suit your needs?

stephaniealley commented 4 years ago

Yes, that is exactly what I need! I just wasn't exactly sure how to work it out. Thank you!
