Open stephaniealley opened 4 years ago
Hey @stephaniealley - great question, thanks for reaching out 😄
If the provenance of what you do to your dataset matters most for your project, but you still like the look and feel of notebooks, I think we can come up with a hybrid approach that gets you both. What I'd propose is this:
From a notebook cell, call your script through datalad run:
! datalad run python my_script.py arg1 arg2 --flag1
where the args are replaced with those corresponding to your script, if applicable.
This way you can still keep everything in a notebook, and even walk readers through the commands you run outside of it (the ! at the start of the line tells Jupyter to run the rest of the line as a shell command rather than as Python).
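If you end up running several scripts this way, a small helper can keep the calls consistent. The sketch below is only an illustration: `build_datalad_run` is a hypothetical name, not part of DataLad, and the commit message is made up. It assumes the `-m` (message), `-i` (input) and `-o` (output) options of `datalad run`, and it only assembles the command; nothing is executed until you uncomment the last lines inside a DataLad dataset.

```python
import shlex


def build_datalad_run(script, script_args=(), message=None, inputs=(), outputs=()):
    """Assemble the argument list for `datalad run python <script> ...`.

    Hypothetical helper for illustration; it is not part of DataLad.
    """
    argv = ["datalad", "run"]
    if message:
        argv += ["-m", message]  # commit message stored with the run record
    for path in inputs:
        argv += ["-i", path]  # files the command reads
    for path in outputs:
        argv += ["-o", path]  # files the command writes
    # Everything after the options is the command that DataLad records.
    argv += ["python", script, *script_args]
    return argv


argv = build_datalad_run(
    "my_script.py",
    ["arg1", "arg2", "--flag1"],
    message="Run my_script on the raw data",  # made-up message
)

# Show the shell command that would be run.
print(shlex.join(argv))

# To actually run it from inside a DataLad dataset, uncomment:
# import subprocess
# subprocess.run(argv, check=True)
```

In a notebook you could equally paste the printed line after a `!` in its own cell, which keeps the command visible to anyone reading along.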
Does that make sense and suit your needs?
Yes, that is exactly what I need! I just wasn't exactly sure how to work it out. Thank you!
@gkiar, I have a question regarding the use of DataLad within a Jupyter notebook. I can use the Python API to track the generation and/or movement of files from one directory location to another, but I cannot track commands executed on the data as I see can be done using 'datalad run'. As far as I can tell, that only applies to executing some external script. I really like the concept of sharing my code in a notebook, but I think that tracking the data manipulation is more important for my particular project. Do you happen to have any suggestions?