SyneRBI / SIRF-Exercises

SIRF Training and demonstration material
http://www.ccpsynerbi.ac.uk
Apache License 2.0
18 stars 21 forks source link

add option to download_data.sh to specify working_dir #118

Closed paskino closed 3 years ago

paskino commented 3 years ago

Allows to specify a base directory for cd_to_working_dir. If passed nothing it will use whatever is returned by exercises_data_path

paskino commented 3 years ago

To be honest, I do not see the added value of the function cd_to_working_dir as it's use is just to change directory to a directory where the user will store temporary working data. It does it in a totally opaque way and it really just wraps os.path.join + os.makedirs, we might as well just use those in the notebooks and will not have the problem.

Also the user does not seem to know where this directory is, unless he/she reads the docstring. https://github.com/SyneRBI/SIRF-Exercises/blob/ec421f70af805da88e63fc7312c5879c4361a0ea/lib/sirf_exercises/__init__.py#L48

KrisThielemans commented 3 years ago

To be honest, I do not see the added value of the function cd_to_working_dir as it's use is just to change directory to a directory where the user will store temporary working data. It does it in a totally opaque way

fair comment! I guess the idea was that it would mean that all notebooks would be writing in one working_folder, which the user can then easily get rid of. That's not so easy to do in a generic way (I think). getcwd() has the disadvantage that all the data will end up mixed with the notebooks. That will be particularly problematic when we need to tell them to wipe their SIRF-Exercises to be able to update them... (although of course you'll say that it's now written in SIRF-Exercises/data anyway).

Easy and clean solutions are of course always best...

What about changing this to

set_working_directory(somedir) # save it somewhere
get_working_directory()
# intro notebook
# notebook
os.chdir(get_working_directory())

workdir could have some sensible default such as SIRF-Exercises/working_folder, which can be presumably be found from the module.

maybe too complicated after all...

paskino commented 3 years ago

Agreed, it's difficult to implement it in a generic way.

Anyway I've modified this PR so that download_data.sh accepts a -w WORKDIR parameter with which the user will be able to set the working directory.

Notice however that the users will have to run once the following command, either in the notebook or in the terminal.


./SIRF-Exercises/scripts/download_data.sh -w working_dir
paskino commented 3 years ago

I tried it on the jupyterhub and it seems to work fine.

ashgillman commented 3 years ago

Please see my email, I think the current code should work for your use-case Edo by having DOWNLOAD_DIR shared and EXERCISES_DATA_DIR per user.

However, I agree that being able to specify WORKING_DIR is a good idea. What is missing here though is a default argument for it.

ashgillman commented 3 years ago

To be honest, I do not see the added value of the function cd_to_working_dir as it's use is just to change directory to a directory where the user will store temporary working data. It does it in a totally opaque way and it really just wraps os.path.join + os.makedirs, we might as well just use those in the notebooks and will not have the problem.

The other option I considered, along the lines of what Kris suggested was:

os.chdir(get_working_dir())

Its very minor, but the only reason I went against that is we have a very simple little bit of code at the start of each notebook that is consistent. Happy to also import os though and do our own chdir if it is more apparent to the user, but this is unrelated?

ashgillman commented 3 years ago

In your email, Edo, you mentioned you used the Environment variable option I included. If you prefer to have an environment variable – I was originally a proponent of this idea.

To me, ideally we’d have:

Then the env vars can be set in the VM or Docker and these scripts just transparently “work” here. But anyone doing their own SuperBuild never needs to bother with the env vars.

But, again, this isn't urgent for this week if the DOWNLOAD_DIR works for you per the email

KrisThielemans commented 3 years ago

The other option I considered, along the lines of what Kris suggested was:

os.chdir(get_working_dir())

Its very minor, but the only reason I went against that is we have a very simple little bit of code at the start of each notebook that is consistent

seems easy enough to have a get_working_dir but also a cd_to_working_dir(). we can always add the get later. I don't think that's urgent at all

ashgillman commented 3 years ago

What is missing here though is a default argument for it.

I take this back. So long as download_data.sh with no -w arg indeed does nothing (I'm not 100% confident the canoncicalise function will leave the variable blank?) then the current change will fall back to the same behavior as now when no -w is given.

Have you checked that? If so this looks good to me

paskino commented 3 years ago

Still need to add working_folder to .gitignore. Otherwise, looks good to me for now.

But working_folder is not known a priori, so why should we do this? Or it's a workaround for the upcoming school?

KrisThielemans commented 3 years ago

Still need to add working_folder to .gitignore. Otherwise, looks good to me for now.

But working_folder is not known a priori, so why should we do this? Or it's a workaround for the upcoming school?

as otherwise you do a git status after the download and think that the repo has changed. Or at least, I think so. I didn't try of course.

paskino commented 3 years ago

download_data.sh would not set the working directory unless the user specifies which one it is. So there is absolutely no guarantee that it will be set in the repository directory. However, if we instruct the users to run the download script as


./SIRF-Exercises/scripts/download_data.sh -w ~/working_dir

This will mean that the working directory is outside of the repo and we don't need to anything to .gitignore.

BTW, thinking from the user perspective this is kind of awkward. I think that asking them to modify a line in the jupyter notebook is slightly easier to accept?

On the other hand, they will have to start gadgetron so, this would be just another command.

paskino commented 3 years ago

So long as download_data.sh with no -w arg indeed does nothing (I'm not 100% confident the canoncicalise function will leave the variable blank?) then the current change will fall back to the same behavior as now when no -w is given.

I'll move the canonicalise in the if statement. https://github.com/SyneRBI/SIRF-Exercises/blob/88c3044aa65820457e069e19ae676283505a2887/scripts/download_data.sh#L201-L207

KrisThielemans commented 3 years ago

I thought it'd be easiest to always write working_path.py (just like the data_path.py) (it would certainly make the implementation of a Python-side get_working_directory easier), but whatever works now is fine

paskino commented 3 years ago

Possibly, but that's not how these functions work. https://github.com/SyneRBI/SIRF-Exercises/blob/88c3044aa65820457e069e19ae676283505a2887/lib/sirf_exercises/__init__.py#L26-L31

For instance data_path.py might not be there and we try to use the environment variable SIRF_EXERCISES_DATA_PATH, which is actually the way I set it up in the jupyterhub instance.

KrisThielemans commented 3 years ago

yeah, but IF they've run download_data.sh, the file would be there.

anyway.

ashgillman commented 3 years ago

So long as download_data.sh with no -w arg indeed does nothing (I'm not 100% confident the canoncicalise function will leave the variable blank?) then the current change will fall back to the same behavior as now when no -w is given.

I'll move the canonicalise in the if statement.

I think its better to "fix" canonicalize to return an empty variable if the input is empty?

Also, would it be nicer to write variables exercise_data_path and optionally working_folder_path both to the same file, data_path.py? (or maybe more aptly, download_configurations.py) Seems like a bit less pollution of configured files.

KrisThielemans commented 3 years ago

I think that's a nice idea but we don't have time anymore. whatever works (and doesn't look too ugly from a user perspective).

KrisThielemans commented 3 years ago

merge whenever you're ready. I won't check it anymore!