gentzkow / template_archive

20 stars 36 forks source link

PR for #18 (RA Manual): Review Sherlock Extension on Practice Task #61

Closed snairdesai closed 2 years ago

snairdesai commented 2 years ago

@szahedian @jc-cisneros This pull request closes Issue #18 in gslab-econ/ra-manual. We have implemented two major fixes to support users who do not wish to use conda in gentzkow/template:

  1. Creating an example markdown file (setup/dependencies.md) which provides non-conda users with instructions on how to manually install dependencies, both locally and on Sherlock.
  2. Revising setup/check_setup.py to (a) include a warning message for non-conda users, and (b) to robustly check if the user has installed all relevant package dependencies.

You may find this comment helpful for additional context.

szahedian commented 2 years ago

@snairdesai @jc-cisneros this looks great! A few more comments:

snairdesai commented 2 years ago

@szahedian Thanks for the comments!

Sorry if this is a naive question from a Sherlock never-user, but is loading modules through Sherlock something all Sherlock users will know how to do? If not, maybe include a link to a how-to page regarding this.

Absolutely agree - more context here is in Issue #18 in gslab-econ/lab-manual, but we are adding a distinct, publicly-facing section in template (separate from the practice task) with a How To for Sherlock users which will include this additional support. The first step in the practice task will be to read this section. We plan on integrating these changes as part of #62. If you're interested, we have a draft version of this here.

  • Now that we now allow users to bypass conda, anywhere we mention conda we should probably mention that manual install is an option. The first place that comes to mind is the README. No need to explain everything, but just mention that users can bypass conda using the instructions in dependencies.md

Same comment here as above - these flags will be provided for the user in the revised template instructions we create in #62.

  • Under point (2) in dependencies.md, you mention the need for pip and R. You mention Homebrew. Maybe include a link to that?

This link to homebrew and installation instructions will also be included in the revised template in #62.

@jc-cisneros and I will work on revising the check_setup.py to include warnings which list all Python and R dependencies a conda user might be missing.

snairdesai commented 2 years ago

@szahedian Regarding your comment above:

I see we are listing precisely which dependencies are missing for R, but can we do this for Python as well? And can we list all missing dependencies, not just the first (which might happen if all dependencies in checked within a single try-except block)?

This is a great idea, we've made the relevant updates to check_setup.py in commits 477719b up to 6b92576. Most of the changes are contained in the check_dependencies() function, although the final three commits relate to some formatting changes which we hope to standardize in conda_env.yaml.

You can test it by running the standard template process and uncommenting pendulum and MoviePy in lines 14-15 of conda_env.yaml, as well as r-nicheROVER in line 20.

If this looks good, I'll delete these lines in conda_env.yaml (I only added them so you could test the code yourself) and merge back to master. Thanks again!

snairdesai commented 2 years ago

@szahedian Just checking in here to see if we are good to merge this back to master and close this PR out?

szahedian commented 2 years ago

@snairdesai the code now throws an error when Python packages in conda_env.yaml aren't installed to the user's local directory. This is great!

Something I noticed is that, if there are Python packages missing, the error mentions these only. If all Python dependencies are met but some R dependencies aren't, only then does the code mention that some R dependencies are missing too. It's a small point and totally your call whether you think that should be changed. Otherwise, everything looks good to me!

snairdesai commented 2 years ago

@szahedian Thanks for this comment and for your review! In regards to the below:

...if there are Python packages missing, the error mentions these only. If all Python dependencies are met but some R dependencies aren't, only then does the code mention that some R dependencies are missing too.

The code here is intentionally structured based on the flow of our conda_env.yaml files. First, conda forge attempts to install the Python package dependencies, which our first unit test checks. Only if all of these requirements are met does conda forge move on to the R package dependencies.

It is possible a user will have (i) the wrong Python packages and (ii) the wrong R packages. They will need to re-build the environment twice to correct these; as they need to fix (i) before they are notified of (ii).

I think this is acceptable for a couple of reasons: (1) we explicitly (and repeatedly) warn users of the need to check conflicts across both applications prior to building their environment, and list of all these packages (and their versions) in conda_env.yaml; and (2) even if users ignore this advice, rebuilding the environment a second time takes substantially less time, now that we impose the conda config --set channel_priority strict feature.

Based on the above, @jc-cisneros and I will work on resolving final merge conflicts and closing this PR. Thanks again!