gentzkow / template_archive


Simplify shell template #95

Closed arjunsrini closed 10 months ago

arjunsrini commented 1 year ago

The goal of this issue is to consider and potentially implement simplifications to the shell version of this template suggested in @gentzkow’s post here. Specifically (bullets changed to numbers by me):

  1. How we are handling dependencies + the virtual environment
  2. Do we need the make_lib.sh layer? Could we replace config yaml altogether with just a config .sh script?
  3. Do we need the make_externals.sh layer? Could we replace this with just defining paths to the externals in the config file and then referring to those paths directly in the downstream scripts? (To do this we just need a way to pass those global variables forward to R/Python/Stata.)
  4. What are the pros/cons of using run_programs_in_order as a single command vs. calling individual run_stata, run_python, etc. commands in make.sh, or even calling the shell commands to run these scripts directly in make.sh?
  5. How can we improve the handling of input/output files in paper_slides? The "For now: manually copy" and "For now: manually move" steps are obviously clunky.

Here are my thoughts:

  1. As a reminder, we are currently handling (python) dependencies with a requirements.txt file and standard python venv. The environment is created/activated using shell functions called inside each make.sh script. The other lightweight options that come to mind are:

    A. handling dependencies in each language with a language-specific setup script (e.g. setup.py)
    B. directly including pip install [insert-package] lines in a setup shell script
    C. including pip install -r requirements.txt in a setup shell script without a virtual environment

    I think our current approach is the best among these options. We should keep the virtual environment because without it, Python packages are installed globally, which can cause dependency conflicts between projects. We should activate the virtual environment from the shell makefile (as we currently do) because this is clear/standard/straightforward. Using a script like setup.py to manage environments could cause confusion, especially if the user already has another environment active.

    (D.) For Stata and R packages, my understanding is we are currently managing dependencies with language-specific setup scripts (@snairdesai @ShiqiYang2022 @jc-cisneros is this right?). This seems inevitable in Stata. 🤷 I try to avoid using R (perhaps the GS lab predocs have a stronger opinion here 🙂), but maybe we should start using renv? Regardless, renv would still be a part of a language-specific setup script and doesn’t need to be a part of the shell template.

  2. Initially, I created the make_lib.sh script so shell functions could be easily sourced inside a single line of the Makefile, but this is no longer necessary since we are using make.sh shell scripts. The main advantage of keeping make_lib.sh is that it keeps the library of shell functions we define for this template organized. I think we could indeed just do a single config.sh script instead. I’m slightly in favor of getting rid of the make_lib.sh layer.

  3. For make_externals.sh, I like the idea of just using environment variables. I think we could do this inside config.sh with the export command (e.g. export DROPBOX_DIR=/home/username/dropbox/). Inside a python script, this can be accessed with:

    import os
    dropbox_dir = os.getenv('DROPBOX_DIR')

    The Stata and R equivalents are:

    local dropbox_dir: getenv DROPBOX_DIR

    and

    dropbox_dir <- Sys.getenv("DROPBOX_DIR")

  4. For run_programs_in_order, I agree it makes sense to just call run_stata, run_python, etc. inside the make.sh script (get rid of run_programs_in_order). I think we should keep the shell functions because they wrap a few extra steps: handling auto-generated log files for Stata and, for run_latex, calling bibtex and then pdflatex again and cleaning up pdflatex artifacts.

  5. For input/output files for paper_slides, I think the inputs could be handled by including a recursive copy command (cp -r ../analysis/output input) or, better yet, a symbolic link command (ln -s ../analysis/output input) in paper_slides/make.sh. The latter will be faster. The output can be automatically moved from code to output by adding a line to the run_latex command.

  6. Remove the Makefiles as they are superseded by make.sh scripts.
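A minimal sketch of how points 2 to 4 might fit together, assuming a single config.sh that exports paths and defines shared functions, sourced at the top of each make.sh. The run_logged helper and the temporary directory are hypothetical stand-ins for illustration, not the template's actual functions:

```shell
#!/bin/sh
# Sketch only: file names and the run_logged helper are hypothetical.
set -eu

workdir="$(mktemp -d)"

# config.sh: one place for external paths and shared shell functions.
cat > "${workdir}/config.sh" <<'EOF'
# Placeholder path taken from the example in the thread.
export DROPBOX_DIR="/home/username/dropbox/"

# Hypothetical helper: run a command and capture its output in a log file.
run_logged() {
    logfile="$1"
    shift
    "$@" > "${logfile}" 2>&1
}
EOF

# make.sh would begin by sourcing config.sh.
. "${workdir}/config.sh"

# Exported variables are inherited by child processes, which is how
# Python, R, or Stata started from make.sh would see them.
child_value="$(sh -c 'printf "%s" "${DROPBOX_DIR}"')"
echo "child process sees DROPBOX_DIR=${child_value}"

# Steps are then called one by one instead of via run_programs_in_order.
run_logged "${workdir}/make.log" echo "step 1 done"
cat "${workdir}/make.log"
```

Sourcing (rather than executing) config.sh matters here: variables and functions defined in an executed script would not carry over into the calling make.sh.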

Let me know what y’all think! 🦃

cc @gentzkow @shrishj @snairdesai @ShiqiYang2022 @jc-cisneros

gentzkow commented 1 year ago

Thanks @arjunsrini. This all looks great to me.

(1) Agree on sticking to venv for Python. For Stata, our approach has been to literally put the dependencies in the repository (in \lib\) and only allow Stata to call commands from there. For R, I'd be open to renv; my only concern would be whether this is good / widely adopted / stable enough that we can count on it going forward. I think we generally want to stick to things that people would think of as standard practice.

(2) OK. Let's drop make_lib.sh

(3) OK. Let's use those environment variables. I guess then config.sh replaces config.yaml and each make.sh script just runs config.sh?

(4) OK.

(5) That sounds fine. I like links too, but in this case we want to preserve the fact that the .tex or .lyx files in paper_slides can be compiled on a clean clone of the repo without running make.sh. For that reason we need the input files to actually be committed to Git. It would seem like in principle we could commit the symlinks, but I think we checked at some point and found it didn't work.

(6) OK.
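To illustrate the trade-off in (5), here is a rough sketch run in a throwaway directory. The file table.tex and the name input_link are made up for the example; the directory names follow the thread. A copy leaves regular files in paper_slides/input that can be committed, while a symbolic link is only a pointer to ../analysis/output:

```shell
#!/bin/sh
# Sketch only: table.tex and input_link are made-up names.
set -eu

repo="$(mktemp -d)"
mkdir -p "${repo}/analysis/output" "${repo}/paper_slides"
echo "placeholder table" > "${repo}/analysis/output/table.tex"

cd "${repo}/paper_slides"

# Option 1: copy. input/ ends up holding regular files that can be committed.
mkdir -p input
cp -R ../analysis/output/. input/

# Option 2: symbolic link. input_link is only a pointer to ../analysis/output.
ln -s ../analysis/output input_link

if [ -f input/table.tex ] && [ ! -L input ]; then
    echo "input/ holds a real copy of table.tex"
fi
if [ -L input_link ]; then
    echo "input_link points to $(readlink input_link)"
fi
```

Copying with the trailing `/.` places the contents of analysis/output directly in input/, which avoids a nested input/output directory when input/ already exists.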

shrishj commented 1 year ago

Hi Professor @gentzkow! With Arjun's help, I have addressed the points discussed above in arjunsrini/TunaTemplate/issues/7. I have submitted the first pull request with the changes and will keep you updated when all the changes are approved and merged into TunaTemplate. Thank you to the team for their support so far!

arjunsrini commented 10 months ago

Sorry for the delay here — @shrishj made the edits requested above and I reviewed + merged those to TunaTemplate main. It looks like related work is occurring on #96. @gentzkow Are there specific next steps you’d like us to work on? :-)

snairdesai commented 10 months ago

@arjunsrini I think we can close this out given the work in GentzkowLabTemplate? Let me know if you agree, thanks!

arjunsrini commented 10 months ago

@snairdesai yes, that sounds great!

snairdesai commented 10 months ago

Summary + Deliverables


In this issue, @arjunsrini and @shrishj began to plan the transition of the template to a bash architecture. Work continues in the new repository GentzkowLabTemplate.