Thanks @arjunsrini. This all looks great to me.
(1) Agree on sticking to venv for Python. For Stata, our approach has been to literally put the dependencies in the repository (in \lib\) and only allow Stata to call commands from there. For R, I'd be open to renv; my only concern would be whether this is good / widely adopted / stable enough that we can count on it going forward. I think we generally want to stick to things that people would think of as standard practice.
(2) OK. Let's drop make_lib.sh
(3) OK. Let's use those environment variables. I guess then config.sh replaces config.yaml and each make.sh script just runs config.sh?
(4) OK.
(5) That sounds fine. I like links too, but in this case we want to preserve the fact that the .tex or .lyx files in paper_slides can be compiled on a clean clone of the repo without running make.sh. For that reason we need the input files to actually be committed to Git. It would seem like in principle we could commit the symlinks, but I think we checked at some point and found it didn't work.
(6) OK.
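To make the copy-versus-link trade-off in (5) concrete, here is a minimal sketch. It is not the template's actual paper_slides/make.sh: the throwaway directory tree, the file name table.tex, and the name input_link are assumptions made only so the two options can be shown side by side.

```shell
#!/usr/bin/env bash
# Sketch only: a temporary directory stands in for the repository.
set -eu

repo="$(mktemp -d)"
mkdir -p "${repo}/analysis/output" "${repo}/paper_slides"
echo "placeholder table" > "${repo}/analysis/output/table.tex"   # hypothetical output file

cd "${repo}/paper_slides"

# Option 1: recursive copy. input/ then holds ordinary files, which can be
# committed so the paper compiles on a clean clone without running make.sh.
rm -rf input
cp -r ../analysis/output input

# Option 2: symbolic link (named input_link here only to avoid clashing with
# the copy above). It is faster, but it only resolves while
# ../analysis/output exists in the working tree.
ln -s ../analysis/output input_link

ls -l
```

The copy leaves real files under paper_slides/input, while the link stores only the relative path to analysis/output, so a compile through the link depends on that directory being present.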
Hi Professor @gentzkow! With Arjun's help, I have addressed the points discussed above in arjunsrini/TunaTemplate/issues/7. I have submitted the first pull request with the changes and will keep you updated when all the changes are approved and merged into TunaTemplate. Thank you to the team for their support so far!
Sorry for the delay here. @shrishj made the edits requested above and I reviewed + merged those to TunaTemplate main. It looks like related work is occurring on #96. @gentzkow Are there specific next steps you'd like us to work on? :-)
@arjunsrini I think we can close this out given the work in GentzkowLabTemplate? Let me know if you agree, thanks!
@snairdesai yes, that sounds great!
In this issue, @arjunsrini and @shrishj began to plan the transition of template to a bash architecture. Work continues in the new repository GentzkowLabTemplate.
The goal of this issue is to consider and potentially implement simplifications to the shell version of this template suggested in @gentzkow’s post here. Specifically (bullets changed to numbers by me):
Here are my thoughts:
(1) As a reminder, we are currently handling (python) dependencies with a requirements.txt file and standard python venv. The environment is created/activated using shell functions called inside each make.sh script. The other lightweight options that come to mind are:

A. handling dependencies in each language with a language-specific setup script (e.g. setup.py)
B. directly including pip install [insert-package] lines in a setup shell script
C. including pip install -r requirements.txt in a setup shell script without a virtual environment

I think our current approach is the best among these options. We should keep the virtual environment because without it, python packages are installed globally, which can result in dependency conflicts between projects. We should activate the virtual environment from the shell makefile (as we currently do) because this is clear/standard/straightforward. Using a script like setup.py to manage environments could cause confusion, especially if the user currently has another environment active.

(D.) For Stata and R packages, my understanding is we are currently managing dependencies with language-specific setup scripts (@snairdesai @ShiqiYang2022 @jc-cisneros is this right?). This seems inevitable in Stata. 🤷 I try to avoid using R (perhaps the GS lab predocs have a stronger opinion here 🙂), but maybe we should start using renv? Regardless, renv would still be a part of a language-specific setup script and doesn't need to be a part of the shell template.

(2) Initially, I created the make_lib.sh script so shell functions could be easily sourced inside a single line of the Makefile, but this is no longer necessary since we are using make.sh shell scripts. The main advantage of keeping a make_lib.sh is that it keeps the library of shell functions we define for this template organized. I think we could indeed just do a single config.sh script instead. I'm slightly in favor of getting rid of the make_lib.sh layer.

(3) For make_externals.sh, I like the idea of just using environment variables. I think we could do this inside config.sh with the export command (e.g. export DROPBOX_DIR=/home/username/dropbox/). Inside a python script, this can be accessed with os.environ (e.g. os.environ["DROPBOX_DIR"]). The Stata and R equivalents are the environment macro function (e.g. local dropbox_dir : environment DROPBOX_DIR) and Sys.getenv (e.g. Sys.getenv("DROPBOX_DIR")).

(4) For run_programs_in_order, I agree it makes sense to just call run_stata, run_python, etc. inside the make.sh script (get rid of run_programs_in_order). I think we should keep the shell functions because they wrap a few extra steps (handling auto-generated log files for Stata; for run_latex, calling bibtex and then pdflatex again; cleanup of pdflatex artifacts).

(5) For input/output files for paper_slides, I think the inputs could be handled by including a recursive copy command (cp -r ../analysis/output input) or, better yet, a symbolic link command (ln -s ../analysis/output input) in paper_slides/make.sh. The latter will be faster. The output can be automatically moved from code to output by adding a line to the run_latex command.

(6) Remove the Makefiles as they are superseded by make.sh scripts.

Let me know what y'all think! 🦃
cc @gentzkow @shrishj @snairdesai @ShiqiYang2022 @jc-cisneros
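
For the config.sh idea above (exporting paths such as DROPBOX_DIR and loading them from each make.sh), a minimal sketch follows. It assumes config.sh is loaded with source. In the template, config.sh and make.sh would be separate files; here config.sh is written to a temporary directory only so the example is self-contained, and the path value is the placeholder from the discussion.

```shell
#!/usr/bin/env bash
# Sketch only: stands in for a make.sh that loads a shared config.sh.
set -eu

demo_dir="$(mktemp -d)"

# Stand-in for config.sh: one exported path to an external resource.
cat > "${demo_dir}/config.sh" <<'EOF'
export DROPBOX_DIR="/home/username/dropbox/"
EOF

# What each make.sh would do first. Using "source" rather than
# "bash config.sh" keeps the exported variables in the current shell,
# so the programs launched afterwards inherit them.
source "${demo_dir}/config.sh"

# A child process (here another bash; in practice Python, Stata, or R)
# sees the exported variable in its environment.
child_view="$(bash -c 'printf "%s" "$DROPBOX_DIR"')"
echo "child process sees DROPBOX_DIR=${child_view}"

rm -rf "${demo_dir}"
```

Running config.sh as a separate process would set the variable only in that process, which is why the sketch sources it.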