gentzkow / template_archive

Compile a list of functionalities of current template structures #94

Closed ShiqiYang2022 closed 1 year ago

ShiqiYang2022 commented 1 year ago

The goal of this issue is to draft a list of functionalities (e.g., moving files across modules or logging runs) that template/gslab_make and Arjun's shell template solution currently have. To do this, @snairdesai @jc-cisneros and I will help @shrishj get familiar with:

while @arjunsrini will help @shrishj (thanks for helping here!) with

The deliverable is a list of functionalities that we have in template/gslab_make but not in Arjun's shell template and vice versa. @shrishj, please reach out to us at any time if you have questions, thanks!

shrishj commented 1 year ago

Hi Professor @gentzkow! @snairdesai, @arjunsrini, @jc-cisneros, @ShiqiYang2022 and I had a meeting on Friday to discuss what was missing in "shell make" and compiled proposed next steps for your review. We collaborated using this Google Doc.

Add functionality for other programs (presently only available for Stata and Shell files)

Convert Makefiles to make.sh

Creating/filling tables

Copy inputs and externals

Config

User interface

gentzkow commented 1 year ago

@shrishj Thanks!

Let me jump in to make some comments:

  1. I think our approach here should be to converge on a minimal implementation of the shell template that we are happy with. To that end, I'd suggest we just aim to have a shell that can run Stata, Python, and R and also compile Latex. We can skip the tablefill functionality, creating links, and config steps for now.

  2. Before you do a ton of work implementing this, I think we should all look together at a "rough draft" of it to make sure we're all on the same page about what we want to implement. I'm happy to do that however seems most efficient -- could be a first-cut implementation in actual code, or a mockup of it in pseudocode.

@snairdesai @arjunsrini @jc-cisneros @ShiqiYang2022: I'd appreciate your eyes on this too. Remember that the goal is not to do a 1:1 translation of our current template into shell. The goal is to simplify as much as possible along the way, and to implement the changes we discussed at our meeting.

arjunsrini commented 1 year ago

I’ve updated TunaTemplate to include a "rough draft" working stencil. It runs the same Python scripts as the standard template and compiles a similar .tex file. Basic versions of the run functions for Stata, Python, R, and LaTeX are implemented.
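For readers who want a feel for what such run functions involve, here is a minimal sketch that dispatches on file extension. It is not the shmake code: the function body, the choice of `stata-mp` as the Stata executable, and the `pdflatex` flag are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a run_program dispatcher (not the shmake implementation).
# Each branch assumes the named executable is available on PATH.
run_program() {
    local script="$1"
    echo "Running: $script"
    case "${script##*.}" in
        py)  python "$script" ;;
        R|r) Rscript "$script" ;;
        do)  stata-mp -b do "$script" ;;   # assumption: the Stata executable name varies by install
        tex) pdflatex -interaction=nonstopmode "$script" ;;
        *)   echo "Unsupported file type: $script" >&2
             return 1 ;;
    esac
}
```

A module's make.sh could then call `run_program merge_data.py` for each script in order, which is roughly the shape of the "Running: ..." lines in the terminal output.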

@snairdesai @jc-cisneros @ShiqiYang2022 @shrishj let me know if this looks right to y'all; perhaps @shrishj can clone the repo and try to replicate it. There are instructions for how to do this in the TunaTemplate readme.

The run_program shell functions are defined here. I’ve included the terminal output of my cloning/running of TunaTemplate below.

My replication of shell template (TunaTemplate)

```sh
$ git clone https://github.com/arjunsrini/TunaTemplate
Cloning into 'TunaTemplate'...
remote: Enumerating objects: 95, done.
remote: Counting objects: 100% (95/95), done.
remote: Compressing objects: 100% (51/51), done.
remote: Total 95 (delta 28), reused 86 (delta 23), pack-reused 0
Receiving objects: 100% (95/95), 14.73 KiB | 2.95 MiB/s, done.
Resolving deltas: 100% (28/28), done.
Filtering content: 100% (4/4), 8.50 MiB | 3.85 MiB/s, done.
$ cd TunaTemplate
$ git submodule init
Submodule 'lib/shmake' (https://github.com/arjunsrini/shmake.git) registered for path 'lib/shmake'
$ git submodule update
Cloning into '/Users/arjunsrinivasan/Documents/scratch/TunaTemplate/lib/shmake'...
Submodule path 'lib/shmake': checked out 'c6c18a630ca63b488d84528f2cfd4575af39c05a'
$ bash make.sh
Making data module with shell: /bin/zsh
Running: merge_data.py
Running: clean_data.py
Making analysis module with shell: /bin/zsh
Running: analyze_data.py
Making paper_slides module with shell: /bin/zsh
Running: paper.tex
```

ShiqiYang2022 commented 1 year ago

@arjunsrini Thanks arjun! I think your proposed run_program function looks great!

I played with the template and found that I cannot run it without first activating the gentzkow/template conda environment. My terminal output is as follows.

```bash
SIEPR-C02G50GUML86:github_folders shiqiyang$ git clone https://github.com/arjunsrini/TunaTemplate
Cloning into 'TunaTemplate'...
remote: Enumerating objects: 95, done.
remote: Counting objects: 100% (95/95), done.
remote: Compressing objects: 100% (51/51), done.
remote: Total 95 (delta 28), reused 86 (delta 23), pack-reused 0
Receiving objects: 100% (95/95), 14.73 KiB | 2.45 MiB/s, done.
Resolving deltas: 100% (28/28), done.
Filtering content: 100% (4/4), 8.50 MiB | 2.81 MiB/s, done.
SIEPR-C02G50GUML86:github_folders shiqiyang$ cd TunaTemplate
SIEPR-C02G50GUML86:TunaTemplate shiqiyang$ git submodule init
Submodule 'lib/shmake' (https://github.com/arjunsrini/shmake.git) registered for path 'lib/shmake'
SIEPR-C02G50GUML86:TunaTemplate shiqiyang$ git submodule update
Cloning into '/Users/shiqiyang/Documents/github_folders/TunaTemplate/lib/shmake'...
Submodule path 'lib/shmake': checked out 'c6c18a630ca63b488d84528f2cfd4575af39c05a'
SIEPR-C02G50GUML86:TunaTemplate shiqiyang$ bash make.sh
Making data module with shell: /bin/bash
Running: merge_data.py
../lib/shmake/lib.sh: line 97: python: command not found
Running: clean_data.py
../lib/shmake/lib.sh: line 97: python: command not found
Making analysis module with shell: /bin/bash
cp: ../data/output/data_cleaned.csv: No such file or directory
Running: analyze_data.py
../lib/shmake/lib.sh: line 97: python: command not found
Making paper_slides module with shell: /bin/bash
cp: ../data/output/chips_sold.pdf: No such file or directory
Running: paper.tex
mv: rename ./code/paper.pdf to ./output/paper.pdf: No such file or directory
SIEPR-C02G50GUML86:TunaTemplate shiqiyang$ which python
SIEPR-C02G50GUML86:TunaTemplate shiqiyang$ which python3
/usr/local/bin/python3
SIEPR-C02G50GUML86:TunaTemplate shiqiyang$ conda activate template
(template) SIEPR-C02G50GUML86:TunaTemplate shiqiyang$ bash make.sh
Making data module with shell: /bin/bash
Running: merge_data.py
Running: clean_data.py
Making analysis module with shell: /bin/bash
Running: analyze_data.py
Making paper_slides module with shell: /bin/bash
Running: paper.tex
(template) SIEPR-C02G50GUML86:TunaTemplate shiqiyang$ conda deactivate
SIEPR-C02G50GUML86:TunaTemplate shiqiyang$ bash make.sh
Making data module with shell: /bin/bash
Running: merge_data.py
../lib/shmake/lib.sh: line 97: python: command not found
Running: clean_data.py
../lib/shmake/lib.sh: line 97: python: command not found
Making analysis module with shell: /bin/bash
cp: ../data/output/data_cleaned.csv: No such file or directory
Running: analyze_data.py
../lib/shmake/lib.sh: line 97: python: command not found
Making paper_slides module with shell: /bin/bash
cp: ../data/output/chips_sold.pdf: No such file or directory
Running: paper.tex
mv: rename ./code/paper.pdf to ./output/paper.pdf: No such file or directory
SIEPR-C02G50GUML86:TunaTemplate shiqiyang$
```

arjunsrini commented 1 year ago

@ShiqiYang2022 Thanks for testing it!

The error you ran into arises because in your default shell environment (before activating Conda), the PATH environment variable does not include a directory that contains a python executable. When you run which python, it finds no matching executable in any of the directories listed in your PATH, resulting in no output.

When you activate a Conda environment, several environment variables, including PATH, are temporarily modified for your current shell session. Conda prepends its own directories to the PATH. These directories contain executables for Python and other tools installed in the Conda environment.

My prior implementation of run_python assumed that your terminal recognizes the command python. I’ve now added functionality to parse a config.yaml file for a python command, which is then used to run Python programs (so you could specify python3 as your python command in a config.yaml file at the root of your project if you wanted). But I think it is OK for us to assume by default that if someone includes a Python program, the command they’d like to execute it with is python.
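To make the config idea concrete, here is a minimal sketch of reading a python command from config.yaml with a default of `python`. The function name `get_python_cmd` is hypothetical, and the parsing is an assumption that only handles a flat top-level `pythonCmd:` key with `sed`. It is not a full YAML parser and not necessarily how shmake does it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not the shmake implementation).
# Prints the value of a top-level `pythonCmd:` key, or `python` if the
# file or the key is missing.
get_python_cmd() {
    local config_file="${1:-config.yaml}"
    local cmd=""
    if [ -f "$config_file" ]; then
        # Keep the first matching line, then strip quotes and trailing whitespace.
        cmd=$(sed -n 's/^pythonCmd:[[:space:]]*//p' "$config_file" \
            | head -n 1 \
            | tr -d "\"'" \
            | sed 's/[[:space:]]*$//')
    fi
    echo "${cmd:-python}"
}
```

A run function could then call `"$(get_python_cmd)" merge_data.py` instead of hard-coding `python`.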

Let me know if others have comments or run into bugs @snairdesai @jc-cisneros @shrishj

shrishj commented 1 year ago

Hi @arjunsrini! Thanks for the clarification. I encountered a similar issue to Shiqi. I have copied in the .yaml file, but do I need to change the pathToRepo and the pathToDb values too so that the program runs correctly? I also seem to be getting an error with the pandas installation. I tried pip3 install pandas, but it didn't work. Thank you for the help!

arjunsrini commented 1 year ago

@shrishj thanks for testing it!

Re: config — you can remove the other lines and it should work. For more context, see how the config functionality is implemented here.

Re: pandas error - is this unique to your usage of template? If you are using python as your python command (the default if you don’t use config.yaml), I’d try pip install pandas (not pip3). Or alternatively, set pythonCmd: "python3" in your config.yaml.

If that doesn’t work, can you load pandas in either of these ways?

Interactive python session in terminal:

```sh
$ python
Python 3.8.8 (default, Apr 13 2021, 12:59:45)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>>
$ python3
Python 3.8.8 (default, Apr 13 2021, 12:59:45)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>>
$
```

If you are using pip3 (package manager for Python 3) to install pandas, it will be installed in your Python 3 environment. You should check that your python command is linked to Python 3 (try python --version) and is associated with the same environment as your pip3. You can confirm this with the commands which pip3, which python3, which python, etc.
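A small sketch of that check, assuming a bash shell: it prints where each command resolves on PATH so a mismatch between pip and python is easy to spot. The helper name `describe_cmd` is made up for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report where a command resolves on PATH, or that it is missing.
describe_cmd() {
    local name="$1"
    local resolved
    if resolved=$(command -v "$name" 2>/dev/null); then
        echo "$name -> $resolved"
    else
        echo "$name -> not found"
    fi
}

# Compare the interpreter and installer commands side by side.
for name in python python3 pip pip3; do
    describe_cmd "$name"
done

# Installing through the interpreter itself sidesteps a pip/python mismatch:
#   python3 -m pip install pandas
```

If `pip3` and `python` point into different directories, packages installed with `pip3` will generally not be importable from `python`. Running `python -m pip install ...` with the same command the template uses avoids that.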

If installing pandas is giving you a lot of trouble feel free to put the errors you’re getting in Slack.

shrishj commented 1 year ago

Hi @arjunsrini! Thanks so much for your quick reply! It is now working and creating ~/paper_slides/output/paper.pdf. In addition to pandas, I also needed to use pip3 install for matplotlib and linearmodels.

snairdesai commented 1 year ago

@arjunsrini: Confirming this also works on my end after the relevant installs, great work (it's also noticeably faster than standard template, as we expect)! Some notes from our in-person discussion:

arjunsrini commented 1 year ago

I’ve updated TunaTemplate to include automated Python virtual environment (venv) functionality, using a requirements.txt file and shell functions to create and activate the environment. I also made the program quit on error and improved the readability of the log file for LaTeX compilation.
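For anyone reading along, the general pattern looks roughly like the sketch below. The function names and the `venv` directory name are assumptions for illustration and may not match what is in shmake.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of venv handling in a make script (not the shmake code).
set -e   # quit as soon as any command fails

VENV_DIR="venv"   # assumed location of the virtual environment

# Create the environment once; later runs reuse it.
create_venv() {
    if [ ! -d "$VENV_DIR" ]; then
        python3 -m venv "$VENV_DIR"
    fi
}

# Activate the environment in the current shell session.
activate_venv() {
    . "$VENV_DIR/bin/activate"
}

# Install the packages listed in requirements.txt into the active environment.
install_requirements() {
    python -m pip install -r requirements.txt
}

# Typical order in a make script:
#   create_venv && activate_venv && install_requirements
```

Because activation prepends the environment's bin directory to PATH, the plain `python` command resolves inside the venv afterwards, which also addresses the "python: command not found" case above.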

cc @snairdesai @jc-cisneros @ShiqiYang2022 @shrishj

jc-cisneros commented 1 year ago

@arjunsrini confirming this works on my end! Thanks for the work! As a minor comment, we probably want to add the venv directory to the .gitignore.
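One way to do that without duplicating the entry on repeated runs, as a minimal sketch (the helper name `ignore_path` is hypothetical, and it assumes any existing .gitignore ends with a newline):

```bash
#!/usr/bin/env bash
# Hypothetical helper: append an entry to .gitignore only if that exact
# line is not already present.
ignore_path() {
    local entry="$1"
    local file="${2:-.gitignore}"
    touch "$file"
    if ! grep -qxF -- "$entry" "$file"; then
        echo "$entry" >> "$file"
    fi
}
```

For example, `ignore_path "venv/"` run from the repo root adds the line once and is a no-op afterwards.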

@gentzkow you can test it on your end with these steps:

  1. Run git clone git@github.com:arjunsrini/TunaTemplate.git and cd TunaTemplate
  2. Load the submodule with git submodule init and git submodule update
  3. Run the "make" shell script at the root of the repo with bash make.sh

gentzkow commented 1 year ago

@shrishj @arjunsrini @snairdesai @ShiqiYang2022 @jc-cisneros Thanks

Quick questions

  1. Why is this still in a task called "Compile a list of functionalities"? 😉
  2. What is the relationship between the Makefiles and the make.sh scripts at this point? Have we just replaced the former w/ the latter? (My view on Makefiles has been that we don't want to use them because we don't want to bother enumerating the individual targets but happy to revisit that)

My broader comment is that I'd like to stop at this point to take stock and think about potential ways to simplify. I'd like to discuss, e.g.,

I'd suggest we wrap this issue and open a new one to discuss these things.

ShiqiYang2022 commented 1 year ago

Summary + Deliverables

In this issue (https://github.com/gentzkow/template/issues/94), we helped @shrishj get familiar with the gentzkow template, the gslab_make library, and @arjunsrini's existing shell-function-based template and its corresponding shmake library.

The deliverables are the compiled list of functionalities in GSLab Functionality.pdf and the proposed next steps in this comment. In this issue, @arjunsrini also produced a first implementation of the functionalities; for details, see https://github.com/gentzkow/template/issues/94#issuecomment-1809222181 and https://github.com/gentzkow/template/issues/94#issuecomment-1811555092.

The follow-up discussion of https://github.com/gentzkow/template/issues/94#issuecomment-1819871703 continues in #95.