Closed gentzkow closed 6 months ago
@gentzkow, some comments on testing on Linux/Windows:
Thanks @gentzkow and thanks @jc-cisneros for very thoughtful comments! I have few points to throw in:
For Windows, we might consider utilizing remote desktop solutions or cloud services to access a Windows environment. E.g. use Microsoft Azure or Amazon EC2 for testing purposes.
I am more than happy to test the Template on my personal Windows machine. I think SIEPR might also have some idle windows machines because I heard one predoc has asked Lisa for a new windows machine to replace his old one. I will reach out to Lisa for more details next week.
Thanks! I think a Widows VM and/or a physical machine at SIEPR could be fine. A big issue will be whether Git Bash or some other emulator is enough to run the shell scripts on Windows without having to use a Linux virtual machine.
@shrishj @gentzkow letting you guys know that we got a Windows PC for two months. It is available at the lab at SIEPR.
Thanks JC! I will spend some time doing testing from my laptop and once exhausted for Mac, I will come to use the Windows PC.
Hi @gentzkow! Here are a few errors that a user may encounter when using template. I have added in ideas for how we can improve error handling. I have been having trouble with my conda environment when I was testing today, so I am unable to do try the wrong version of Python and even check what would happen if its missing entirely. I will visit the team this week to get some help. I will then look into using the Windows PC.
When running ‘bash make.sh’ before running ‘bash setup.sh’ we see the following error message:
When we try to run the program from the wrong directory we see the following error message:
When we remove one of the modules (3_slides) we see the following error message:
When we remove a ‘make.sh’ file from one module we see the following error message:
@shrishj Thanks!
The first of these is something I think we should definitely fix as you suggest. I think anywhere where we call local_env.sh
we should have this kind of error handling. The error messages should say that local_env.sh
is missing and then recommend that the user run setup.sh
to fix.
The others look to me like cases that will not happen as often, but still worth addressing if we can.
A general question for us to answer is how to add error handling without cluttering up the make.sh
scripts. Ideally, all of this should live within the scripts in lib/shell
. Maybe we create a new tool called check_setup.sh
that runs a bunch of checks including looking for local_env.sh
? We could call this at the top of each make.sh
. We can also add extra error handling within the run_XXX
commands.
Hi @gentzkow. I have made a commit to the development branch with a new check_setup.sh
file that houses the error handling. We then source check_setup.sh
in the run_all.sh
file to ensure that everything is in order prior to running the template.
General Updates:
check_setup.sh
file. If these modules are not installed, check_setup.sh
uses pip install to fix the issue. check_setup.sh
ensures that bash setup.sh
is run and that all appropriate modules are imported. The only issue is that template no longer works for Python 2.7 post my updates. I will investigate further to see if I can figure it out. However, from initial glance, it looks like the modules we are installing are not compatible with such old Python version. Then again, it seems unlikely for a user to be using such an old version. cc @jc-cisneros @ShiqiYang2022 @snairdesai
@shrishj Thanks.
We want to anticipate that most users will be running individual modules using the make.sh
scripts. The run_all.sh
script is actually run pretty rarely in practice, because for large projects it requires a lot of time. So we need to execute check_setup.sh
in each make.sh
script as well.
It's a bigger to do to think about how we want to handle package dependencies. For real projects, the set of dependencies will evolve a lot over time and we need a robust way to make sure that they're tracked correctly. We will want users to have the option of doing installation in a way that doesn't affect their local environment (e.g., via Conda). I think we should bracket all this for now and remove the module checking + pip install
commands from check_setup.sh
.
Hi @gentzkow. Thanks for the feedback! I have just committed to our development branch making the changes (sourcing check_setup.sh
in the make.sh
files and removing the pip install commands). I will speak to the team regarding how we can go about handling the package dependencies and comment a proposal soon. I will also continue testing to see if I notice anything else. Thanks!
Thanks @shrishj. I think we should defer package dependencies to a subsequent issue. Let's keep this one scoped to only what's in the issue description here.
Let's do the following:
check_setup.sh
should live in lib/shell/
check_setup.sh
should work no matter where we call it from, so we can't assume that local_env.sh
is at ../
. We should pass ${REPO_ROOT}
as an argument to check_setup.sh
and then look for the file at ${REPO_ROOT}/local_env.sh
. make.sh
files should work no matter where they are in the file structure provided ${REPO_ROOT}
is set correctly. So the call to check_setup.sh
should come after line 10 of make.sh
and should call ${REPO_ROOT}/lib/shell/check_setup.sh
not ../check_setup.sh
.Let's make the error message
The file
local_env.sh
was not found at the root of the repository. This is a local settings file that must exist in order formake.sh
to run. Please run the scriptsetup.sh
at the root of the repository to createlocal_env.sh
and set up the local environment.
One other note: I realize that we've left out the files .gitignore
and .gitattributes
from the template so far. These should live at the root of the repo. The .gitignore
file defines files that don't get committed to git, which should include local_env.sh
as well as .DS_store
. Please add the two files in the zip archive below to the root of the repo and delete the local_env.sh
and .DS_store
files that have been committed by mistake (check all subdirectories).
Next steps on this issue include:
run_xxx.sh
commands for the case where the program in question (Stata, R, Python, etc.) is not installed or the executable is not on the path.Let me know when you think you can complete this.
@snairdesai @jc-cisneros @ShiqiYang2022 Can you guys also do some testing here to see if you can uncover additional errors a user might run into and/or suggest more things for @shrishj to try?
Hi! Thanks for the feedback! I have made changes 1-4 locally.
When downloading Archive.zip
, it was showing me an empty folder (not sure if there is an issue on my side). I wanted to check if the Archive.zip was meant to contain the .gitignore
and .gitattributes
. For now, I have created my own .gitignore
and .gitattributes
. In .gitignore
, I only ignore local_env.sh
and .DS_store
. In gitattributes
I copied over what was in template. I have deleted the necessary files too. Before committing, I wanted to check that everything was in order. Thank you!
Huh. I think .gitignore
and .gitattributes
are hidden files, so you need to enable viewing them? (E.g., command-shift-P on a Mac)
Here is the content I had.
.gitattributes
*.log merge=binary
.gitignore
# `local_env.sh` is ignored as it contains user-specific settings
*local_env.sh
# `/temp/` subdirectories are ignored as they are temporary by design
*temp/
# `/output_local/` subdirectories are ignored as they contain large outputs that will be stored outside of Github
*output_local/
# The following extensions indicated intermediate files generated by certain programs during compiling
*.pyc
*.lyx~
*.lyx#
*.lyx.emergency
*.Rhistory
*.Rapp.history
*Rplots.pdf
*.DS_Store
*.aux
*.fls
*.lof
*.lot
*.nav
*.snm
*.toc
*.out
*.svn
*.ipynb_checkpoints/
*setup_stata.log
# Personal conda info
.conda_info
Thanks @gentzkow - it seemed to be hidden from my view. I have made the changes detailed in the previous message and will now look at adding error handling to the run_xxx.sh
commands when the program in question (Stata, R, Python, etc.) is not installed or the executable is not on the path.
Thanks!
@shrishj
I had some time to test the template on my new initialized branch and attached are my feedback:
I am highlighting package issue aside our slack conversation in case you need -- our template currently do not have a environment, per comment and comment, I think we need to create environment separately. An error throw out if we do not use old template
env in e00dd35. Per comment, we will defer this to an separate issue, @shrishj please feel free to open the follow-up when you have bandwidth.
check_setup.sh
cannot print my REPO_ROOT
, because this is not pre-defined. I addressed this in my branch. See here.
@gentzkow Do we want users to run make.sh
from any places it was called? If so, I suggest to define REPO_ROOT
in the absolute path format in make.sh
. @shrishj You can see what I mean here.
Some codes to output message, e.g. https://github.com/gentzkow/GentzkowLabTemplate/blob/86e43c070649807545528448a6940679a2271dcd/1_data/make.sh#L30 need to be adjusted to echo -e ...
, to enable echo
command to interpret certain escape sequences within the string. I fixed in my own branch here.
I think the input for /3_slides/
should live under /3_slides/input/
instead of 3_slides/source/input
; currently there are two places to store the /input/
. Also we do not need /3_slides/input/mpg.csv/
.
The problem same as 5 in /4_paper/
on repo structure.
In run_R.sh
https://github.com/gentzkow/GentzkowLabTemplate/blob/06b42fb8e3d4423a78613043afca120fcaee462f/lib/shell/run_R.sh#L17, we need to use >>
instead of >
, as >
means overwrite, while >>
means append. I fixed in 0108edc.
I addressed a subset of those bullets on my own branch issue3-shiqi-testing
, @shrishj if it looks good to you, feel free to merge my branch to the issue branch 3-try-to-break-the-template
in order to save your time. I will test the template on my Windows system when I have further bandwidth.
Thanks @ShiqiYang2022.
On (3), I'm assuming make.sh
will always be called from the directory it lives in. I.e., bash make.sh
not bash 2_analysis/make.sh
. I'm not sure how getting the absolute path as here would fix this -- won't it still break if I call it from a different directory because $REPO_ROOT_RELATIVE
will then be incorrect?
Other comments sound good to me.
@gentzkow Thanks!
I'm not sure how getting the absolute path as here would fix this -- won't it still break if I call it from a different directory because $REPO_ROOT_RELATIVE will then be incorrect?
You are right, sorry for the mistake. What I have previously in mind is we first got the absolute path where the script is, and then use that path to define the repo path. I fixed in 2bd74ad, the example for check_setup.sh
see here.
Nevertheless, if we anticipate user always call make.sh
from the directory it lives in, then we do not need to define the absolute path. But anyway thanks for the great catch!
Ooh, that's a nice solution! If we can make things robust to the user calling make.sh
from anywhere that will be a big improvement. In that case we might be able to get rid of the cd
commands in run_all.sh
?
Let's test this solution carefully to make sure it works robustly.
Another thing to think about would be how we can trap errors where the relative path in
REPO_ROOT=$(realpath "$SCRIPT_ABSOLUTE_PATH/../../../")
is wrong and so we're not pointing to the actual root of the repository. Maybe have a step in check_setup.sh
where we confirm that run_all.sh
exists in ${REPO_ROOT}
and issue a warning if it does not? If we did this we'd want to give the user instructions or turning off the warning and/or modifying the file it checks for, since there may be cases where someone intentionally deletes or renames run_all.sh
.
That's great! I agree with getting rid of cd
commands. Will make the team conscious on testing the robustness on this solution. I do not think we have steps to confirm run_all.sh
exist, we need to implement. I am also aligned with adding error shooting and checking messages to navigate users to fix.
Apologies for the delay. Thanks so much for your changes @ShiqiYang2022 and @gentzkow. I have merged Shiqi's development branch into our original branch. I have not deleted Shiqi's branch - I am happy to do it once Shiqi gives me the go-ahead.
I am adding a check for confirming if run_all.sh
exist in the ${REPO_ROOT}
. In terms of the error messaging, should I tell users how to toggle off this check by going into check_setup.sh
and changing some arbitrary boolean value?
Thanks @shrishj. Lets add the warning when run_all.sh
doesn't exist but not worry about giving the user the option to change it for now. Also, let's make sure this is a warning message not an error. We can just say:
The file
run_all.sh
does not exist in the directory currently specified as the repository root. This may mean that the variableREPO_ROOT
inmake.sh
is set incorrectly. The variableREPO_ROOT
should be a relative path pointing to the top level of the repository.
@shrishj
I have merged Shiqi's development branch into our original branch. I have not deleted Shiqi's branch - I am happy to do it once Shiqi gives me the go-ahead.
Awesome! You could go ahead and delete my test branch.
Hi @gentzkow. I have pushed a commit to the development branch with a warning message for run_all.sh
and we still allow the program to run. I also made a change to the $REPO_ROOT variable in check_setup.sh
as the variable was pointing to the parent directory instead of the required subdirectory. It is passing all of my tests so far. Thanks for the support!
@ShiqiYang2022 @shrishj I have reviewed these changes. I spent some time making additional updates -- revising the structure of the make.sh scripts and the run_XXX scripts to correct some errors and make them simpler / more robust.
I think we are good on the issues discussed so far.
Please do another quick round of testing to make sure my changes haven't introduced any errors. The remaining item is to test on Windows.
@shrishj I would like to accelerate the pace here. Do you think you'll be able to work the ~8 hrs per week we had agreed to once spring quarter starts?
@ShiqiYang2022 Do you think you might be able to jump in and do the Windows testing?
@gentzkow Thanks! Happy to speed up with full gear on this.
I am more than happy to do the windows test -- I am wrapping up what I have on my plate in SearchMarket project and conduct the testing today.
Great. Thanks! You should keep prioritizing SearchMarket ahead of this, but if you have a chance to do the Windows test that would be great.
Hi @gentzkow, very sorry for the delay. I have just finished my exams and will be working on both issues in the coming days. I will post some updates regarding testing and the work on this issue.
I am providing feedback on my Windows machine. Unfortunately, there are some errors, so this is a temporary post as I have not yet successfully run the template from my Windows machine. I will keep this post modified and updated.
Again, our current template does not have a corresponding environment. We need to create one; a (hot + easy) fix is to use what we have in our old template. But I vote to design one for this template. This need to be done prior to merging back to master
.
Line break issue. The error messages indicate that the all .sh
script, when cloning and checking out, the line ending has Windows-style (\r\n
), whereas Linux and UNIX systems expect UNIX-style line endings (\n
). This discrepancy leads to errors such as "$'\r': command not found". Unlike Shell scripts, R and Python interpreters (which are used in our old template) handle line endings can correctly identify line ends with either LF or CRLF, thus avoiding execution issues caused by line ending differences.
.gitattributes
as suggested, however, those did not work on my end. I think if necessary, I need to refresh the branch as suggested to normalize the line endings.setup.sh
, I implemented a fix and auto-check in fd61fc9.The issue of correct passing the entire file path. This is related to what we discussed in #85 comment and #85 comment. In the shell, we need to quote the whole path to ensure it is passed correctly. I implemented the fix in f101660.
The latex compile issue. I think this is because some configuration issues with my machine for latex. I am fixing this on my own end, so leaving a placeholder here. Will update this after I identified this issue clearly.
Thanks @ShiqiYang2022.
Can you remind me what this means?
Again, our current template does not have a corresponding environment. We need to create one; a (hot + easy) fix is to use what we have in our old template.
Can you also give a little more detail on what the line ending issue is? Is it only an issue for the shell scripts? Or does it mean all files committed to Git from Windows are going to have the wrong line endings for Mac and vice versa?
Let's not make any changes to the template yet unless they are things we'd want to do even if we weren't trying to have it run on Windows. I want to get a sense of the issues before deciding whether we want to make it cross-platform compatible.
Thanks @gentzkow! Re your https://github.com/gentzkow/GentzkowLabTemplate/issues/3#issuecomment-2010267938:
Can you remind me what this means?
Again, our current template does not have a corresponding environment. We need to create one; a (hot + easy) fix is to use what we have in our old template.
I think the way we run the template on Windows is, we need to use a conda environment, as we decided in https://github.com/gentzkow/template/issues/95#issue-2003434454 and https://github.com/gentzkow/GentzkowLabTemplate/issues/3#issuecomment-1948934677, to handle package dependencies.
For the line ending issue, sorry, I should be more articulated. It happened when I cloned the repository and shifted to the issue branch on my windows machine. Then I tried to run run_all.sh
from command line but it broke into error as follows:
This is due to the line endings from my machine has Windows-style (\r\n
), whereas Linux and UNIX systems (here is WSL(Windows Subsystem for Linux)) expect UNIX-style line endings (\n
).
Is it only an issue for the shell scripts?
Currently, I would say yes, because after I fix the line ending issue for all shell scripts, I can run /1_data/
and /2_analysis/
successfully.
Or does it mean all files committed to Git from Windows are going to have the wrong line endings for Mac and vice versa?
When committing or checking out, git will not change the line endings of the scripts. It will remain as what it is when commited unless specified core.autocrlf = true
.
Usually, the files we committed from Windows would not have such error when executed, because we call our scripts in python
or R
in most of our projects, and those interpreters can interpret the line endings no matter what type of line endings it is, and will not have issues in executing.
But, if we call our shell scripts using bash
interpreter, bash
are sensitive to the line endings. So when a shell script is called with CRLF (\r\n) line endings in Windows, Unix/Linux Shell interpreters (WSL) treat the CR (\r) as part of the command, leading to execution errors.
I hope it clarifies!
OK. Thanks.
On the Conda issue, I would have hoped we could be agnostic about how people are installing Python and R in this template, just as we are on Macs.
On the line ending issue, what are you using as the bash emulator on Windows? My thought had been we might start by trying Git Bash.
Understood, thanks!
I am using Windows Subsystem for Linux (WSL) on my end, I started using this as our lab used WSL for APD replication. I will also try using Git Bash!
Thanks @ShiqiYang2022. Interested to see whether it's any different in Git Bash.
I guess for an individual shell file another option would be to run it with something like
sed 's/\r$//' run_all.sh | bash
?
If that worked I guess Windows users could create a custom command line tool to do that (e.g., "wbash") and then we could set that to be the ${SHELL} executable.
Another option would be to have a git hook that converts all *.sh
files when they are checked out. But in that case would a Windows user no longer be able to read / edit them in a normal text editor?
@ShiqiYang2022
I was reading more about the Git line ending normalizations here. Would another option be to add
*.sh -text
to .gitattributes
so that the shell files retain their lf
line endings?
Also: We should probably open a separate issue for the Windows testing and wrap up the current issue.
Thanks @gentzkow!
sed 's/\r$//' run_all.sh | bash
Totally, this will work, yesterday in https://github.com/gentzkow/GentzkowLabTemplate/issues/3#issuecomment-2008626046 I implement a similar fix using dos2unix
, see https://github.com/gentzkow/GentzkowLabTemplate/commit/fd61fc98ea2c0abdd91939296ca2959d2e9e9226.
Using custom command line tool sounds like a applicable plan, will test on my end. Not sure whether this is necessary to use git hook, but I guess we could add similar conversion into our setup.sh
file? (see here)
I was reading more about the Git line ending normalizations here. Would another option be to add
*.sh -text
to
.gitattributes
so that the shell files retain theirlf
line endings?
Totally, I agree. Yesterday I implemented similar fixes in d121d0b to automatically fit the line endings across operating systems. Somehow it does not work on my end unfortunately. I am wondering whether this is because I implemented on the branch instead of main
. But I will investigate further, and also test the way to edit .gitattributes
as you suggested!
Opening a separate issue for the Windows testing sounds good to me! Will wrap up this with a PR soon.
Threads continues in its Pull Request #7.
Summary
In this issue we conducted a test run on Mac to identify potential problems in our current GentzkowLabTemplate
.
Merged into master in 1f45308. Final state of the issue branch here.
Goals:
An initial list of things to try; you should add to this as many other things as you can think of.
make.sh
script that calls an R/Python/Stata sciprtmake.sh
script withoutcd
-ing into the correct directory (e.g.,bash /mydir/template/1_data/make.sh
)setup.sh
first@snairdesai @jc-cisneros @ShiqiYang2022 I'm assigning this to @shrishj, but I'd like you all to help out as well. In particular, you should figure out what's the easiest way to get him access to a Windows machine. For Linux testing, we could try using Github codespaces? Or we could do it on a Mac creating clean environments w/ Conda?