StateTL - include more files through github

kelleythompson commented 2 years ago

Previously, I had been including only a limited number of StateTL files through GitHub and had ignored a number of files that are required to run the program, including the following: StateTL.exe (compiled version of code), StateTL_j349.exe (compiled version of Fortran code), StateTL_inputdata.csv (CSV copy of StateTL_inputdata.xlsx), handel.mat (sound file), and libwinpthread-1.dll and libgcc_s_seh-1.dll (DLLs used by j349). The CoW project team decided we should have GitHub store these files rather than getting them off of the project shared drive. A number of other base binary files have been provided on the shared drive, but these can be rebuilt using REST services etc. and with the -r command line argument. There are two additional files that will not be put into GitHub (one for security reasons and the other, StateTL_evapDiv2.mat, because it's a big file), so these will still need to be retrieved from the CoW shared drive location.

kelleythompson commented 2 years ago

Added the additional required files StateTL.exe (compiled version of code), StateTL_j349.exe (compiled version of Fortran code), StateTL_inputdata.csv (CSV copy of StateTL_inputdata.xlsx), handel.mat (sound file), and libwinpthread-1.dll and libgcc_s_seh-1.dll (DLLs used by j349) to GitHub. Two required files will still need to be downloaded from the CoW shared drive.

smalers commented 2 years ago

It is typical that GitHub repositories only contain the source files necessary to build software and not the dynamic executables and libraries produced by the compile/build process. Storing dynamic files in the repository means that every developer that collaborates on a software project would create new files and Git would want to commit those to the repository, resulting in more commits and potential for one developer to undo the work of another based on the local files that the developer is using at the time. I have made exceptions to this if I REALLY need to save a historical archive of something or an executable is used in the build process and does not get updated that often.

A typical way to deploy software is to implement an automated build process that results in an installer of some type. Examples are:

Zip file - can use 7-Zip to automate - I did this to distribute a StateMod source code zip file to people who can't deal with GitHub
Windows installer - NSIS is used for TSTool and other CDSS software
makeself - can run in Linux shells to create a "run" file for installation - the TSTool Linux version uses this
Deploying production software via a GitHub clone/pull is typically not done because it requires users to install Git software, although they can just download a file from the repository using the GitHub website.

The result can be posted in a location such as the CoW shared drive, and GitHub also has a "releases" feature. This does result in an additional step in the build process. However, it avoids the issues with committing dynamic files to the repository and standardizes software processes. I typically put scripts in a build-util folder in the repository, and these scripts extract the software version from the source code so that the deployed installer is consistent. This also forces the issue of maintaining a version history, such as with semantic versioning, and keeping release notes consistent and accessible so that people know what changed. When this is in place, there is much less reliance on email. Finally, for OpenCDSS I implemented a folder structure on the GCP where a top-level product folder (e.g., statemod) has sub-folders for each version, and within those are located the software downloads, documentation, etc. Everything flows through the build process.
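As a rough illustration of the build-util idea, the sketch below extracts a version from source text and writes a zip file into a product/version folder. It is not the actual OpenCDSS build script: the version pattern, the function names extract_version and build_zip, and the zip naming are all assumptions made for the example.

```python
import re
import zipfile
from pathlib import Path

# Hypothetical pattern: assumes the source declares the version on a line
# such as  version = '1.2.3';  (the real StateTL source may differ).
VERSION_PATTERN = re.compile(r"version\s*=\s*['\"](\d+\.\d+\.\d+)['\"]")


def extract_version(source_text: str) -> str:
    """Return the first MAJOR.MINOR.PATCH version string found in the source text."""
    match = VERSION_PATTERN.search(source_text)
    if match is None:
        raise ValueError("no version string found in source")
    return match.group(1)


def build_zip(product: str, version: str, files: list[Path], dist_dir: Path) -> Path:
    """Create dist_dir/<product>/<version>/<product>-<version>.zip from files."""
    out_dir = dist_dir / product / version
    out_dir.mkdir(parents=True, exist_ok=True)
    zip_path = out_dir / f"{product}-{version}.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            # Store each file flat in the archive under its own name.
            zf.write(f, arcname=f.name)
    return zip_path
```

Because the version comes from the source code rather than being typed by hand, the installer name and the folder it lands in cannot drift from what the program itself reports.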

If executables are stored in the repository, one issue is that if the executable name is generic, each update will overwrite the previous file, and this may be confusing to users who want to find a certain version of the executable. Saving the executable in a way that indicates the version in the folder or filename allows multiple versions to be saved side by side. The amount of storage in Git will be about the same either way because each version is saved (Git stores complete file snapshots rather than differences, although it does compress the files).
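A minimal sketch of the versioned-name approach, assuming a hypothetical helper archive_executable and a layout of one folder per version (neither is part of StateTL):

```python
import shutil
from pathlib import Path


def archive_executable(exe_path: Path, version: str, archive_dir: Path) -> Path:
    """Copy exe_path to archive_dir/<version>/<name>-<version><ext>."""
    target_dir = archive_dir / version
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{exe_path.stem}-{version}{exe_path.suffix}"
    if target.exists():
        # Refuse to overwrite so a published version never silently changes.
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(exe_path, target)
    return target
```

With this layout, StateTL.exe at version 1.4.2 would be saved as 1.4.2/StateTL-1.4.2.exe, and an earlier version stays in its own folder.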

My main point is that storing dynamic binaries in the repository is generally considered bad practice. If the CoW team has already thought these things through and is OK with the results of putting the files in GitHub, then my comments here are just perspective to be considered.

OpenCDSS / ArkDSS-Colors-of-Water

StateTL - include more files through github #16