gentzkow / template_archive

Issue with "Use this template" and Git LFS integration #78

Closed snairdesai closed 1 year ago

snairdesai commented 1 year ago

The purpose of this issue (#78) is to address problems with how git-lfs raw files interact with using gentzkow/template directly as a template repository. I've run into issues using this repository as a template for other independent projects, because the git-lfs files hosted on the repo are not properly carried over to any new repo initialized from this ~/template skeleton. Ngoc ran into the same issue when she was onboarding, as did BW when using ~/template to initialize another project for the team.

We've determined that forking ~/template to a new project allows for the transfer of git-lfs files, but ideally we would correct this integration element to enable the "Use as template" procedure. GitHub suggests this might not be easily solvable (see screenshot below), but it is worth some investigation to see if there are workarounds.

[Screenshot from GitHub Docs]

@jc-cisneros and I are assigned here.

snairdesai commented 1 year ago

Updates: I haven't yet been able to find a perfect workaround here, but I've confirmed once more that forking the repository to a new version (rather than Use as template) works in the way that we need. We can also duplicate template using the mirror approach from git. More on this here. I'll continue to look into this further in case another approach presents itself.
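For reference, a minimal sketch of the mirror approach mentioned above, written as a Python helper that only builds the command list and does not run anything. The sequence follows the usual "bare clone, fetch LFS objects, mirror push, push LFS objects" pattern; the function name `mirror_commands` and the default working directory are hypothetical and not part of this repo.

```python
from typing import List


def mirror_commands(old_url: str, new_url: str, workdir: str = "old-repo.git") -> List[List[str]]:
    """Build (but do not run) the git commands to mirror a repo with its LFS objects.

    Assumption: `workdir` is a scratch directory name chosen for illustration.
    """
    return [
        # Bare clone of the source repository into the scratch directory.
        ["git", "clone", "--bare", old_url, workdir],
        # Download every LFS object referenced by the source history.
        ["git", "-C", workdir, "lfs", "fetch", "--all"],
        # Push all refs to the new repository.
        ["git", "-C", workdir, "push", "--mirror", new_url],
        # Upload the LFS objects to the new repository's LFS storage.
        ["git", "-C", workdir, "lfs", "push", "--all", new_url],
    ]
```

Each inner list can be passed to `subprocess.run(cmd, check=True)` once the URLs have been double-checked.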

gentzkow commented 1 year ago

@snairdesai @jc-cisneros We had previously addressed this issue in #51. The intention in that task was to make the template totally free from git-lfs, and then have the initialization of git-lfs happen as part of the setup process. Can you take a look at the previous task and see why that solution is not sticking?

snairdesai commented 1 year ago

@gentzkow Thanks for flagging this issue! Yes, will look through the thread and keep you posted with updates here.

snairdesai commented 1 year ago

At first glance, it seems a potential issue might have arisen with this commit back to master. It looks like all the lfs-filter lines were added back to .gitattributes. Need to look more closely, just flagging for myself.
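A quick way to check whether a given version of .gitattributes still carries lfs-filter lines is sketched below. It is a simplified parser (it assumes patterns contain no spaces or quoting), and the helper name `lfs_tracked_patterns` is made up for illustration.

```python
from typing import List


def lfs_tracked_patterns(gitattributes_text: str) -> List[str]:
    """Return the path patterns that have `filter=lfs` set in a .gitattributes file.

    Simplification: assumes each pattern is a single whitespace-free token.
    """
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        pattern, attrs = parts[0], parts[1:]
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns
```

An empty result for the template's .gitattributes would suggest no lfs-filter lines remain.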

jc-cisneros commented 1 year ago

@gentzkow @snairdesai let me share my current understanding of what is happening.

@snairdesai outlined these steps to test what would happen if we committed the changes in f610f7e to master:

  1. Fork template to another personal repo.
  2. Clone down the forked repo, and commit + push the change to .gitattributes which is given in f610f7e to its master branch (i.e., so that we don't directly commit to gentzkow/template/master for the moment).
  3. Use the forked repository as a template with the green button at top. Make a new repository with a name of your choice (e.g., templatetester).
  4. Pull down this newly created repository and try to access the git-lfs files.

Using git lfs pull on the fourth step resulted in the following error:

Failed to fetch some objects from https://github.com/jc-cisneros/template_test.git/info/lfs

My sense is that if you create a new repository with the "Use as template" option GitHub moves the git-lfs pointer from the original owner of the template to the account that is creating the new repository (i.e., chips.csv and tv.csv are stored on @gentzkow's git-lfs account). GitHub documentation explicitly mentions that Git-LFS cannot be used with templates:

Screenshot 2023-03-31 at 3 10 09 PM
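To tell whether a file in the new repository is real data or still an unresolved git-lfs pointer, something like the sketch below could be used. It matches only the basic three-line pointer layout (version, oid, size) and ignores the optional extension lines the pointer format allows; `parse_lfs_pointer` is a hypothetical helper, not an existing tool.

```python
import re
from typing import Dict, Optional

# Basic Git LFS pointer layout: version line, sha256 oid line, size line.
_POINTER_RE = re.compile(
    r"\Aversion https://git-lfs\.github\.com/spec/v1\n"
    r"oid sha256:(?P<oid>[0-9a-f]{64})\n"
    r"size (?P<size>\d+)\n\Z"
)


def parse_lfs_pointer(text: str) -> Optional[Dict[str, object]]:
    """Return {'oid': ..., 'size': ...} if `text` is a basic LFS pointer, else None."""
    match = _POINTER_RE.match(text)
    if match is None:
        return None
    return {"oid": match.group("oid"), "size": int(match.group("size"))}
```

If chips.csv or tv.csv parses as a pointer after cloning, the actual object was never downloaded, which would be consistent with the fetch error above.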

@jmshapir found the same issue on https://github.com/JMSLab/Template/issues/45. Given that it is still the case that there is no support for Git-LFS on template figures, we have the same options outlined by @jmshapir on that thread:

  1. Move all lfs-tracked files in the Template to git storage and remove lfs from the Template repo.
  2. Add all lfs-tracked files in Template to the .gitignore and remove these files from the Template repo. (But retain the .gitattributes and therefore lfs itself.)
  3. Turn off the option of using Template as a template repository, so that in order to use it, users have to clone it and then copy it over to their new repository.

They decided to go for (2), but for this repo I am more inclined towards (1). Given that chips.csv and tv.csv are only 8.8 MB and 99 KB respectively, it does not hurt to keep them in the normal Git storage. We could tweak the practice exercises and include storing these files in Git-LFS as part of the process.
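As a rough sanity check on the size argument, the sketch below compares the two files against GitHub's per-file limits for regular Git storage (a warning above 50 MiB and a hard block above 100 MiB). The byte counts are approximations of the 8.8 MB and 99 KB figures quoted above, and `classify_size` is an illustrative name.

```python
MIB = 1024 * 1024
WARN_BYTES = 50 * MIB    # GitHub warns when a pushed file exceeds this size
BLOCK_BYTES = 100 * MIB  # GitHub rejects pushed files larger than this


def classify_size(size_bytes: int) -> str:
    """Classify a file size against GitHub's regular (non-LFS) per-file limits."""
    if size_bytes > BLOCK_BYTES:
        return "blocked"
    if size_bytes > WARN_BYTES:
        return "warning"
    return "ok"


# Approximate sizes taken from the figures quoted in this comment.
approx_sizes = {"chips.csv": 8_800_000, "tv.csv": 99_000}
status = {name: classify_size(size) for name, size in approx_sizes.items()}
```

Both files land well inside the "ok" range under these assumptions.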

snairdesai commented 1 year ago

@jc-cisneros I agree with the above, and the errors you receive match my own from this process. From this comment by @gentzkow, my sense is that the goal was to entirely rid the repository of git-lfs, and initialize it as part of our setup procedure (i.e., external to the clone or file transfer processes).

Based on this description, I would have expected the reversion made in f610f7e would address the concern with git-lfs transfers in child repos, but it does not seem that making this change fixes the issue for either of us. I've reached out to @szahedian for clarification here. If this prior issue was just meant to address problems with cloning or forking, that's of course distinct from the issue here arising from using this repository as a template for a new project.

Let me read through the thread in https://github.com/JMSLab/Template/issues/45 more closely and get back to you. I'm not sure that (1) actually makes sense here given how integrated lfs is within our repository structure currently, but I suppose we can decide depending on the feasibility of other solutions. If I had to pick at the moment, I would vote for (2) as decided above, because we might want to make this tractable to other skeletons which might host larger raw files. No strong preference, though.

Based on @szahedian's script, another approach would be to complete a mixture of (1) and (2). Meaning, we store the raw files with git, and then track them with git-lfs after the clone/use as template procedures using lfs_setup.sh. This makes the most sense to me given the purpose of #51, but viewing commit history at that time suggests the raw files were still stored with git-lfs, so it's unclear how this was working previously.

snairdesai commented 1 year ago

I'm also wondering if something happened to the raw files themselves between the commits made in #51 and those since then which is resulting in an issue. I'll generate a diff shortly.

UPDATE: Doesn't seem like there are any meaningful diffs in the raw files, see here.

gentzkow commented 1 year ago

Thanks guys. I am fine setting things up so there are no lfs-tracked files in the template repo. We'll just want to make sure that the setup process requires people to turn on lfs unless they explicitly opt not to.
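One possible shape for an "lfs on by default, explicit opt-out" setup step is sketched below. This is not the content of lfs_setup.sh; the function name, the `use_lfs` flag, and the example patterns are all assumptions for illustration. It only builds the command list and does not execute it.

```python
from typing import List


def lfs_setup_commands(patterns: List[str], use_lfs: bool = True) -> List[List[str]]:
    """Build the git commands to start tracking `patterns` with git-lfs.

    LFS is enabled by default; callers must pass use_lfs=False to opt out.
    """
    if not use_lfs:
        return []  # explicit opt-out: leave files in regular git storage
    commands = [["git", "lfs", "install"]]
    for pattern in patterns:
        # Writes a filter=lfs line for this pattern into .gitattributes.
        commands.append(["git", "lfs", "track", pattern])
    commands.append(["git", "add", ".gitattributes"])
    # Re-stage already-tracked files so the new lfs filter is applied to them.
    commands.append(["git", "add", "--renormalize", "."])
    return commands
```

For example, `lfs_setup_commands(["*.csv"])` yields four commands, while `lfs_setup_commands(["*.csv"], use_lfs=False)` yields none.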

snairdesai commented 1 year ago

Sounds good @gentzkow! Will implement shortly and update the README.

snairdesai commented 1 year ago

For myself: Complete list of lfs files currently living in template repo:

[Screenshot: list of lfs files in the template repo]

snairdesai commented 1 year ago

In commit 0d7e60c, I renormalized all of our tracked lfs files to standard git. The final steps here for me are:

snairdesai commented 1 year ago

In 779d26e, I confirmed that running our current ~/setup procedure (running lfs_setup.sh) correctly tracks and populates our files with lfs. I'll revert back to 0d7e60c.

snairdesai commented 1 year ago

Summary + Deliverables


The purpose of issue (#78) is outlined in this comment. Users have had difficulties with the "use as template" option for generating child repos from template, due to an issue transferring ownership of git-lfs files across repositories. To address this, we decided to renormalize the raw files living in template to be stored with raw git rather than git-lfs, and reverted the .gitattributes file to a former version which was accidentally overwritten in a prior PR.

This issue is followed up in the associated PR (#79).