TedStudley / mc-mini

Simple Stokes+Advection/Diffusion solver.

Separate the Output Folder from the Main Codebase #4

Closed TedStudley closed 8 years ago

TedStudley commented 8 years ago

It takes an obscenely long time to clone the repository, mostly because of the output values. I wouldn't be opposed to moving the output folder to a separate download in releases, but it really shouldn't stay in the codebase.

egpuckett commented 8 years ago

I agree. Isn't it possible to mark certain directories in GitHub so they are not tracked?

egpuckett commented 8 years ago

On 10/05/2015 11:15 PM, Ted Studley wrote:

  It takes an obscenely long time to clone the repository, mostly
  because of the output values. I wouldn't be opposed to moving
  the output folder to a separate download in releases, but it
  really shouldn't stay in the codebase.

I agree. Isn't it possible to mark certain directories in GitHub so they are not tracked?

- Gerry

TedStudley commented 8 years ago

Unfortunately, not tracking the directories would mean that they're no longer in the repository, and it's not really trivial to just prevent a directory from being downloaded in the initial clone.

It would be possible to have those files tracked in a separate repository, but their format and use-case (huge binary files with infrequent changes) really means that they shouldn't be version-controlled the same way the rest of the project is. It would be best to have them available as a stand-alone download somewhere, either in the GitHub releases section or on a different site. If CIG has access to an FTP server which could be used, it would work to upload them there.

TedStudley commented 8 years ago

The output folder should now be removed, but the size of the repository hasn't decreased as much as I had expected.

The repository was around 100MB before the changes, and I've gotten it down to around 36MB. Interestingly enough, it appears to be possible to get the size down even further.

Even though the output folder isn't present in any commit on the master branch (and all other branches have been merged in and deleted), a list of all file hashes in the repository still includes files from inside the output folder. This suggests that the files are still present in the repo but unreachable. We may need to wait for garbage collection to run on the server before the space is actually freed.
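
For anyone who wants to reproduce this check locally, here is a minimal sketch in Python. It is not necessarily the procedure used above. It assumes `git` is on the PATH and that the folder is literally named `output/`; the helper names (`parse_count_objects`, `blobs_under`, `report`) are made up for illustration. It compares the packed size of the repo with the blobs still reachable under that folder and with the objects that `git fsck` reports as unreachable.

```python
import subprocess


def parse_count_objects(text):
    """Parse the output of `git count-objects -v` into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


def blobs_under(batch_check_output, prefix):
    """Return (size, sha, path) tuples, largest first, for blobs under prefix.

    Expects lines of the form '<type> <sha> <size> <path>'.
    """
    found = []
    for line in batch_check_output.splitlines():
        parts = line.split(" ", 3)
        if len(parts) == 4 and parts[0] == "blob" and parts[3].startswith(prefix):
            found.append((int(parts[2]), parts[1], parts[3]))
    return sorted(found, reverse=True)


def git(repo, *args, stdin=None):
    """Run a git command inside repo and return its standard output."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        input=stdin, check=True, capture_output=True, text=True,
    )
    return result.stdout


def report(repo=".", prefix="output/"):
    """Summarise pack size, reachable blobs under prefix, and unreachable objects."""
    stats = parse_count_objects(git(repo, "count-objects", "-v"))
    # Every object reachable from any ref, with the path it was found at.
    reachable = git(repo, "rev-list", "--objects", "--all")
    described = git(
        repo, "cat-file",
        "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)",
        stdin=reachable,
    )
    # Objects that no ref points to (reflog entries are ignored here).
    unreachable = git(repo, "fsck", "--unreachable", "--no-reflogs")
    return {
        "size_pack_kib": stats.get("size-pack", 0),
        "reachable_under_prefix": blobs_under(described, prefix),
        "unreachable_objects": len(unreachable.splitlines()),
    }
```

Calling `report(".")` inside a clone returns the three figures in one dict. If `reachable_under_prefix` is empty while `unreachable_objects` is non-zero, that is consistent with the files lingering in the object store until garbage collection runs.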

TedStudley commented 8 years ago

The output folder is gone from the entire git history, and all remaining unreachable blobs have been scrubbed from the object pack. Before, a fresh clone of the repo took ~250M of space. Now a fresh clone takes only ~850K and completes nearly instantly. I've confirmed that the codebase still compiles, so the rewrite didn't break anything unexpectedly.
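
The comment above doesn't record which commands were used, so the following is only a sketch of one common way to get the same result. It assumes `git filter-branch` (Git's own documentation now recommends the separate `git filter-repo` tool instead) and a folder literally named `output`. The function names are hypothetical. It rewrites every ref to drop the folder, deletes the backup refs that `filter-branch` leaves behind, then expires the reflog and prunes the unreachable objects.

```python
import shlex
import subprocess


def rewrite_command(path="output"):
    """Build the filter-branch argv that removes path from every commit."""
    index_filter = f"git rm -r --cached --ignore-unmatch {shlex.quote(path)}"
    return [
        "git", "filter-branch", "--force",
        "--index-filter", index_filter,
        "--prune-empty",              # drop commits that become empty
        "--tag-name-filter", "cat",   # rewrite tags to point at new commits
        "--", "--all",
    ]


def cleanup_commands():
    """Build the argv lists that discard the now-unreachable objects."""
    return [
        ["git", "reflog", "expire", "--expire=now", "--all"],
        ["git", "gc", "--prune=now", "--aggressive"],
    ]


def run(repo, argv, capture=False):
    """Run one git argv inside repo; return stdout when capture is True."""
    result = subprocess.run(
        [argv[0], "-C", repo, *argv[1:]],
        check=True, capture_output=capture, text=True,
    )
    return result.stdout if capture else ""


def purge(repo, path="output"):
    """Destructively rewrite history in repo. Try it on a scratch clone first."""
    run(repo, rewrite_command(path))
    # filter-branch keeps the old history alive under refs/original/.
    backups = run(
        repo,
        ["git", "for-each-ref", "--format=%(refname)", "refs/original/"],
        capture=True,
    )
    for ref in backups.split():
        run(repo, ["git", "update-ref", "-d", ref])
    for argv in cleanup_commands():
        run(repo, argv)
```

After `purge("path/to/scratch-clone")`, the rewritten branches would still need to be force-pushed, and every existing clone would have to be re-cloned or reset, because all commit hashes change. Cleanup on the hosting side is a separate matter that this sketch does not cover.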