rppawlo closed this issue.
@rppawlo, I am not sure that adding non-git commands directly into gitdist is a good solution. Right now, the appeal of gitdist is that it just distributes raw git commands across a set of git repos; simple. If we start to add commands for other VC systems, this is going to get complicated very fast.
It sounds like Drekar developers are instead looking for a tool like mr ("myrepos").
See the debate going on in:
I might suggest some options more compatible with CMake/TriBITS/gitdist that Drekar developers might consider:
Personally, I am not excited about either of these two approaches but they would work with TriBITS and github with very little extra effort (e.g. once gitdist supports the dist-foreach command).
I have been waiting to see if Git LFS takes off and someone creates quality open-source server and local implementations. That would really make working with large binary files with git more seamless, but I don't have high hopes (see here and here).
For now, Drekar developers might just consider adding a simple script like drekar-repos.sh in their base project repo, with a command pull that calls gitdist pull and then runs svn update on the few SVN repos, and a command status that runs gitdist-status and then svn status on the SVN repos, or something similar.
Otherwise, let's talk.
FYI:
I cloned the myrepos git repo off github, installed mr under ~/bin/, and then tried to set it up for use with my set of Trilinos git repos:
[8vt@th232 Trilinos (develop)]$ mr register
mr: cannot determine git url
I looked at the source code for mr and the problem is that it is hard-coded to assume that the remote repo is named 'origin'. Because I use multiple git repos and different projects, I don't name my remotes 'origin', so that I don't get confused about which repos I am pointing to in which project. For example, for doing direct Trilinos development, I have:
[8vt@th232 Trilinos (develop)]$ gitdist remote -v | grep -v "^$"
*** Base Git Repo: Trilinos
amklinv git@github.com:amklinv/Trilinos.git (fetch)
amklinv git@github.com:amklinv/Trilinos.git (push)
bddavid git@github.com:bddavid/Trilinos.git (fetch)
bddavid git@github.com:bddavid/Trilinos.git (push)
github git@github.com:trilinos/Trilinos.git (fetch)
github git@github.com:trilinos/Trilinos.git (push)
nightly software.sandia.gov:/space/git/nightly/Trilinos (fetch)
nightly software.sandia.gov:/space/git/nightly/Trilinos (push)
rab-gh git@github.com:bartlettroscoe/Trilinos.git (fetch)
rab-gh git@github.com:bartlettroscoe/Trilinos.git (push)
techx git@github.com:Tech-XCorp/Trilinos (fetch)
techx git@github.com:Tech-XCorp/Trilinos (push)
...
*** Git Repo: TriBITS
bb-rab git@bitbucket.com:bartlettra72/tribits (fetch)
bb-rab git@bitbucket.com:bartlettra72/tribits (push)
casl-dev git@casl-dev:TriBITS (fetch)
casl-dev git@casl-dev:TriBITS (push)
github git@github.com:tribitspub/TriBITS.git (fetch)
github git@github.com:tribitspub/TriBITS.git (push)
github-rab git@github.com:bartlettroscoe/TriBITS.git (fetch)
github-rab git@github.com:bartlettroscoe/TriBITS.git (push)
gsjaardema git@github.com:gsjaardema/TriBITS.git (fetch)
gsjaardema git@github.com:gsjaardema/TriBITS.git (push)
nschloe git@github.com:nschloe/TriBITS.git (fetch)
nschloe git@github.com:nschloe/TriBITS.git (push)
With all of these repos, it would be confusing what 'origin' pointed to without having to constantly run git remote -v (which I used to do all of the time).
The issue is that what "origin" points to is determined by what project and what workflow I am using. So when I am working with Trilinos, I see:
[8vt@th232 Trilinos (develop)]$ gitdist-status
----------------------------------------------------------------------
| ID | Repo Dir | Branch | Tracking Branch | C | M | ? |
|----|-----------------------|---------|-----------------|---|---|---|
| 0 | Trilinos (Base) | develop | github/develop | | | |
...
| 7 | TriBITS | master | github/master | 1 | | |
| 8 | TriBITS/TriBITSDoc | master | bb/master | | | |
...
----------------------------------------------------------------------
(tip: to see a legend, pass in --dist-legend.)
So it is clear that I am pulling and pushing to the 'github' repos.
When I am working on CASL with Trilinos, I see:
[8vt@th232 VERA (master)]$ gitdist-status
------------------------------------------------------------------
| ID | Repo Dir | Branch | Tracking Branch | C | M | ? |
|----|--------------------|--------|-----------------|---|---|---|
| 0 | VERA (Base) | master | casl-dev/master | | | |
| 1 | TriBITS | master | casl-dev/master | | | |
| 2 | Trilinos | master | casl-dev/master | | | |
...
------------------------------------------------------------------
(tip: to see a legend, pass in --dist-legend.)
That makes it clear that I am pulling and pushing to the 'casl-dev' repos. See, no confusion!
mr forces all of your git repos to point to 'origin'. What mr should do is just get the remote from the tracking branch of the current branch. It should not hard-code that the name of the remote repo is 'origin'.
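The lookup mr would need is available from plain git configuration. The following is a minimal sketch, not mr's actual code: a hypothetical shell function tracking_remote_url that reads branch.<name>.remote for the current branch and prints that remote's URL, whatever the remote is called. It assumes the current branch has an upstream configured (for example with git push -u or git checkout -t) and returns non-zero otherwise, including on a detached HEAD.

```bash
# Print the URL of the remote that the current branch tracks, whatever that
# remote is named ('github', 'casl-dev', ...), instead of assuming 'origin'.
tracking_remote_url() {
  local branch remote
  # Current branch name; -q makes this fail silently on a detached HEAD.
  branch=$(git symbolic-ref --short -q HEAD) || return 1
  # Remote recorded for this branch's upstream; fails if none is configured.
  remote=$(git config "branch.$branch.remote") || return 1
  git config "remote.$remote.url"
}
```

Run in the Trilinos base repo on develop (tracking github/develop, per the status table above), it would be expected to print git@github.com:trilinos/Trilinos.git.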
But mr has no automated test suite at all, so how can you safely modify and maintain it?
@rppawlo, it looks like git-annex has a new v6 repository mode that comes very close to the ease of use of standard git commands (just git add <large-file> and git commit). It looks very close to the usage of the Git LFS spec (see annex-largefiles). The downside is that it will store two copies of each large file in your local git repo (one in the working tree and one in the .git/ directory). This is a very new feature, as it seems this was just released on 2/11/2016 (i.e. git-annex version 6.20160211). My guess is that the author of git-annex copied the ideas of Git LFS to make this easy to use. I am actually pretty excited about this if it actually works and is robust (because it would essentially be an open-source implementation of Git LFS).
We would have to try this out and experiment with it to see how it works. However, if this works out, it might give users and developers an interface much closer to a standard git repo without having to resort to using SVN or some other way of managing large files. I will ask Seth Johnson at ORNL if he has looked into git-annex v6 repository mode yet.
@rppawlo, we should wait and see how Seth responds to the email below, but from looking into this new git-annex mode, I think it might be possible to streamline the usage of git-annex with new gitdist commands dist-fetch (which would do 'git annex sync'), dist-pull (which would do 'git annex sync --content') and dist-push (which might also do 'git annex sync --content'). Once you set up the usage of gitdist and set up the logic for which (large/binary) files git-annex should manage in the committed .gitattributes file, using this could be as easy as:
$ cd <base-repo>/
$ gitdist-pull # alias to 'gitdist dist-pull' which also pulls new git-annex files
$ cd large_test_files/
$ git add <some-large-file>
$ git commit
$ cd ..
$ gitdist-status # alias to 'gitdist dist-repo-status' which could also consider git-annex status
$ gitdist-push # alias to 'gitdist dist-push' which also pushes new git-annex files
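Per repo, the proposed dist-pull and dist-push commands would need logic roughly like the sketch below. It is hypothetical (gitdist has no such commands), the function names are made up, and it assumes a repo is git-annex-enabled when git annex init has recorded an annex.uuid in its config. It uses the --no-push/--no-pull variants of git annex sync --content mentioned later in this thread so that a pull does not also push; since git annex sync also commits pending changes by default, the exact flags would need to be checked against the installed git-annex version.

```bash
# Hypothetical per-repo logic behind proposed 'gitdist dist-pull' and
# 'gitdist dist-push' commands (gitdist has no such commands today).

# Assumption: 'git annex init' has stored an annex.uuid in annexed repos.
is_annex_repo() {
  git config --get annex.uuid >/dev/null 2>&1
}

dist_pull_one_repo() {
  if is_annex_repo; then
    # Pull changes and sync annexed content; --no-push skips pushing to remotes.
    git annex sync --content --no-push
  else
    git pull
  fi
}

dist_push_one_repo() {
  if is_annex_repo; then
    # Push changes and sync annexed content; --no-pull skips pulling from remotes.
    git annex sync --content --no-pull
  else
    git push
  fi
}
```

gitdist would then run the matching function in each listed repo, so non-annexed repos keep their plain git pull/git push behavior.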
Git-annex meshes with git and gitdist much better than SVN. That is because you can still just run any regular git command in git repos that use git-annex. You can't do that with an SVN repo so it would be very ugly to try to shoehorn in support for SVN. But since git-annex is an extension to git, this should be much more seamless.
One issue is that they will need to install an updated version of git-annex on development.sandia.gov to be able to use this git-annex mode with git repos there. That is something that might require some SEMS support. But before that, we could experiment with a temporary bare repo and see how it works. Also, users would need to install git-annex on their local machines, but the SEMS NFS mount could help with that and make it more automatic.
If you are interested in looking into git-annex, perhaps we could look into this together a little? I know that none of us really has time for this type of thing, but if this works out, it could be a very nice solution to a pretty big problem with the usage of git for large binary files and multi-repos, all in one shot.
From: Bartlett, Roscoe A Sent: Wednesday, July 27, 2016 10:55 AM To: Johnson, Seth R. Cc: Pawlowski, Roger P; Evans, Thomas M. Subject: git-annex v6 repository mode?
Hello Seth,
I know that Exnihilo developers use git-annex for managing large binary files. Therefore, I thought that I might ping you on this …
I was looking at git-annex documentation today and I see that newer versions of git-annex (version 6.20160211 and newer) support a new “v6 repository mode” that looks to make git-annex much easier to use (much like Git LFS):
https://git-annex.branchable.com/tips/unlocked_files/ https://git-annex.branchable.com/tips/largefiles/ https://git-annex.branchable.com/not/
Basically, it allows you to set annex.largefiles settings in our .gitattributes file, and then you can just use the basic git add and git commit commands to automatically handle large files correctly with git-annex. It looks like it might be a better replacement for ‘auto-annex’ that is documented in the Exnihilo developers guide (attached).
But I am not sure if ‘git push’ and ‘git pull’ result in large files being synced automatically, as happens with Git LFS, or if you still have to use git-annex commands to do that (e.g. ‘git annex sync --content’). That would be the big question for usability. (But I think I could wrap this into gitdist commands dist-fetch, dist-pull and dist-push to automatically handle git annex repos in this fashion.)
Some discussion of this is in the comment:
I am looking into this because a project at SNL (one of Roger’s projects) is using SVN to handle large binary files. But using git-annex would have some advantages and it interfaces more cleanly with git. The issue is making git-annex as easy to use as raw git (or getting pretty close). If this works out, we might adopt this for CASL VERA repos as well (i.e. the VERAData repo).
Just curious if you had seen this git-annex “v6 repository mode” and have looked into this at all.
Thanks,
-Ross
Thanks for looking into this Ross! I'll sit tight until we hear back from Seth.
Adding @eric-c-cyr to this conversation
bummer about the "mr" tools being hard coded to origin - it looked really good.
It might just be that the 'mr register' command is hard-coded to 'origin'. Grepping through the script, I don't see other explicit mentions of 'origin'. It might be that if you manually build the .mrconfig file, you can get around that restriction.
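A hand-built .mrconfig might look like the sketch below (hypothetical, not tested against mr). Each section header names a repo directory relative to the config file, and checkout gives the command mr runs to create it; git clone -o <name> names the remote something other than 'origin'. Whether mr's other commands then behave correctly with a non-'origin' remote is exactly what would need to be checked.

```bash
# Write a minimal hand-built .mrconfig in the current (base repo) directory.
# The section name is the repo directory; 'checkout' is the clone command,
# here naming the remote 'github' instead of 'origin'.
cat > .mrconfig <<'EOF'
[TriBITS]
checkout = git clone -o github git@github.com:tribitspub/TriBITS.git TriBITS
EOF
```

It could then be exercised with, for example, mr -c .mrconfig checkout or mr -c .mrconfig status.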
I understand the attractiveness of SVN (because many people already know that tool) but git-annex might be worth a look. Let's see what Seth says.
Seth's response and my response to him are given below.
Boy, if GitLab could be installed at SNL and supported Git-LFS, then that is something that SEMS should really look into. That could make GitLab very attractive. Note that Sandia does seem to have a GitLab instance installed (see issue https://sems-jira.sandia.gov/browse/SEMS-1104).
But note that Git-LFS is not all roses:
But of course git-annex and especially SVN are not perfect either. But at least git-annex fits in with a git-based workflow and supports distributed repos. SVN does not at all. The biggest problem with SVN is that it clashes with the git workflow. You can't commit things locally and then push to all the repos all at once with SVN. With SVN, you need to create the commit message at the same time that you push the commit. When multiple repos are involved, this is just a mess. Based on this, I would look at git-annex.
I just wish we had someone at SNL who could dig into all of this for us. Should we ask SEMS to look into this? Dealing with large binary files in a git-based workflow is not a trivial problem. When you start throwing SVN repos into the mix, I don't know how you build a smooth development and multi-repo integration model out of that.
From: Bartlett, Roscoe A Sent: Thursday, July 28, 2016 1:43 PM To: 'Johnson, Seth R.' Cc: Pawlowski, Roger P; Evans, Thomas M. Subject: RE: [EXTERNAL] Re: git-annex v6 repository mode?
Seth,
Except for push and pull, git-annex with this new repository mode looks almost as easy to use as Git-LFS (but we would have to do a detailed comparison).
Good to hear about Git-LFS support with GitLab. I just found:
Is that supported on the ORNL installation of GitLab (code-int.ornl.gov) currently, or is it planned for the future?
It also seems that GitLab supports git-annex as well:
(but not sure about the newest “v6 repository mode”).
So GitLab seems like a good opportunity to compare git-annex and git-lfs side-by-side.
In any case, please let us know how your experimentation with Git-LFS and GitLab goes and how it compares to git-annex.
But at this point, you would still recommend using git-annex over using a separate SVN repo?
Thanks,
-Ross
From: Johnson, Seth R. [mailto:johnsonsr@ornl.gov] Sent: Thursday, July 28, 2016 12:55 PM To: Bartlett, Roscoe A Cc: Pawlowski, Roger P; Evans, Thomas M. Subject: [EXTERNAL] Re: git-annex v6 repository mode?
Hey Ross,
Thanks for the ping! I'd not heard of that new git-annex mode but it sounds pretty easy to use. I doubt that git-annex will ever work as cleanly as just git, but it sure works easier than a separate SVN repository. We're actually looking into using Git-LFS because that's what may be supported by our local gitlab server. Please let me know what you find out.
Thanks, Seth
Below is an email response from Seth and my response to him. To sum up the feedback from Seth:
git annex add <file>, etc.), Seth would still recommend using git-annex over an SVN repo.
Since my last comment, I have done a lot more research on git-annex and Git-LFS. I looked at several presentations on youtube including:
At this point I think I know enough about git-annex and Git-LFS to draw some conclusions.
First, Git-LFS is the future for the handling of large binary files with git. There is no question about that. And using Git-LFS would require zero changes to gitdist or checkin-test.py. However, unless you have a recent installation of GitLab Enterprise Edition (EE), GitHub Enterprise, or BitBucket installed and supported on your local system, you can't use it for sensitive projects. And even for open-source projects, the support for large files with Git-LFS on GitHub, GitLab, and BitBucket is going to be very limited (because large files and lots of I/O cost real money). Also, there is no open-source Git-LFS server implementation. Therefore, I don't think that Git-LFS is a viable solution for projects that have sensitive data (which therefore needs to be protected behind a firewall) or that need to handle really large files. It may take a while before we can use Git-LFS at the labs due to the need for serious infrastructure (both programmatic support for the supporting tools and the hardware to support it). In a few years, I would think that Git-LFS will overtake everything, including git-annex. But at the present time, I don't see Git-LFS as a viable option.
If the desire is to keep using gitdist for handling all the repos, then I think that git-annex using "v6 repository mode" is worth looking into. It may be nearly as easy to use as raw git and it will integrate well with the gitdist tool (if that is indeed the desire). If there is interest in this, then the next step would be to install a recent version of git-annex (6.20160211 or newer) and prototype its usage at handling large binary files, including setting up an rsync git-annex special remote.
However, if the desire is to just keep using SVN for managing large binary files, then I would recommend creating a specialized script called something like drekar-repo and giving it the commands pull, status and push, which would call gitdist pull, gitdist-status, and gitdist push, respectively (along with the corresponding SVN commands for the SVN repos). I will go into the proposal more in the next comment.
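The wrapper described above could be as small as the following sketch. Everything in it is hypothetical: the script name drekar-repo.sh, the DREKAR_SVN_REPOS list and its default value are assumptions for illustration. It calls gitdist dist-repo-status rather than gitdist-status because gitdist-status is a shell alias, and aliases are not expanded inside scripts. There is no SVN step for push because svn commit already sends changes to the server.

```bash
#!/usr/bin/env bash
# drekar-repo.sh: sketch of the proposed wrapper (hypothetical, not an existing
# Drekar or TriBITS tool). Run it from the base repo directory.

# Assumed: space-separated SVN working-copy directories, relative to the base repo.
DREKAR_SVN_REPOS=${DREKAR_SVN_REPOS:-"DrekarSystemTests"}

# Run one svn subcommand in each SVN working copy, labeling the output the
# way gitdist labels each git repo; stop at the first failure.
run_svn_all() {
  local dir
  for dir in $DREKAR_SVN_REPOS; do   # unquoted on purpose: split the list
    echo "*** SVN Repo: $dir"
    (cd "$dir" && svn "$@") || return 1
  done
}

drekar_repo() {
  case "$1" in
    pull)
      gitdist pull && run_svn_all update ;;
    status)
      # 'gitdist-status' is an interactive alias; scripts call the real command.
      gitdist dist-repo-status && run_svn_all status ;;
    push)
      # SVN has no separate push step: 'svn commit' already updates the server.
      gitdist push ;;
    *)
      echo "usage: drekar-repo.sh {pull|status|push}" >&2
      return 1 ;;
  esac
}

# Dispatch only when arguments are given, so the file can also be sourced.
if [ $# -gt 0 ]; then
  drekar_repo "$@"
fi
```

For example, ./drekar-repo.sh pull from the base repo would run gitdist pull and then svn update in each listed SVN working copy, stopping at the first failure.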
In summary, a reasonable plan would be:
1) Short term: Develop a simple project-specific wrapper script drekar-repo that calls gitdist and SVN commands, and tell Drekar developers to always use drekar-repo pull, drekar-repo status and drekar-repo push instead of raw pushes or raw gitdist.
2) Intermediate term: Investigate the usage of git-annex with "v6 repository mode" enabled. See if it is a viable tool (i.e. it works and is relatively easy to use for the targeted workflows). If it is, then set up Drekar to use git-annex and add support for git-annex to gitdist (and checkin-test.py so that CASL VERA can use git-annex).
3) Long term: Watch and see where Git-LFS goes. It is likely that Git-LFS will become ubiquitous and make every other solution for handling large binary files with git obsolete. But it could take years before that happens.
I will update the title of this Issue to reflect the larger scope.
From: Bartlett, Roscoe A Sent: Friday, July 29, 2016 2:14 PM To: 'Johnson, Seth R.' Cc: Pawlowski, Roger P; Evans, Thomas M. Subject: RE: [EXTERNAL] git-annex v6 repository mode?
Seth,
Responses inline ...
From: Johnson, Seth R. [mailto:johnsonsr@ornl.gov] Sent: Thursday, July 28, 2016 7:58 PM To: Bartlett, Roscoe A Cc: Pawlowski, Roger P; Evans, Thomas M. Subject: Re: [EXTERNAL] git-annex v6 repository mode?
The trick may be getting users set up correctly, and able to know when to annex and not to annex files. :)
[Ross] That is the beauty of the git-annex “v6 repository mode”. The git-annex author used the same git clean and smudge filters as Git-LFS to let you specify what types of files should get annexed and which should be handled by regular git, and then users can just use git add <file>. For example, in your .gitattributes file, you can specify:
* annex.largefiles=(largerthan=100kb)
*.c annex.largefiles=nothing
*.h annex.largefiles=nothing
[Ross] That will result in all files larger than 100kb, except .c and .h files, being annexed automatically when added with a raw git add <file>. See:
[Ross] It looks like you can include/exclude entire directories, files with a given extension, etc. It looks very flexible.
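One way to confirm which annex.largefiles rule applies to a given path, before any git-annex is involved, is plain git's check-attr command. The sketch below writes the example .gitattributes from above into a scratch repo (the file names queried are made up) and asks for three paths; for each attribute, a later matching line in .gitattributes overrides an earlier one, which is why *.c and *.h end up as nothing.

```bash
# Put the example .gitattributes into a scratch repo and ask git which
# annex.largefiles value applies to each path (later matching lines win).
demo=$(mktemp -d)
git init -q "$demo"
cat > "$demo/.gitattributes" <<'EOF'
* annex.largefiles=(largerthan=100kb)
*.c annex.largefiles=nothing
*.h annex.largefiles=nothing
EOF
# The paths need not exist; check-attr only evaluates the patterns.
git -C "$demo" check-attr annex.largefiles -- mesh.exo solver.c solver.h
# Expected output:
#   mesh.exo: annex.largefiles: (largerthan=100kb)
#   solver.c: annex.largefiles: nothing
#   solver.h: annex.largefiles: nothing
```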
[Ross] What git-annex is lacking is the pre-push hook that Git-LFS uses to automatically send the large files to the server on a raw git push command. Git-LFS handles pulling down LFS-managed files by copying them in the smudge filter when you check out a branch. The details of how this works are given in this nice presentation:
[Ross] I am not sure if git-annex does this or not. We would have to see. But having to run ‘git annex sync --content [--no-push|--no-pull]’ does not seem so bad (and I can add that to new gitdist dist-pull/dist-push commands).
[Ross] If you are interested, you might just take a quick look at these two web pages above (they are not very long).
I guess if you're using a checkin-test script, you could have it query uncommitted files for their kind and size, and automatically annex them based on some heuristic. I have something like this in a standalone script at Exnihilo/environment/python/exnihiloenv/autoannex.py
[Ross] I think that the git-annex “v6 repository mode” makes the usage of something like autoannex.py unnecessary (see the above .gitattributes file).
[Ross] I think the only thing the checkin-test.py script would need is to handle the pull and push operations for annexed repos. And for that, I think I would add support for git annex push/pull to the gitdist script and make the checkin-test.py script use gitdist for pull and push operations. Also, we would need a way to make sure that modifications to annexed files were committed before doing the final test and push. We would need to experiment with git-annex to see how to get that info (but I would guess that ‘git annex status’ would do that).
The ORNL GitLab says they're looking into it but would have to figure out a model for charging for scalable disk access. So I can't provide any feedback on it at this time.
[Ross] What that means is that Git-LFS is not even an option right now at ORNL, right? Opening the door to huge file storage is a bit scary for them, I suspect. It looks like the code-int.ornl.gov GitLab site at ORNL does not even support git-annex (which was added before support for Git-LFS). A big advantage of git-annex is that you can actually use a different machine to store your annexed files, so you can use it even with GitHub or BitBucket. See:
[Ross] This makes your project its own master for storing large binary files. You can just set up an rsync special remote server that everyone in your team can access (using a unix group protection on the server) and then make them point to that. See:
Even so, I'd still vastly prefer git-annex over a separate SVN repository. It means one fewer thing to pull from and keep synchronized; plus the git method of retaining history is clearer.
[Ross] That is what I am thinking. From the documentation that I have seen for the newer git-annex “v6 repository mode”, it seems fairly usable (almost as good as Git-LFS, if it actually works). The big advantage of git-annex that I can see is that it has a fully free and open-source server implementation that you can use right now. There are no open-source server implementations for Git-LFS that I can find. All of the major commercial players have their own server implementations (e.g. GitHub, BitBucket, GitLab, MS VSO) but there is nothing that you can use with your own git repos. The only Git-LFS option for behind the firewall is proprietary commercial GitLab EE (and GitHub Enterprise but that is very expensive). The fact that Git-LFS only works with proprietary implementations bothers me a lot. Given all of this, I think I would sit tight and wait and see where Git-LFS goes. My guess is that in 2 more years there will be a quality open-source server implementation (perhaps GitLab?) and the git community will have worked out all of the bugs. That will be a wonderful time! But until then, we have to get our work done.
Cheers,
-Ross
FYI: Looks like Sandia has GitLab sites (SON and SRN) set up that are supposed to support Git-LFS (but not git-annex). I have created Trilinos JIRA Issue TRIL-62 to document this and investigate if Git-LFS works.
So after a lot of back and forth with the maintainer of the gitlab.sandia.gov site and a lot of research and experimentation, it seems that Git-LFS works with that site but you have to use HTTPS authentication (SSH authentication does not work with GitLab with Git-LFS). See Trilinos JIRA Issue TRIL-62.
But I think I am far enough along to say that Git-LFS with the SRN site gitlab.sandia.gov may be a viable solution right now to replace SVN to manage large binary files. I think the next step would be to discuss this some and then, if it makes sense, to actually try creating the Drekar git repo copy of the SVN repo using Git-LFS on gitlab.sandia.gov and see how it performs with cloning, changing files, pushing etc. Git-LFS is very easy to use once you have the Git-LFS client installed (which is also very easy to do and we can provide that on all platforms pretty easily I think).
Given what I have learned about Git-LFS, I think that this might be a good option for some of the larger test files with Drekar. It might even be a good option for trimming down some of the larger test files in Trilinos. It seems that if you clone a git repo that manages some files with Git-LFS and the local user is not set up to use Git-LFS, then nothing all that terrible happens (see this comment).
@rppawlo,
I created a Git-LFS version of the DrekarSystemTests repo on the SNL GitLab servers gitlab-ex.sandia.gov and gitlab.sandia.gov and it went smoothly. It was an easy process and I got it working with Git-LFS on a new machine (shiller) in just a few minutes. The details are below.
If we can get the Git-LFS client installed onto the SEMS NFS mount and the ATTB machines, then it should be trivial for Drekar developers to use a Git-LFS version of the DrekarSystemTests repo. Drekar developers need to run two commands one time on each new machine:
$ git lfs install
$ git config --global credential.helper cache
and they are set and never have to think about Git-LFS again. For machines that don't have the Git-LFS client centrally installed, installing it locally is just a few simple commands (see below) and we could provide a single script that would do this in one shot (i.e. download the Git-LFS client and install it).
The only issue I can see with this is that you have to use HTTPS to authenticate with the current GitLab servers. Therefore, you have to cache your username and password. This might be an issue for automated tests run by Jenkins. But you can use a file to store these and then you never need to type them again (see this TRIL-62 comment) which is allowed for SNL entity accounts.
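Note that the cache helper in the commands above keeps credentials in memory only, for 15 minutes by default (cache --timeout=<seconds> changes that), so unattended jobs typically need the file-based store helper instead. Below is a minimal sketch of a hypothetical one-time setup function; it assumes a plaintext credentials file is acceptable for the account (as the TRIL-62 comment indicates for SNL entity accounts) and that the file path contains no spaces.

```bash
# Hypothetical one-time setup: have git keep HTTPS credentials in a file.
# Assumes the path has no spaces (the helper string is parsed by a shell).
setup_https_credential_file() {
  local cred_file="${1:-$HOME/.git-credentials}"
  # The store helper writes the password in plain text, so restrict the
  # file to its owner before anything is written to it.
  touch "$cred_file" && chmod 600 "$cred_file" || return 1
  git config --global credential.helper "store --file=$cred_file"
}
```

After running it once, the first successful HTTPS clone or push prompts for the username and password, and git records them in that file for later runs.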
Detailed Notes:
For the heck of it, I converted the DrekarSystemTests SVN repo to a Git-LFS repo.
First, I copied the Git-LFS client to shiller:
$ scp git-lfs-linux-386-1.3.0.tar.gz shiller:~/.
and then installed it with:
$ ssh shiller
$ tar -xzvf git-lfs-linux-386-1.3.0.tar.gz
$ cd git-lfs-1.3.0/
$ env PREFIX=$HOME ./install.sh
$ git lfs install
(This installed git-lfs into $HOME/bin, and I have $HOME/bin set in my PATH env var so that git can find it.)
I set up for git to cache my username and password for HTTPS:
$ git config --global credential.helper cache
I then created the empty private GitLab projects under my account:
I then cloned the empty repo and set up for Git-LFS with:
$ git clone https://gitlab.sandia.gov/28084/DrekarSystemTests.git
$ cd DrekarSystemTests/
$ git lfs track "*.exo"
$ git lfs track "*.gen"
$ git lfs track "*.png"
(The *.xml and other files should be stored by raw git since they are human-written text files.)
I then copied the contents of the DrekarSystemTests.svn repo trunk/ directory into the new git repo, removed all of the .svn/ dirs and then did:
$ git add .
$ git commit
$ git remote rename origin gitlab # I don't like the name origin!
$ git remote add gitlab-ex https://gitlab-ex.sandia.gov/rabartl/DrekarSystemTests.git
$ git push gitlab master
$ git push -u gitlab-ex master
I then moved the existing DrekarSystemTests local repo out of the way and did a fresh clone with:
$ time git clone https://gitlab.sandia.gov/28084/DrekarSystemTests.git
Cloning into 'DrekarSystemTests'...
remote: Counting objects: 284, done.
remote: Compressing objects: 100% (154/154), done.
remote: Total 284 (delta 123), reused 284 (delta 123)
Receiving objects: 100% (284/284), 171.04 KiB | 0 bytes/s, done.
Resolving deltas: 100% (123/123), done.
Checking connectivity... done.
Downloading ATDM/Verification/linear_plasma_waves/dispersion_plots_data/cold_efluid_Bnorm.png (39.46 KB)
...
Downloading vector-restart/vector-restart.gold.exo (285.77 KB)
Checking out files: 100% (245/245), done.
real 0m16.698s
user 0m5.924s
sys 0m2.455s
Here is how the sizes of the clones compare:
$ du -sh DrekarSystemTests DrekarSystemTests.svn
77M DrekarSystemTests
54M DrekarSystemTests.svn
The Git-LFS repo is larger than the SVN repo because it has to store two copies of every binary file (see this TRIL-62 comment). But you can see that Git-LFS is managing most of the data from looking at:
$ du -sh DrekarSystemTests/.git/* | sort -rh
25M .git/lfs
184K .git/objects
48K .git/hooks
32K .git/index
16K .git/logs
12K .git/refs
4.0K .git/packed-refs
4.0K .git/ORIG_HEAD
4.0K .git/info
4.0K .git/HEAD
4.0K .git/FETCH_HEAD
4.0K .git/description
4.0K .git/config
0 .git/branches
See, the .git/lfs directory is storing 25M worth of large files and git itself is only storing 184K of (compressed) regular files.
Now you can use gitdist including DrekarSystemTests with no modifications (just add DrekarSystemTests to the .gitdist file):
$ cd Trilinos/
$ cp .gitdist.default .gitdist
$ echo DrekarSystemTests >> .gitdist
$ gitdist-status
----------------------------------------------------------------------
| ID | Repo Dir | Branch | Tracking Branch | C | M | ? |
|----|----------------------|---------|------------------|---|---|---|
| 0 | Trilinos (Base) | develop | github/develop | | | |
| 1 | packages/moocho | master | github/master | | | |
| 2 | packages/Sundance | master | origin/master | | | |
| 3 | packages/CTrilinos | master | origin/master | | | |
| 4 | packages/ForTrilinos | master | origin/master | | | |
| 5 | packages/mesquite | master | origin/master | | | |
| 6 | TriBITS | master | github/master | | | |
| 7 | TriBITS/TriBITSDoc | master | bb/master | | | |
| 8 | preCopyrightTrilinos | master | github/master | | | |
| 9 | DrekarBase | master | ssg/master | | | |
| 10 | DrekarResearch | master | ssg/master | | | |
| 11 | DrekarSystemTests | master | gitlab-ex/master | | | |
----------------------------------------------------------------------
(tip: to see a legend, pass in --dist-legend.)
Now I can pull and push to all of the Drekar repos at once cleanly!
We could streamline the cloning of these repos using the TriBITS clone-extra-repos.py script for the Drekar repos. I can show how to do that if interested.
On internal machines, accessing HTTPS requires a proxy. Thus on some of the clusters, I've used ssh to get to github. Is there a way to set the HTTPS proxy in git?
Eric,
I did not need to set any proxy from the machines shiller or muir in order to access gitlab.sandia.gov or gitlab-ex.sandia.gov. What machines need a proxy? Are these on the SRN, SON or outside of the SNL network?
In any case, hopefully there should be simple instructions to set up the HTTPS proxy on any given machine.
Longer term, the hope is that GitLab will get Git-LFS to use SSH authentication. See:
But note that BitBucket seems to have this figured out:
Is there a way to set the HTTPS proxy?
Don't know. Can you point me to a specific machine that has a problem and perhaps I can give it a try?
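For reference, git itself can be pointed at a proxy with the http.proxy setting, and its URL-specific form, http.<url>.proxy, limits it to matching URLs, so external sites can go through the proxy while internal servers stay direct. The helper below is a hypothetical sketch, and the proxy address in the example is a placeholder; whether the Git-LFS client honors the URL-specific form would still need to be checked.

```bash
# Hypothetical helper: route git's HTTP(S) traffic for one URL prefix through
# a proxy, leaving other hosts (e.g. internal GitLab servers) unproxied.
set_git_proxy_for() {
  local url_prefix=$1 proxy=$2
  git config --global "http.$url_prefix.proxy" "$proxy"
}
```

For example, set_git_proxy_for https://github.com/ http://proxy.example.com:8080. libcurl, which git uses for HTTP(S), also honors the https_proxy environment variable as a per-shell alternative.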
Just a tip, but you can speed up the initial clone by using git lfs clone <url> instead of git clone <url>.
The time with raw git clone <url> is:
time git clone https://gitlab-ex.sandia.gov/rabartl/DrekarSystemTests DrekarSystemTests.again
Cloning into 'DrekarSystemTests.again'...
remote: Counting objects: 426, done.
remote: Compressing objects: 100% (223/223), done.
remote: Total 426 (delta 206), reused 414 (delta 194)
Receiving objects: 100% (426/426), 223.14 KiB | 0 bytes/s, done.
Resolving deltas: 100% (206/206), done.
Checking connectivity... done.
Downloading ATDM/Verification/linear_plasma_waves/dispersion_plots_data/cold_efluid_Bnorm.png (39.46 KB)
...
Downloading vector-restart/vector-restart.gold.exo (285.77 KB)
Checking out files: 100% (362/362), done.
real 0m25.300s
user 0m5.339s
sys 0m1.291s
The time with git lfs clone <url> is:
time git lfs clone https://gitlab-ex.sandia.gov/rabartl/DrekarSystemTests DrekarSystemTests.again
Cloning into 'DrekarSystemTests.again'...
remote: Counting objects: 426, done.
remote: Compressing objects: 100% (223/223), done.
remote: Total 426 (delta 206), reused 414 (delta 194)
Receiving objects: 100% (426/426), 223.14 KiB | 0 bytes/s, done.
Resolving deltas: 100% (206/206), done.
Checking connectivity... done.
Git LFS: (45 of 45 files) 24.27 MB / 24.60 MB
real 0m5.141s
user 0m4.379s
sys 0m0.918s
The git lfs clone <url> command gets all of the LFS objects at once, while the raw git clone <url> command gets the LFS files one at a time.
The same goes for git lfs pull vs. git pull. It is just a performance thing.
I think that TriBITS could be made to learn about Git-LFS repos (just list them as such in the ExtraRepositoriesList.cmake file, for example with 'GIT-LFS' instead of just 'GIT'), and special gitdist commands dist-pull and dist-push could run git lfs pull and git lfs push, respectively. If a repo is not using Git-LFS, then they would just do the basic pull and push, respectively. Having gitdist and TriBITS automatically figure out that the Git-LFS client is installed and then use it could be a nice value-add.
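Per repo, such an LFS-aware dist-pull might look like the sketch below (hypothetical function names; gitdist has no such logic). One detail matters: git lfs pull only fetches and checks out LFS objects for the current ref and does not fetch and merge commits, so the sketch first runs a normal git pull with GIT_LFS_SKIP_SMUDGE=1 (so LFS files are not downloaded one at a time) and then a single batched git lfs pull. It assumes a repo uses Git-LFS when its top-level .gitattributes mentions filter=lfs, and that the client is installed when git lfs version succeeds.

```bash
# Hypothetical per-repo logic for an LFS-aware 'gitdist dist-pull'.

# Assumption: LFS tracking rules live in the top-level .gitattributes.
repo_uses_lfs() {
  grep -qs 'filter=lfs' .gitattributes
}

have_lfs_client() {
  git lfs version >/dev/null 2>&1
}

lfs_aware_pull() {
  if repo_uses_lfs && have_lfs_client; then
    # Merge new commits without smudging (downloading) LFS files one by one;
    # the subshell keeps the variable from leaking into later commands.
    (GIT_LFS_SKIP_SMUDGE=1; export GIT_LFS_SKIP_SMUDGE; git pull "$@") || return 1
    # Then fetch all needed LFS objects in one batch and check them out.
    git lfs pull
  else
    git pull "$@"
  fi
}
```

On the push side, a plain git push already uploads LFS objects, assuming the Git-LFS pre-push hook is installed in the repo (running git lfs install inside a repo installs it), so a dist-push may not need anything extra.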
BTW, I went ahead and updated the DrekarSystemTests Git-LFS repo to the current version of the SVN repo, just to show that it is easy to do, in the commit:
commit 76799afc8ef2156e1892e0150309e63daaa48a43
Author: Roscoe A. Bartlett <rabartl@sandia.gov>
Date: Wed Aug 10 19:22:57 2016 -0600
Update snapshot from SVN r128
r128 | seamill | 2016-08-10 19:00:16 -0600 (Wed, 10 Aug 2016) | 1 line
Consolidating some tests.
4.4% ATDM/Verification/linear_plasma_waves/WaveWarmEFluid/TEM/
10.8% ATDM/Verification/linear_plasma_waves/util_scripts/warm_electron_modules/scripts/
15.3% ATDM/Verification/linear_plasma_waves/util_scripts/warm_electron_modules/
16.8% ATDM/Verification/linear_plasma_waves_convergence/WaveColdEFluid/Bnorm/
22.2% ATDM/Verification/linear_plasma_waves_convergence/WaveColdEFluid/Bpara/
10.9% ATDM/Verification/linear_plasma_waves_convergence/WaveColdEFluid/Bzero/
5.8% ATDM/Verification/linear_plasma_waves_convergence/convergence_plots_data/
4.9% ATDM/Verification/linear_plasma_waves_convergence/util_scripts_convergence/
3.0% ATDM/Verification/
4.9% ATDM/braginskii_zeroB/
It takes just a few commands to do this:
$ cd Trilinos/
$ cd DrekarSystemTests.svn/   # my rename of the SVN repo
$ cp -r . ../../DrekarSystemTests/
$ cd ../../DrekarSystemTests/   # strip .svn dirs from the git copy, not the SVN working copy
$ find . -type d -name .svn -prune -exec rm -rf {} +
$ git add .
$ git commit   # just copied in info from the top SVN commit
Ross - it just occurred to me that there might be another solution: git supports working directly with SVN repos.
https://git-scm.com/book/en/v1/Git-and-Other-Systems-Git-and-Subversion
I recall that the MOOSE team did this for quite a long time before moving to GitHub. It's not the most appealing option, since there are a few dangerous things you have to manage. Just thought I would mention it. I discussed the git-lfs path with the Drekar team and they were hesitant about adoption. They would like to make sure this is the best long-term solution. I think that one issue is that the COE doesn't have a package for git-lfs and, as far as I can tell, Ubuntu doesn't either. So every computer we run on requires this install. SEMS support would go a long way.
I had not thought about git-svn. The DrekarSystemTests repo is really pretty small, so I suspect that even if git-svn is extremely slow (which people seem to be claiming), using git-svn with DrekarSystemTests may not be too bad.
But looking at:
you have to use separate commands for pulling from and pushing to the SVN repo with git-svn. This would require training for Drekar developers, and gitdist would need to be extended to work with git-svn repos.
To pull with git-svn, you have to use git svn rebase instead of the git pull used with git-lfs and raw git. That means that the gitdist script would need a new command called something like dist-pull that would run git svn rebase instead of git pull on a repo that was detected to be a git-svn repo.
To push with git-svn, you have to type git svn dcommit instead of just the git push used with git-lfs and raw git. So adopting git-svn would mean that we would have to add a special command to gitdist called dist-push that would have to be trained to run git svn dcommit when called on a git-svn repo. However, using git-lfs to push does not require any changes to gitdist at all.
That is a big advantage of git-lfs over git-svn.
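A rough sketch of what such git-svn-aware dispatch could look like, again with hypothetical function names that are not part of gitdist. It relies on the fact that git svn clone records the SVN URL as svn-remote.svn.url in the repo's own config, and uses that to substitute git svn rebase and git svn dcommit for git pull and git push.

```shell
#!/usr/bin/env bash
# Sketch of git-svn-aware pull/push dispatch. Function names are
# hypothetical; none of this is part of gitdist.

# Succeeds if the repo was made by "git svn clone", which records the SVN
# URL as svn-remote.svn.url in the repo's own config.
is_git_svn_repo() {
  (cd "$1" && git config --local --get svn-remote.svn.url >/dev/null 2>&1)
}

# Print the command that pulls or pushes for a given repo.
vc_pull_cmd() {
  if is_git_svn_repo "$1"; then echo "git svn rebase"; else echo "git pull"; fi
}
vc_push_cmd() {
  if is_git_svn_repo "$1"; then echo "git svn dcommit"; else echo "git push"; fi
}

# Usage: dist_vc pull|push REPO_DIR...
dist_vc() {
  local action="$1" dir cmd
  shift
  case "$action" in
    pull|push) ;;
    *) echo "usage: dist_vc pull|push REPO_DIR..." >&2; return 1 ;;
  esac
  for dir in "$@"; do
    cmd="$("vc_${action}_cmd" "$dir")"
    echo "*** $dir: $cmd"
    # $cmd is left unquoted on purpose so it splits into words.
    (cd "$dir" && $cmd) || return
  done
}
```

Note that the push side is exactly the extra logic that git-lfs does not need, which is the asymmetry described above.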
I discussed the git-lfs path with the Drekar team and they were hesitant about adoption.
I definitely understand the apprehension. Having to use HTTPS authentication instead of SSH is the biggest concern I personally have. But compared to git-svn, just from what I have seen so far, I would personally use git-lfs over git-svn if I had a choice, and not just because git-lfs does not require any changes to gitdist. Git-LFS just fits in better with a git-based workflow than git-svn does.
They would like to make sure this is the best long term solution.
From everything that I have read and experienced, all things considered, Git-LFS looks like the best long-term solution for managing large binary files with a git-based workflow. It is the most transparent for developers to use, and it has what appears to be very broad industry support (e.g. GitHub, Bitbucket, and most importantly GitLab), so there will be more and better implementations of Git-LFS-capable servers as time goes on. I am about 100% sure about that given what I have seen.
I think that one issue is that the COE doesn't have a package for git-lfs and, as far as I can tell, Ubuntu doesn't either. So every computer we run on requires this install. SEMS support would go a long way.
Yes, that is an issue, but I think we can make it a simple two-step process to install the Git-LFS client in each user's home directory on any SON or SRN machine using:
I would put together the install-git-lfs.sh script and then ask Drekar developers just to try it and see if it works for them. That should be a low investment on their part.
As for support on our systems, things don't look too good for git-svn. I just tried to use git-svn to clone the DrekarSystemTests repo on the SR machine muir using the SEMS-installed git:
$ which git
/projects/sems/install/rhel6-x86_64/sems/utility/git/2.1.3/bin/git
and it seems that the SEMS team did not install git 2.1.3 with support for git-svn:
$ git svn clone svn+ssh://software.sandia.gov/svn/DrekarSystemTests
Can't locate SVN/Core.pm in @INC (@INC contains: /projects/sems/install/rhel6-x86_64/sems/utility/git/2.1.3/share/perl5 /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /projects/sems/install/rhel6-x86_64/sems/utility/git/2.1.3/share/perl5/Git/SVN/Utils.pm line 6.
BEGIN failed--compilation aborted at /projects/sems/install/rhel6-x86_64/sems/utility/git/2.1.3/share/perl5/Git/SVN/Utils.pm line 6.
Compilation failed in require at /projects/sems/install/rhel6-x86_64/sems/utility/git/2.1.3/share/perl5/Git/SVN.pm line 33.
BEGIN failed--compilation aborted at /projects/sems/install/rhel6-x86_64/sems/utility/git/2.1.3/share/perl5/Git/SVN.pm line 33.
Compilation failed in require at /projects/sems/install/rhel6-x86_64/sems/utility/git/2.1.3/libexec/git-core/git-svn line 25.
BEGIN failed--compilation aborted at /projects/sems/install/rhel6-x86_64/sems/utility/git/2.1.3/libexec/git-core/git-svn line 25.
Even the default COE version of git:
$ /usr/bin/git --version
git version 1.7.1
does not seem to support git-svn:
$ /usr/bin/git svn clone svn+ssh://software.sandia.gov/svn/DrekarSystemTests
git: 'svn' is not a git command. See 'git --help'.
Did you mean one of these?
fsck
show
Seeing this, it might be easier for the SEMS team and the ATTB machines to install the binary git-lfs than to fix a git install that has a broken git-svn.
However, things look better for git-svn on the ATTB machines. I tried git-svn on the machine shiller and it worked:
$ time git svn clone svn+ssh://software.sandia.gov/svn/DrekarSystemTests
...
real 0m47.089s
user 0m7.434s
sys 0m12.692s
So that is a little longer than the 16s needed to clone the git-lfs repo with:
$ time git clone https://gitlab.sandia.gov/28084/DrekarSystemTests.git
...
real 0m16.698s
user 0m5.924s
sys 0m2.455s
but not too bad.
What is interesting is that the git-svn repo is not on a tracking branch but it does show modified and untracked files:
[rabartl@shiller01 DrekarSystemTests (master)]$ git status
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: trunk/config/platform_plugin.py
Untracked files:
(use "git add <file>..." to include in what will be committed)
junk.out
no changes added to commit (use "git add" and/or "git commit -a")
The problem is that I made a local commit that was not pushed to the SVN repo:
fefa5a5 "Added a comment line (don't push)"
Author: Roscoe A. Bartlett <rabartl@sandia.gov>
Date: Wed Aug 17 09:32:54 2016 -0600 (5 minutes ago)
M trunk/config/idplatform.py
and I am not sure how to see that this commit has not been pushed yet.
Therefore, gitdist-status will not show this unpushed commit:
[rabartl@shiller01 Trilinos (develop)]$ gitdist-status
------------------------------------------------------------------
| ID | Repo Dir | Branch | Tracking Branch | C | M | ? |
|----|-------------------|---------|-----------------|---|---|---|
| 0 | Trilinos (Base) | develop | github/develop | | | 1 |
| 1 | DrekarBase | master | ssg/master | | | |
| 2 | DrekarResearch | master | ssg/master | | | |
| 3 | DrekarSystemTests | master | | | 1 | 1 |
------------------------------------------------------------------
That is a problem with the git-svn workflow. We would need to train the gitdist dist-repo-status command to figure out which local commits have not yet been pushed to the SVN repo.
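On the question of how to see the unpushed commit: git-svn does not set an upstream for the local branch (hence the empty Tracking Branch column), but it does record the remote-tracking ref for the SVN side in the svn-remote.svn.fetch config entry. For a plain git svn clone that ref is normally refs/remotes/git-svn, so git log --oneline git-svn..HEAD should list commits like fefa5a5. A minimal sketch, with a hypothetical helper name, of the count a status command could show in its C column:

```shell
#!/usr/bin/env bash
# Print how many commits on HEAD have not yet been sent to SVN with
# "git svn dcommit". Hypothetical helper, not part of gitdist.
git_svn_unpushed_count() {
  local repo_dir="$1" fetch ref
  # svn-remote.svn.fetch looks like ":refs/remotes/git-svn" for a plain
  # clone, or e.g. "trunk:refs/remotes/origin/trunk" with --stdlayout.
  fetch="$(cd "$repo_dir" && git config --get svn-remote.svn.fetch)" || return 1
  ref="${fetch#*:}"   # keep the part after the colon: the local ref
  (cd "$repo_dir" && git rev-list --count "$ref..HEAD")
}
```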
My initial opinion is that git-svn is more confusing than git-lfs.
In summary, the pros and cons for git-lfs vs. git-svn that I can see so far are:
git-lfs:
git lfs install one time for git-lfs to work properly on that machine.
git-svn:
dist-pull and dist-push, and would have to train dist-repo-status to detect new commits.
Rather than all of this, Drekar might just consider a simple drekar-repos script with the commands pull, status, and push. That would be easy to write and would likely be enough for Drekar developers until you adopt something better (like Git-LFS once SSH authentication is supported with GitLab, which GitLab is currently working on; 9300 will upgrade GitLab within a month once GitLab CE supports SSH authentication).
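A minimal sketch of such a drekar-repos script in bash. The list of SVN working copies (SVN_REPOS) and the GITDIST/SVN command overrides are assumptions for illustration. It calls gitdist dist-repo-status directly because gitdist-status is usually defined as a shell alias, which a script does not see.

```shell
#!/usr/bin/env bash
# drekar-repos: minimal sketch of a wrapper that keeps the git repos
# (via gitdist) and the SVN repos in step. SVN_REPOS and the GITDIST/SVN
# overrides are assumptions for illustration.

GITDIST="${GITDIST:-gitdist}"
SVN="${SVN:-svn}"
# Hypothetical list of SVN working copies, relative to the base repo dir.
SVN_REPOS=(DrekarSystemTests)

# Run one svn subcommand on every SVN working copy, stopping on failure.
svn_each() {
  local sub="$1" repo
  for repo in "${SVN_REPOS[@]}"; do
    echo "*** SVN repo: $repo"
    $SVN "$sub" "$repo" || return
  done
}

drekar_repos() {
  case "$1" in
    pull)   $GITDIST pull && svn_each update ;;
    status) $GITDIST dist-repo-status && svn_each status ;;
    # svn commit opens an editor for the log message of each SVN repo.
    push)   $GITDIST push && svn_each commit ;;
    *)      echo "usage: drekar-repos pull|status|push" >&2; return 1 ;;
  esac
}

# Only dispatch when given a subcommand, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
  drekar_repos "$@"
fi
```

For example, drekar-repos pull runs gitdist pull and then svn update on each listed SVN repo. New files in an SVN repo would still need svn add before drekar-repos push picks them up.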
We should discuss all of this stuff.
Looks like GitLab CE is on the verge of allowing Git-LFS to use pure SSH authentication with the GitLab server (with the objects still transferred over HTTPS). See this update. We don't care how the objects are copied (SSH and HTTPS are both fine, since both are encrypted); we only care about how the authentication is done.
Hopefully we will see this on the SNL GitLab server in a few months. We need to wait for this to get merged to the GitLab 'master' branch and put into an official release (releases go out once a month), and then wait for the people in SNL org 9300 to update the SNL GitLab servers.
FYI:
It looks like the next release of GitLab CE will have support for pure SSH authentication for Git-LFS. See:
I think this means that this will show up on the SNL GitLab servers in a few months.
Good news, the latest release of GitLab has the full SSH authentication for Git-LFS:
Now we just need to get them to install it on the SNL GitLab servers.
Looks like they already installed GitLab 8.12 on the SNL GitLab servers but it does not seem to work for SSH Git-LFS authentication yet.
From: Bartlett, Roscoe A Sent: Tuesday, October 18, 2016 10:28 PM To: Hickey, Richard A Subject: RE: GitLab 8.12 with pure SSH for Git-LFS
It looks like it does not work:
$ git lfs clone git@gitlab.sandia.gov:28084/DrekarSystemTests.git
Cloning into 'DrekarSystemTests'...
…
remote: Counting objects: 426, done.
remote: Compressing objects: 100% (223/223), done.
remote: Total 426 (delta 206), reused 414 (delta 194)
Receiving objects: 100% (426/426), 223.14 KiB | 0 bytes/s, done.
Resolving deltas: 100% (206/206), done.
Checking connectivity... done.
Git LFS: (0 of 45 files) 0 B / 24.60 MB
Post /idp/Authn/AuthMenu/menu;jsessionid=F21790870C138646A04F987CFB4FB6E3?conversation=e1s1: stopped after 3 redirects
Post /idp/Authn/AuthMenu/menu;jsessionid=F21790870C138646A04F987CFB4FB6E3?conversation=e1s1: stopped after 3 redirects
This cloned the repo but failed to replace the LFS-managed files with the full versions, as shown by, for example:
[rabartl@crf450 DrekarSystemTests (master)]$ cat ./normal_tangent_bc/tangent_bc.gold.exo
version https://git-lfs.github.com/spec/v1
oid sha256:aa67419494db421f6ef43131c403b1c164d3001d5e4137e971c8448e81c7ad9e
size 237404
Anyway, not urgent but it would be great if we could get this to work.
Thanks,
-Ross
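When a clone ends up in this state, one quick way to list the files that are still pointers is to look for the spec line shown above as the first line of each small file. A minimal bash sketch (the function name is hypothetical); once authentication works, running git lfs pull in the repo should replace the pointers with the real content.

```shell
#!/usr/bin/env bash
# List files under a directory that are still Git-LFS pointer files, i.e.
# whose first line is the LFS spec line. Hypothetical helper name.
list_lfs_pointers() {
  local dir="$1" f
  # Pointer files are tiny text files, so skip anything 1 KiB or larger,
  # and skip the .git directory entirely.
  find "$dir" -path '*/.git' -prune -o -type f -size -1024c -print |
    while IFS= read -r f; do
      if [ "$(head -n 1 "$f" 2>/dev/null)" = "version https://git-lfs.github.com/spec/v1" ]; then
        echo "$f"
      fi
    done
}
```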
Looks like Git-LFS with SSH should work on the SNL GitLab servers. I need to try that out. See:
FYI: Git-LFS is working on the ORNL GitLab servers code-int.ornl.gov and code.ornl.gov.
But since this story has gotten way off scope and since Drekar is going to just use a raw git repo (or not change anything), there is no sense keeping this Issue open any longer.
If there is a desire to teach gitdist about Git-LFS, then we can open a new issue. But for now, it makes sense to close this story as wontfix, I think.
For large binary files, we use svn to store system test data. When adding new capability, this results in adding system tests along with changes to the source code. We use gitdist to pull and push multi-repo changes. Multiple users have been burned because gitdist doesn't push the svn repo changes. It would be nice for gitdist to support some simple commands for updating/pulling and pushing to repos.