JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

libgit2 planning issue #7584

Closed StefanKarpinski closed 9 years ago

StefanKarpinski commented 10 years ago

We already have an issue for the feature of using libgit2: https://github.com/JuliaLang/julia/issues/4158. @jakebolewski created the LibGit2 package to wrap the libgit2 C library, and @andrioni has been working on libgit2 in base (https://github.com/JuliaLang/julia/pull/7339) based, I believe, on @lifeissweetgood's work on same (https://github.com/JuliaLang/julia/pull/4866). This issue is to discuss specifically what remains to be done. Let's plan!

tkelman commented 10 years ago

Soo, cmake. Easy enough on Linux and Mac. On Windows, there are several choices. I was able to successfully build the package using the standalone Windows cmake installer, but I do not want that on my path: the fewer things on my path, the better. So as I mentioned in one of those issues, I really want the location of the cmake executable to be a Make.user configurable variable (defaulting to assuming it's on the path, if unspecified).

There is a cmake available via pacman in MSYS2, but it installs a whole bunch of extra dependencies (ssl and various other things) that libgit2 then sees as available and links to - we should try to avoid pulling those in unless we really need them. And the installation location of that MSYS2 pacman cmake ends up under /mingw64, which interferes with the proper /etc/fstab method of mounting the install point for the MinGW compilers.

StefanKarpinski commented 10 years ago

Would it be possible to just bundle / download prebuilt libgit2 libraries on each platform? Or maybe do the same for a minimalist cmake?

tkelman commented 10 years ago

Sure, in fact for Win32 and Win64 I already did. I guess the question is are we okay with not building everything from source and treating dependencies that move into base more like the way we treat packages right now? This would make life a little easier on Windows, except for the sap who somehow found himself building all the dependencies for everybody :wink:

StefanKarpinski commented 10 years ago

I think it's fine. It would be nice to also have a build option to download and build libgit2 from scratch that requires having a working cmake installed, but using a pre-built libgit2 binary by default is reasonable. The main reason we build from source for everything else is that we need to configure the hell out of most dependencies. I suspect we can just use a normal libgit2 library on each system.

quinnj commented 10 years ago

+100. I think as much as possible, we should do pre-built binaries on windows for dependencies. I think it would simplify getting up and running a ton.

tkelman commented 10 years ago

We'll probably have to configure at least a little bit out of libgit in the end.

When it comes to custom-configuring the rest of the dependencies and needing to upload new binaries whenever we patch them or tweak their configuration or upgrade to a newer version, it should be totally doable to automate that and mitigate the reproducibility/bus factor a bit. We could either slightly abuse AppVeyor for any dependencies that take less than half an hour to build (everything but LLVM, OpenBLAS, and probably FFTW), or get @staticfloat to put Windows on a couple of the machines in his up-and-coming Julia build farm. Cross-compiling like we do now for the binary installers may also be good enough.

Downloading binaries is a no-no in Linux distribution packaging land for understandable reasons, but Windows considerations are so far from there that if you're okay with diverging in the build process, then so am I.

staticfloat commented 10 years ago

Dependency-building sap, reporting for duty. Priorities for me right now are: OSX Julia building, OSX bottle building, Linux Julia building.

I guess I'll just push!(priorities, "Windows Libgit Binaries"). ;)

tkelman commented 10 years ago

@staticfloat I already did those, but what I'm missing is more automated, repeatable, and official-seeming infrastructure than my laptop and a binary dump on Sourceforge. I know that kind of infrastructure was in your plans for Mac; if there's space for a couple of either Windows or cross-compile machines in there, then I can absolutely help get it all going.

jakebolewski commented 10 years ago

From the LibGit package end, I don't think 32-bit builds for either Linux or Windows are working at the moment. I know there are issues with the 32-bit Windows build, since AppVeyor supports it, and I haven't yet tested on a 32-bit Linux install.

We need to figure out what transports we want to support. For https:// we will need to bring in openssl; for ssh:// we will need to bring in libssh2. Having libssh2 in base might be advantageous anyway, as we use ssh for the multiprocessing stuff. I know @jiahao has plans for more involved clustering support, which might need more involved ssh support than simply shelling out to the ssh binary like we do now. Also note that it will be important for the Linux packagers to use the latest release of libgit2. This will be a problem with distros that do not move as quickly as others.
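The transport-to-dependency relationship described here can be written down as a small lookup. This is only an illustrative sketch in Python, not anything from the LibGit2 package: `TRANSPORT_LIBRARY` and `required_library` are hypothetical names, only the two schemes named above are covered, and scp-style addresses such as `git@host:path` are not handled.

```python
from typing import Optional
from urllib.parse import urlparse

# Extra library each transport would pull in, as stated in the comment above.
# Schemes not listed here are assumed (for this sketch) to need no extra library.
TRANSPORT_LIBRARY = {
    "https": "openssl",
    "ssh": "libssh2",
}


def required_library(url: str) -> Optional[str]:
    """Return the extra library needed for the URL's transport, or None."""
    scheme = urlparse(url).scheme
    return TRANSPORT_LIBRARY.get(scheme)


if __name__ == "__main__":
    for remote in (
        "https://github.com/JuliaLang/julia.git",
        "ssh://git@github.com/JuliaLang/julia.git",
        "git://github.com/JuliaLang/julia.git",
    ):
        print(remote, "->", required_library(remote))
```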

The last major piece of functionality missing in LibGit is merge support. This is new in the recent version of the libgit2 library. The only bindings that support merge, to my knowledge, are the Libgit2Sharp bindings, so we should port over their tests and get this working to the extent needed to support Pkg. However, the original issues with the LibGit2 library are largely solved, so I think we should be (relatively) good on that front.

I would caution that the performance of libgit as opposed to git is actually worse for many operations. The win we will have with libgit is not having to touch the filesystem as often, by holding things in memory and by combining operations. A naive line-for-line port is probably not going to be much faster than what we are doing now (on Linux at least, but I'll eat my words if this is not actually so). This will require a bit of thought and time to work through, so it's best that we push this forward soon, as I expect we will have a long tail of bugs to work through moving this into base.

StefanKarpinski commented 10 years ago

> I would caution that the performance of libgit as opposed to git is actually worse for many operations. The win we will have with libgit is not having to touch the filesystem as often, by holding things in memory and by combining operations. A naive line-for-line port is probably not going to be much faster than what we are doing now (on Linux at least, but I'll eat my words if this is not actually so). This will require a bit of thought and time to work through, so it's best that we push this forward soon, as I expect we will have a long tail of bugs to work through moving this into base.

While I believe that git is faster than libgit2 for most slow operations, I suspect that most of Pkg's performance problems come from completely trivial operations that are fast in both git and libgit2: with git they involve forking the entire Julia process and reading the output of the child process, whereas with libgit2 they would just be C calls. But yes, to get more performance gains, the Pkg code needs to be refactored significantly.
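One rough way to see the overhead in question is to time the same trivial operation once through a freshly spawned child process and once as an ordinary in-process call. The sketch below is in Python rather than Julia and uses a stand-in operation (upper-casing a string) instead of git, so it only illustrates the shape of the comparison. `via_subprocess`, `in_process`, and `time_calls` are hypothetical names, and no particular timings are claimed.

```python
import subprocess
import sys
import time


def via_subprocess(text: str) -> str:
    """Run a trivial operation in a new child process and read its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1].upper())", text],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def in_process(text: str) -> str:
    """Run the same trivial operation as a plain function call."""
    return text.upper()


def time_calls(func, arg: str, repeats: int) -> float:
    """Return the total wall-clock seconds taken by `repeats` calls of func(arg)."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(arg)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 20
    print(f"subprocess: {time_calls(via_subprocess, 'head', n):.4f} s for {n} calls")
    print(f"in-process: {time_calls(in_process, 'head', n):.6f} s for {n} calls")
```

Both functions return the same result; the only difference is whether each call pays for process creation and pipe I/O.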

jakebolewski commented 10 years ago

Certainly true, I believe that there should be a win just by switching over. What I was referring to are the issues that have cropped up about how Pkg is slow on NFS and other distributed filesystems. We have the chance to be a _lot_ faster in those instances if we are a bit smarter about doing Pkg operations and not hitting the file system as hard as we are doing now.

I should also add that libgit2 will undoubtedly improve going forward so the performance gap will close over time.

StefanKarpinski commented 10 years ago

Oh, yeah. NFS is a complete disaster for these kinds of things (or anything, really). Making that better would be great, but not the immediate concern. We just want Pkg to be fast on normal file systems.

jakebolewski commented 10 years ago

Making it fast for NFS will make it insanely fast for everyone else. It's a good test case for potential performance issues, but you are right, this is more of a reach goal.

StefanKarpinski commented 10 years ago

Dream big, Jake. Dream big.

jakebolewski commented 10 years ago

With regard to CMake: it looks like most of the logic in the CMake file is devoted to supporting Microsoft's C++ compiler. Couldn't we just write our own Makefile for Linux & OSX so we don't need to drag in CMake as a dependency on these platforms if we are using prebuilt binaries on Windows?

StefanKarpinski commented 10 years ago

Seems reasonable to me if you think it's plausible.

jakebolewski commented 10 years ago

I got most of the merge tests adapted from libgit2sharp passing. I still have some issues with merging detached heads, but there should be more than enough support in there for Pkg. @andrioni it would be great if you could lay out what additional functionality, if any, you need for your project so we can work on it.

tkelman commented 10 years ago

Bumping this to ask a question. What do we want to do about SSL on Windows? I currently have it disabled in the Windows build of the libgit2 dep, but I think that means we might not be able to clone over https? OpenSSL is available in WinRPM, but we'd need to bootstrap the package manager enough to get that onto everyone's machine (like what we do with expat and zlib right now). Bootstrapping non-securely just to get OpenSSL might work, but might also be a bad idea? Not sure. I don't think we want to get into the potential mess of having to build OpenSSL from source in deps/Makefile.

Since it was easy and Jake's still doing some work out in the package, I also wrote a spec file for libgit2 on the openSUSE Build Service (https://build.opensuse.org/package/show/windows:mingw:win64/mingw64-libgit2), and in that version it's very easy to declare mingw64-libopenssl-devel as a BuildDepends. Again, there's a bootstrapping problem with relying on libgit2 from WinRPM for Pkg to work.

tkelman commented 10 years ago

Maybe we just re-implement a minimal build-time version of WinRPM in bash using xmllint --xpath to parse the repomd, or in Python or Perl (or PowerShell, but that wouldn't work for cross-compile), and use that to grab binaries of libgit2 w/openssl, zlib, expat, and whatever else is needed to bootstrap Pkg and the Julia version of WinRPM on Windows. Might end up as a decent replacement for the win-extras target.
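As a minimal sketch of the first step of such a script, taking the Python option: parse a repomd.xml document and look up where the primary package metadata lives. The sample document is made up for illustration (a real repomd.xml also carries checksums, timestamps, and more entries), `metadata_location` is a hypothetical helper name, and downloading or resolving packages is not shown.

```python
import xml.etree.ElementTree as ET

# XML namespace used by repomd.xml in RPM metadata repositories.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}

# Made-up sample for illustration only; not taken from a real repository.
SAMPLE_REPOMD = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary">
    <location href="repodata/primary.xml.gz"/>
  </data>
  <data type="filelists">
    <location href="repodata/filelists.xml.gz"/>
  </data>
</repomd>
"""


def metadata_location(repomd_xml: str, kind: str = "primary") -> str:
    """Return the href of the <location> inside the <data type=kind> entry."""
    root = ET.fromstring(repomd_xml)
    node = root.find(f"repo:data[@type='{kind}']/repo:location", REPO_NS)
    if node is None:
        raise KeyError(f"no <data type={kind!r}> entry in repomd")
    return node.attrib["href"]


if __name__ == "__main__":
    print(metadata_location(SAMPLE_REPOMD))
```

The returned path is relative to the repository root; a full tool would fetch that file next and search it for the wanted package names.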

tkelman commented 9 years ago

Bump. How do we get started here? @jakebolewski what's the status of the package bindings? If you need help, please ask.

jiahao commented 9 years ago

Last I spoke with @jakebolewski about libgit2, the low level wrapping is essentially done; however, the library does not offer a drop-in replacement for git commands and they have to be manually rebuilt from low-level primitives.

tkelman commented 9 years ago

Closing this in favor of #11196