kmalakoff opened this issue 9 years ago.
Let me work backwards from where I am today. It's been a couple of years since I worked on the original js-git. I recently ported lua-git to Duktape JavaScript; an example can be found at https://github.com/creationix/dukgit/blob/master/test.js#L58-L64
// Excerpt from dukgit's test.js; it runs inside a Duktape coroutine.
var db = mount(".git");
var queue = ["HEAD"];
while (queue.length) {
  var commit = db.loadAs("commit", queue.shift());
  p(commit); // print the commit
  queue.push.apply(queue, commit.parents);
}
But this does require Duktape coroutines, which will not work in any other JavaScript VM. Once async/await lands I can port to that pretty easily.
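For comparison, here is a minimal sketch of what the same history walk could look like with async/await. It is not dukgit or js-git code: loadAs(type, hash) here is a hypothetical promise-returning loader backed by a small in-memory map. It also adds a seen set, so a shared ancestor reachable through both parents of a merge is loaded only once (the loop above would queue it twice).

```javascript
// Hypothetical in-memory object map standing in for a real git database.
// c3 is a merge of c1 and c2, which share the root commit c0.
const objects = {
  c3: { type: "commit", body: { message: "merge", parents: ["c1", "c2"] } },
  c2: { type: "commit", body: { message: "feature", parents: ["c0"] } },
  c1: { type: "commit", body: { message: "fix", parents: ["c0"] } },
  c0: { type: "commit", body: { message: "initial", parents: [] } },
};

// Assumed promise-returning loader with the same (type, hash) arguments as loadAs.
async function loadAs(type, hash) {
  const obj = objects[hash];
  if (!obj || obj.type !== type) throw new Error(`no ${type} object ${hash}`);
  return obj.body;
}

// Breadth-first walk over commit parents, returning hashes in visit order.
async function walkHistory(startHash) {
  const queue = [startHash];
  const seen = new Set(queue); // hashes already queued, so each commit loads once
  const visited = [];
  while (queue.length) {
    const hash = queue.shift();
    const commit = await loadAs("commit", hash);
    visited.push(hash);
    for (const parent of commit.parents) {
      if (!seen.has(parent)) {
        seen.add(parent);
        queue.push(parent);
      }
    }
  }
  return visited;
}

walkHistory("c3").then((hashes) => console.log(hashes));
```

With this data the walk visits c3, c1, c2, and then c0 exactly once.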
The lua-git has a similar API at https://github.com/creationix/lua-git/.
JS-Git was my first API in this regard and is very modular because of the unique constraints I was trying to solve. The README for https://github.com/creationix/git-node-fs is the simplest way to get started with js-git and node. Use that to mount a repo.
var path = require('path');
var repo = {};
// Mix the on-disk git database operations into the repo object.
require('git-node-fs/mixins/fs-db')(repo, path.resolve("some/bare/repo.git"));
Once you have a repo, loading commits and walking them is pretty easy using direct repo.loadAs(...) calls like I do in dukgit: https://github.com/creationix/js-git#basic-object-loading
But there are also helpers for some common tasks like iterating the history or a file tree as a linear list.
https://github.com/creationix/js-git#using-walkers
So a complete example showing how to walk history and files would be:
var path = require('path');
var run = require('gen-run');

// Use gen-run to block on continuables (mini promises) using ES6 yield.
run(function* () {
  // Create a repo object.
  var repo = {};
  // Mix in the base DB operations using the local git database on disk.
  require('git-node-fs/mixins/fs-db')(repo, path.resolve(__dirname, "../.git"));
  // Mix in the walker helpers.
  require('js-git/mixins/walkers')(repo);

  // Look up the hash that master currently points to (refs are full paths).
  var commitHash = yield repo.readRef("refs/heads/master");
  // Create a log stream.
  var logStream = yield repo.logWalk(commitHash);

  // Looping through the stream is easy by repeatedly waiting on `read`.
  var commit, object;
  while ((commit = yield logStream.read())) {
    console.log(commit);
    // We can also loop through all the files of each commit version.
    var treeStream = yield repo.treeWalk(commit.tree);
    while ((object = yield treeStream.read())) {
      console.log(object);
    }
  }
});
Hi Tim,
Thank you for the quick response! I really appreciate you providing a list of resources and the example (plus, I'm a big fan of your work and podcast appearances!).
I think my problem is that the API is very low level, so there is a gap between what I want to do and knowing how to do it. I would really like to find examples of performing common operations like here, but for js-git.
I don't understand the structure of a git repo or how to use git at a low level (I've only used a few CLI commands and GUI tools), so I feel like I have three options for using js-git:
1) first port a higher-level API (like nodegit) to use js-git as the driver; this way I can study how they walk things and perform operations
2) find someone to port the nodegit examples to js-git
3) find a programming guide to using git: I'm not sure whether anyone can recommend a reference, but it would basically be a cookbook for performing common operations (e.g. checking status)
I guess I just need help connecting the dots between the low level API and what I would like to accomplish.
Currently js-git is basically just git core. This is simple and you can learn it very quickly. There are four kinds of objects: commit, tag, tree, and blob. Tag objects are used for annotated tags; you'll find them used by some projects for release tags. The tag ref points to a tag object instead of a commit object, and the tag contains metadata about the release and links to the commit.
https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
Commits point to a tree object and zero or more parent commits (merge commits typically have two parents, but there can be more!). Like tags, they also carry extra metadata (author, committer, and message).
A tree contains a list of pointers to other trees, blobs, or commits (submodules). Each entry will have a name, mode, and hash.
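To make the tree entry structure concrete: inside the object database, a tree object's body is a sequence of entries, each one an ASCII octal mode, a space, the entry name, a NUL byte, and the 20-byte binary SHA-1 of the object it points to. The sketch below decodes such a body. It assumes the object has already been inflated and its "tree <size>\0" header stripped; js-git does this decoding for you, so this only illustrates the format and is not js-git's implementation.

```javascript
// Decode the body of a raw tree object into { mode, name, hash, kind } entries.
function parseTree(body) {
  const entries = [];
  let offset = 0;
  while (offset < body.length) {
    const space = body.indexOf(0x20, offset); // ends the octal mode
    const nul = space === -1 ? -1 : body.indexOf(0x00, space); // ends the name
    if (nul === -1 || nul + 21 > body.length) throw new Error("truncated tree entry");
    const mode = body.toString("ascii", offset, space);
    const name = body.toString("utf8", space + 1, nul);
    const hash = body.toString("hex", nul + 1, nul + 21); // 20 raw bytes -> 40 hex chars
    entries.push({ mode, name, hash, kind: kindOf(mode) });
    offset = nul + 21;
  }
  return entries;
}

// Map a tree-entry mode to the kind of object it points at.
function kindOf(mode) {
  if (mode === "40000") return "tree"; // subdirectory
  if (mode === "160000") return "commit"; // submodule (gitlink)
  return "blob"; // 100644, 100755, or 120000 (symlink)
}

// Build one raw entry by hand so the parser can be exercised.
function encodeEntry(mode, name, hexHash) {
  return Buffer.concat([Buffer.from(`${mode} ${name}\0`, "utf8"), Buffer.from(hexHash, "hex")]);
}

const body = Buffer.concat([
  encodeEntry("100644", "README.md", "ce013625030ba8dba906f756967f9e9ca394464a"),
  encodeEntry("40000", "lib", "4b825dc642cb6eb9a060e54bf8d69288fbee4904"),
]);
console.log(parseTree(body));
```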
A blob is just raw binary data.
So using repo.loadAs you can manually walk the tree looking for things.
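A minimal sketch of that manual walk: resolve a path such as src/index.js by loading the commit, then each tree along the path, then the blob. The store and the promise-returning loadAs below are hypothetical stand-ins (js-git's own async functions take a callback or return a continuable); the name-to-{ mode, hash } tree shape follows the one used in js-git's README examples.

```javascript
// Hypothetical in-memory store; trees map names to { mode, hash } entries.
const store = {
  commit1: { type: "commit", body: { tree: "rootTree", parents: [] } },
  rootTree: {
    type: "tree",
    body: {
      "README.md": { mode: 0o100644, hash: "blobA" },
      src: { mode: 0o40000, hash: "srcTree" },
    },
  },
  srcTree: { type: "tree", body: { "index.js": { mode: 0o100644, hash: "blobB" } } },
  blobA: { type: "blob", body: "# demo\n" },
  blobB: { type: "blob", body: "console.log('hi');\n" },
};

// Assumed promise-returning stand-in for repo.loadAs(type, hash).
async function loadAs(type, hash) {
  const obj = store[hash];
  if (!obj || obj.type !== type) throw new Error(`expected ${type} at ${hash}`);
  return obj.body;
}

// Follow a "/"-separated path from a commit's root tree down to a blob.
// Resolves to undefined when a path segment is missing.
async function readPath(commitHash, filePath) {
  const commit = await loadAs("commit", commitHash);
  let hash = commit.tree;
  for (const name of filePath.split("/")) {
    const tree = await loadAs("tree", hash);
    if (!Object.prototype.hasOwnProperty.call(tree, name)) return undefined;
    hash = tree[name].hash;
  }
  return loadAs("blob", hash); // rejects if the final entry is not a blob
}

readPath("commit1", "src/index.js").then((text) => console.log(text));
```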
JS-Git doesn't have support for the working directory, so you can't do things like git status. Also, I haven't implemented diff yet, so anything involving it, including merges, can't be done. The network actions like push and pull aren't done either. The main thing you can currently do with js-git is read and write the core git database directly.
Excellent. Thank you for explaining and the reference materials. This is much clearer now.
Are you going to be doing another fundraising project to add git status, diff, etc.?
I would like to take another stab at this, but I'm in a different situation now. I don't need money anymore, I have a well paying salaried job. My current limitation is time since any time I spend on js-git will take away from time with my children. That said, there are some nights I can't sleep because of a mild sleep disorder and I will often work on stuff while they sleep.
As far as technical direction, I learned a lot from the failure that was the js-git fundraiser. My goals were too ambitious and the browser tech wasn't ready for what I was trying to make.
Today, we have async/await coming soon and transpilers that let you use it now. Also with service workers coming along we have much better offline support.
As far as I know, browsers still have no way to make direct TCP connections (for git://) to git servers, and no git hosting company has added CORS headers to their smart-https (https://) endpoints. I've designed a great WebSocket-based git protocol (wss://) that can easily be deployed if you host your own git repos, or set up as a proxy to other git servers. I would love it if hosting services like GitHub eventually adopted it.
Initially I focussed on browser-based apps using only native browser I/O and direct connections to services (which is still impossible), but if we relax the requirements a little and allow proxies for browser apps, custom hosted git services, or node.js or chrome app clients, then the network constraints are no longer an issue.
I remember hearing you talk about your fundraising experience and the lack of direct access to services like GitHub on one of your podcast appearances. It is great that the passing of time is helping address some technical issues (like async/await).
I'm working on a project with Electron, so focusing on the Node.js ecosystem would be a good scope for my needs. I need to basically write a git client, so diff, status, remote sync, etc. are on my needs list. I also played around with the browser, but because my project's main initial use case is the user's hard drive, I'm fine deprioritizing the browser for now (but I want to leave the door open for it, which is why js-git is so appealing!).
Loosening the requirements to proxy git requests is a fine compromise although I'm not sure if it puts other limits around git over SSH, for example. Unfortunately, my knowledge is a bit shallow on network and protocol considerations so I cannot add much there.
> I need to basically write a git client so diff, status, remote sync, etc are on my needs list.
@kmalakoff is there a GitHub repo where this is being developed? I'm interested in seeing how this is done without git bindings for Node.js.
@kmalakoff technically speaking, ssh is no harder than git:// or http://, since they all require a TCP connection, which browsers don't have. Practically speaking, though, implementing ssh on top of TCP is a lot more work than implementing just git:// or even http://.
Implementing https:// is simply http:// over TLS. I've implemented HTTP in pure JS and Lua multiple times; that isn't hard, especially for the reduced use case of git clones.
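Most of the git-specific work in such a client is git's own framing rather than HTTP itself. As a sketch of that framing (based on git's pack-protocol documentation, not on Tim's implementation), smart HTTP and git:// both exchange pkt-lines: four hex digits giving the total length including those four bytes, then the payload, with "0000" as a flush packet. A clone starts with GET <repo-url>/info/refs?service=git-upload-pack, whose response begins with a "# service=git-upload-pack" pkt-line followed by a flush.

```javascript
// Encode one pkt-line: 4 lowercase hex digits giving the total length
// (including those 4 bytes), followed by the payload.
function encodePktLine(data) {
  const payload = Buffer.from(data);
  const length = payload.length + 4;
  if (length > 65520) throw new Error("pkt-line too long");
  const prefix = Buffer.from(length.toString(16).padStart(4, "0"), "ascii");
  return Buffer.concat([prefix, payload]);
}

// "0000" is the flush-pkt that ends a section, e.g. the ref advertisement.
const FLUSH_PKT = Buffer.from("0000", "ascii");

// Split a buffer into pkt-line payloads; a flush-pkt is returned as null.
function decodePktLines(buffer) {
  const packets = [];
  let offset = 0;
  while (offset < buffer.length) {
    const hex = buffer.toString("ascii", offset, offset + 4);
    if (!/^[0-9a-fA-F]{4}$/.test(hex)) throw new Error("bad or truncated pkt-line length");
    const length = parseInt(hex, 16);
    if (length === 0) {
      packets.push(null);
      offset += 4;
      continue;
    }
    // Lengths 0001-0003 are special in protocol v2 and are not handled in this sketch.
    if (length < 4) throw new Error(`unsupported pkt-line length ${length}`);
    if (offset + length > buffer.length) throw new Error("truncated pkt-line");
    packets.push(buffer.subarray(offset + 4, offset + length));
    offset += length;
  }
  return packets;
}

const wire = Buffer.concat([encodePktLine("# service=git-upload-pack\n"), FLUSH_PKT]);
console.log(wire.toString()); // starts with "001e# service=git-upload-pack"
```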
There is a pure JS implementation of TLS that I've been using at https://github.com/digitalbazaar/forge, and I recently discovered https://tls.mbed.org/, which can probably be compiled to JS using Emscripten. My experience with Emscripten, however, is that it generates massive JS blobs (hence the primary motivation for WebAssembly).
I believe that implementing enough of ssh to do a git clone over it isn't that bad. Also on the node side, you have the openssl bindings available and can use something like https://github.com/mscdex/ssh2.
Actually, thinking about it, about the only use case where you'd be interested in reading ssh keys from $HOME/.ssh/ is on a full developer workstation running Linux or OS X. In those cases you have access to node, or are probably using Electron if writing a graphical native app, so you still have node access.
While it is possible to generate ssh keys in a browser or Chrome app and store them in some persistent storage, I doubt it will be the default workflow for most people. I would expect https to be used more in those use cases.
@Dashed I'm just doing the minimal work for a proof of concept, so it's really messy, half-baked code...not really in any usable or shareable form. If I figure out a good path, I'll keep you posted.
@Dashed last night I looked into implementing status with js-git (again still in an experimental way).
It looks like it is implemented as follows:
1) resolve the HEAD reference to a commit hash and collect the blob entries of its tree
2) traverse the filesystem, ignoring the .git folder and respecting .gitignore, to collect the hash of each file
3) compare the results of 1) and 2) in two ways: a) by path, looking for hash changes, and b) by hash, looking for renamed paths.
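Step 3 could look roughly like the following sketch. It assumes steps 1 and 2 have already produced Maps from path to blob hash; the function name and result shape are hypothetical. Renames are detected only when a deleted path's exact hash reappears under an added path; real git also detects renames by content similarity, which this sketch does not attempt.

```javascript
// Compare HEAD entries with working-directory entries (both Map<path, blobHash>).
function compareTrees(head, work) {
  const modified = [];
  const deleted = [];
  const added = [];

  // a) By path: same path with a different hash, or a path that disappeared.
  for (const [path, hash] of head) {
    if (!work.has(path)) deleted.push(path);
    else if (work.get(path) !== hash) modified.push(path);
  }
  for (const path of work.keys()) {
    if (!head.has(path)) added.push(path);
  }

  // b) By hash: a deleted path whose exact hash shows up under an added path.
  const addedByHash = new Map();
  for (const path of added) {
    const hash = work.get(path);
    if (!addedByHash.has(hash)) addedByHash.set(hash, []);
    addedByHash.get(hash).push(path);
  }
  const renamed = [];
  for (const from of [...deleted]) {
    const candidates = addedByHash.get(head.get(from));
    if (candidates && candidates.length) {
      const to = candidates.shift();
      renamed.push({ from, to });
      deleted.splice(deleted.indexOf(from), 1);
      added.splice(added.indexOf(to), 1);
    }
  }
  return { modified, added, deleted, renamed };
}

const head = new Map([["a.txt", "h1"], ["b.txt", "h2"], ["c.txt", "h3"]]);
const work = new Map([["a.txt", "h1"], ["b.txt", "h9"], ["d.txt", "h3"], ["e.txt", "h4"]]);
console.log(compareTrees(head, work));
```

Here b.txt is reported as modified, c.txt as renamed to d.txt, and e.txt as added.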
Also, I think there could be some caching in there to avoid rehashing files, based on modification checks.
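For step 2 and the caching idea, each file's hash has to match what git would store: the SHA-1 of a "blob <byte length>\0" header followed by the file content. The sketch below computes that and wraps it in a hypothetical cache that skips rehashing while a file's size and mtime are unchanged. It ignores filters such as line-ending conversion, and it does not handle the "racy git" case where a file changes within the same timestamp granularity; real git deals with both.

```javascript
const crypto = require("crypto");
const fs = require("fs");

// Git's blob id is the SHA-1 of the header "blob <byte length>\0" plus the raw content.
function hashBlob(content) {
  const data = Buffer.isBuffer(content) ? content : Buffer.from(content);
  const header = Buffer.from(`blob ${data.length}\0`, "ascii");
  return crypto.createHash("sha1").update(header).update(data).digest("hex");
}

// Hypothetical stat-based cache: reuse the stored hash while a file's size and
// mtime are unchanged, and only rehash when either differs.
function createHashCache() {
  const cache = new Map();
  return function hashFile(filePath) {
    const stat = fs.statSync(filePath);
    const hit = cache.get(filePath);
    if (hit && hit.size === stat.size && hit.mtimeMs === stat.mtimeMs) return hit.hash;
    const hash = hashBlob(fs.readFileSync(filePath));
    cache.set(filePath, { size: stat.size, mtimeMs: stat.mtimeMs, hash });
    return hash;
  };
}

console.log(hashBlob("hello\n")); // ce013625030ba8dba906f756967f9e9ca394464a
```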
The problem I immediately run into is that js-git's mixins virtualize git but do not virtualize the file system so writing a general-purpose implementation would require API additions. It makes sense that you would want to virtualize git so you do not need to emulate a filesystem-like interface into each type of storage, but then it seems like you might want to virtualize the filesystem anyway to implement things like status.
@creationix two ways to implement this come to mind:
1) virtualize the filesystem and rewrite all of the mixins to the new interface. I'm assuming you probably think this is a little crazy given the benefits of virtualizing different types of storage and the work that has been done to date. This is the path I've personally been experimenting with, by emulating Node's fs module over an in-memory representation, i.e. a virtual file system.
2) add some sort of interface for walking the filesystem. It would need to respect .gitignore, and I've found a module, gitignore-parser, that helps with testing paths against those rules.
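Option 2 needs a way to enumerate working-tree files. A minimal Node.js sketch follows; isIgnored is an assumed hook rather than a real API, and in practice it would apply .gitignore rules, for example via the gitignore-parser module mentioned above (its API is not shown here). Ignored directories are skipped entirely rather than descended into.

```javascript
const fs = require("fs");
const path = require("path");

// List working-tree files as "/"-separated paths relative to `root`,
// skipping any ".git" entry. `isIgnored(relPath, isDirectory)` is a hypothetical hook.
function listWorkingFiles(root, isIgnored = () => false) {
  const files = [];
  function visit(dir, prefix) {
    const entries = fs.readdirSync(dir, { withFileTypes: true });
    // Sort by name so the output order is deterministic.
    entries.sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
    for (const entry of entries) {
      if (entry.name === ".git") continue;
      const rel = prefix ? `${prefix}/${entry.name}` : entry.name;
      if (isIgnored(rel, entry.isDirectory())) continue; // ignored dirs are not descended into
      if (entry.isDirectory()) visit(path.join(dir, entry.name), rel);
      else if (entry.isFile()) files.push(rel); // symlinks and special files are skipped
    }
  }
  visit(root, "");
  return files;
}
```

Usage: listWorkingFiles(repoRoot, (rel) => rel === "node_modules") returns every tracked-or-untracked candidate file outside .git and node_modules, ready to be hashed.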
@creationix given that you've written a bunch of drivers for different git implementations, with your current state of knowledge and ignoring the sunk costs of what you have already implemented, what do you think is the better approach: extending the storage driver mixin APIs, emulating the filesystem, or something else? (It seems like a slippery slope either way.)
Also, it looks like .git/HEAD stores the working directory, so in the filesystem case it isn't a barrier to the implementation, although maybe the problem you are referring to is that a general-purpose solution would need to extend the storage driver mixin API to store and modify those values, and eventually the refspecs.
I'm very much still climbing the git learning curve so I might have got some things wrong here!
The real git client uses an index file to cache data for status; see the format at https://github.com/git/git/blob/master/Documentation/technical/index-format.txt
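For reference, the index file starts with a fixed 12-byte header, and each entry that follows records stat data (ctime, mtime, dev, ino, mode, uid, gid, and file size) alongside the blob's SHA-1 and flags. That stat data is what lets git skip rehashing unchanged files. A small sketch, based on index-format.txt, that reads only the header; the function name is hypothetical:

```javascript
// Read the fixed 12-byte header of a git index file (see index-format.txt):
// a 4-byte signature "DIRC", a 4-byte version number, and a 4-byte entry count,
// all in network byte order. The entries and extensions that follow are not parsed.
function readIndexHeader(buffer) {
  if (buffer.length < 12) throw new Error("index file too short");
  const signature = buffer.toString("ascii", 0, 4);
  if (signature !== "DIRC") throw new Error(`unexpected signature ${signature}`);
  const version = buffer.readUInt32BE(4);
  if (version < 2 || version > 4) throw new Error(`unsupported index version ${version}`);
  return { version, entryCount: buffer.readUInt32BE(8) };
}

// Example with a hand-built header; with Node you could pass
// require("fs").readFileSync(".git/index") instead.
const header = Buffer.alloc(12);
header.write("DIRC", 0, "ascii");
header.writeUInt32BE(2, 4);
header.writeUInt32BE(3, 8);
console.log(readIndexHeader(header)); // { version: 2, entryCount: 3 }
```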
The issue I had before with the index file approach was that not all the filesystem backends I was targeting supported fine grained stat data. (Chrome's FS API for example)
It's hard to abstract because some backends will want to use the git index format while others will have to invent something custom to their abilities.
Given the index spec, I can see why status is hard to solve in a general-purpose way. That said, the index sounds emulatable on platforms like the ones you mention.
As for the API question to scan the filesystem, if you could solve the index problem in a general way, how would you see the API for js-git evolving? I realize that it is a little controversial for me to ask about abstracting in a different way (eg. by file system), but I'm on the edge of deciding what to do and would like to get your opinion since you have already thought about this and can foresee things that I'll encounter the hard way!
Specifically:
1) how would you abstract traversing the file tree to check the hashes for each file?
2) how would you abstract concepts like the working directory in HEAD, refspecs in config, etc.?
I guess I'm thinking that abstracting by filesystem instead of by git concepts allows filesystem drivers to be developed and tested like black boxes, and allows one codebase to do all of the above, like updating HEAD, rather than the other way around. I definitely see the benefits of both, but would like to hear your take on how you would approach it in the current API as the feature set increases.
Thank you!
So the two-level abstraction is what I've planned to do all along. The higher level abstraction will consume the lower-level abstraction. In this case, there will be something for working checkouts and it only needs to know if a file has changed, the git index format on disk is an implementation detail of one of the backends. But many platforms can offer a good FS level implementation and for those, it would be good for js-git to bridge the gap between fs and working directory for you.
I already do this with git-fs which implements the js-git database interface on top of a generic file system interface. You don't have to use git-fs if you're on some strange platform where it doesn't make sense.
For example, I had a backend that stored objects and refs in indexeddb in the browser without first emulating a filesystem. That would be unneeded overhead and complexity.
So to answer your question, the git-fs module needs to implement the needed high-level APIs for doing status, add, checkout and other working directory commands. We also need to design what exactly those APIs are, but not hard-code them to the filesystem. Users of js-git will be able to choose implementing the high-level interface directly or just implementing the low-level fs API and using git-fs to bridge the interface gap.
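A rough sketch of that two-level split follows. Every name in it (listFiles, readFile, entries, workTreeFromFiles, changedPaths) is hypothetical, since these APIs are still to be designed. Status-like code depends only on the high-level entries() interface, and a backend can either implement that directly or supply the low-level file API and go through the bridge.

```javascript
const crypto = require("crypto");

// Hypothetical low-level file API (not a js-git interface):
//   { listFiles(): Promise<string[]>, readFile(path): Promise<Buffer> }
// Hypothetical high-level working-tree API that status code consumes:
//   { entries(): Promise<Map<path, blobHash>> }

// Bridge: build the high-level interface from any low-level file API.
function workTreeFromFiles(files) {
  return {
    async entries() {
      const result = new Map();
      for (const filePath of await files.listFiles()) {
        const data = await files.readFile(filePath);
        const header = Buffer.from(`blob ${data.length}\0`, "ascii");
        result.set(filePath, crypto.createHash("sha1").update(header).update(data).digest("hex"));
      }
      return result;
    },
  };
}

// Status-like logic that only knows the high-level interface.
async function changedPaths(headEntries, workTree) {
  const work = await workTree.entries();
  const changed = [];
  for (const [filePath, hash] of work) {
    if (headEntries.get(filePath) !== hash) changed.push(filePath); // added or modified
  }
  for (const filePath of headEntries.keys()) {
    if (!work.has(filePath)) changed.push(filePath); // deleted
  }
  return changed.sort();
}

// Backend A: an in-memory file API, bridged into a work tree.
const memoryFiles = {
  data: new Map([["a.txt", Buffer.from("hello\n")], ["b.txt", Buffer.from("new\n")]]),
  async listFiles() { return [...this.data.keys()]; },
  async readFile(filePath) { return this.data.get(filePath); },
};

// Backend B: something that already tracks hashes implements entries() directly.
const precomputed = {
  async entries() { return new Map([["a.txt", "ce013625030ba8dba906f756967f9e9ca394464a"]]); },
};

const head = new Map([
  ["a.txt", "ce013625030ba8dba906f756967f9e9ca394464a"],
  ["c.txt", "0000000000000000000000000000000000000000"],
]);
changedPaths(head, workTreeFromFiles(memoryFiles)).then((paths) => console.log(paths));
```

Here the bridged memory backend reports b.txt (added) and c.txt (deleted), while the precomputed backend reports only c.txt.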
Sorry for the rambling; I've been sick this weekend and don't have a lot of brain power left. But I hope this points you in the right direction. Feel free to propose new APIs as needed in the high-level space.
Excellent!
If I understand correctly, the high-level API should be like a database that then is ported to the file system rather than emulating the filesystem.
I've been noodling on a query-based idea for my own git API: a higher level of abstraction that provides an interface requiring no knowledge of the git structure. When I came up with BackboneORM and BackboneREST, I made a general-purpose, serializable query interface similar to MongoDB's (but simpler) that was ported to each driver and could even be used in the browser's query parameters. I'm so used to higher-level abstractions like sort, limit, offset, pick, values, and in that dealing with traversing data structures directly seems like a lot of detail to be thinking about. Digression over.
I found https://github.com/creationix/git-fs-db. It looks like a database-like interface onto the filesystem. I think git-fs-db is the one you are referring to by git-fs above. I also found https://github.com/creationix/git-chrome-local-db, which seems to have a similar interface, so I think this is your example of a backend that stored objects and refs in indexeddb in the browser without first emulating a filesystem.
However, I'm missing the piece that actually uses the higher-level interface, since these modules look more like drivers to be consumed somewhere else. I've looked in https://github.com/creationix/js-git/tree/master/mixins and in some of your repositories for the module that uses them.
Yes, I think you're right. Sorry about getting the names wrong (it's literally been years since I worked on this stuff).
Most of the mixins and programs use the db interface directly; there isn't much in the way of super high-level APIs yet (no merge, diff, blame, per-file history, etc.). The tedit project implements a FS on top of the js-git db interface (https://tedit.creationix.com/, https://github.com/creationix/tedit). The
I am giving a talk in Paris in a month (http://www.dotjs.io/) and hope to be using JS-Git for some of it. This means I'll have some time to use this code myself and hopefully get things a little farther along.
I've been Googling around to find examples of how to use js-git to perform common operations, but they don't seem to be coming up. I've trawled through lots of links in various js-git-related libraries, but I'm still having problems getting started.
I've been experimenting with nodegit and they have a good examples folder for common operations. For example, I want to load a .git folder in Node.js and explore the commit history on the master branch like here and I would like to check the current status of the repo. I cannot seem to find how to get started in doing something similar with js-git and git-node-fs.
Any pointers would be much appreciated! It could be a project that makes extensive use of js-git or common examples or usage docs.