Git-Lit / git-lit

Scripts to create git repositories for ALTO XML texts, like those from the British Library's scanned documents.

forking gitberg ? #12

Open dav009 opened 9 years ago

dav009 commented 9 years ago

Good initiative.

It seems the scope and objectives of this project are very similar to those of gitberg. Should it be a fork of it?

tfmorris commented 9 years ago

On the surface I'd say no because I don't agree with the premise, but perhaps you could point out the commonalities that you see in scope and objectives.

dav009 commented 9 years ago

As far as I understand, both would create a dump of the documents using git repos; the main difference is the format they pre-process? Judging just by the README, the prototype took parts from gitberg.

I'm willing to contribute either way, just figuring out what's the more meaningful way before we start pushing ;)

sethwoodworth commented 9 years ago

Gitberg author here. I'm 100% willing to collaborate on the gitberg module and tool. The package is meant to be importable as a python library.

I've created a github issues milestone on gitberg for needed changes for git-lit. There is a thread on the GITenberg mailing list where I proposed hopping on a hangout and seeing how we can collaborate. Is there a git-lit mailing list? I will try to send out a doodle/invite for a public hangout in the next couple days on at least the GITenberg mailing list.

Of course, feel free to fork gitberg if you need to, but hopefully we have enough overlap of goals that we can start creating standards.

JonathanReeve commented 9 years ago

Indeed, since most of the code is adapted from gitberg anyway, it makes sense to try to fuse the two codebases somehow. One way would be to abstract gitberg a little so that it can handle other book types--there could be a class BritishLibraryBook that is a subclass of Book, for instance. A public hangout sounds good! Looking forward to the invite. In the meantime I'll look around for a kind of chat service like Gitter or something. Suggestions welcome.

I think there is definitely a lot of overlap of goals, and I'm really interested in talking about standards creation.
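
To make the subclassing idea above a bit more concrete, here is a rough sketch of what it might look like. Everything in it is an assumption for illustration: gitberg's actual `Book` class may look quite different, and the names `to_text` and `repo_name` are hypothetical. The only thing taken from the ALTO format itself is that words live in `String` elements (with a `CONTENT` attribute) inside `TextLine` elements.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class Book(ABC):
    """Hypothetical base class holding whatever is independent of the source format."""

    def __init__(self, book_id, title):
        self.book_id = book_id
        self.title = title

    @abstractmethod
    def to_text(self):
        """Return the book as plain text; each source format implements this."""

    def repo_name(self):
        # Shared logic: build a repository name from the title and the ID.
        slug = "-".join(self.title.lower().split())
        return f"{slug}_{self.book_id}"


class BritishLibraryBook(Book):
    """Hypothetical subclass for ALTO XML, passed in here as a string."""

    def __init__(self, book_id, title, alto_xml):
        super().__init__(book_id, title)
        self.alto_xml = alto_xml

    @staticmethod
    def _local(tag):
        # Drop any "{namespace}" prefix so the sketch works across ALTO versions.
        return tag.split("}")[-1]

    def to_text(self):
        root = ET.fromstring(self.alto_xml)
        lines = []
        for elem in root.iter():
            if self._local(elem.tag) == "TextLine":
                words = [
                    child.get("CONTENT", "")
                    for child in elem
                    if self._local(child.tag) == "String"
                ]
                lines.append(" ".join(words))
        return "\n".join(lines)
```

A Project Gutenberg subclass would then only need its own `to_text`, while the repository handling stays in the base class.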

tfmorris commented 9 years ago

The gitberg utility seems to do a little bit of everything, but the piece that was copied is mostly concerned with creating and managing git/Github repos (and that functionality might arguably be better implemented using modules like gitpython and PyGithub rather than shelling out to the OS). All the transformations are going to be different because the two projects have different source formats, different source metadata formats, different data sources (disk vs network), etc.

Perhaps it'll seem obvious when I dig into it more, but on the surface it doesn't seem like much would get reused.

tfmorris commented 8 years ago

I haven't looked at gitenberg in a while, but I plowed through all the git-lit code recently to advance the cleanup/refactor from iPython to vanilla command line Python and my conclusion is (still) that there's not enough commonality to make this worthwhile. The gitenberg contribution is a few dozen lines of code, many of which have things like hard-coded commit messages that needed to be changed.

While the concept is obviously completely inspired by gitenberg and there were a bunch of code snippets "borrowed," it doesn't make sense, at least to me, to try to keep them aligned (which is what a fork would imply).

Just one man's opinion... But, we should decide, one way or the other, because the more work that's done the harder it'll be to change the decision.

@JonathanReeve Your project, your decision.

JonathanReeve commented 8 years ago

Code usage aside, I'd love for us to merge the two projects somehow, and have one command-line utility to parse and post all kinds of books: BL, PG, and more. I think if we can manage to make Git-Lit modular enough, with swappable reader modules, then we can easily extend it to whatever text type/source there happens to be. One exciting development on this end is that we just got a fantastic collection of manually-edited texts from UVA that they're no longer maintaining. @elotroalex knows a little more about this corpus than I do, but it could be very exciting to expand into non-BL territory.
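
As a very rough sketch of what "swappable reader modules" could mean in practice, one option is a small registry that maps a source type to a reader function. None of this exists in Git-Lit or gitberg as far as this thread shows: the names `READERS`, `register_reader`, and `read_book` are made up for illustration, and the reader bodies are placeholders rather than real parsers.

```python
from typing import Callable, Dict

# Hypothetical registry: source type (e.g. "bl", "pg") -> reader function.
READERS: Dict[str, Callable[[str], str]] = {}


def register_reader(source_type):
    """Decorator that registers a reader function under a source type."""
    def decorator(func):
        READERS[source_type] = func
        return func
    return decorator


@register_reader("pg")
def read_project_gutenberg(raw):
    # Placeholder: a real reader would also strip the PG header and footer.
    return raw.strip()


@register_reader("bl")
def read_british_library(raw):
    # Placeholder: a real reader would parse ALTO XML; here we only
    # collapse runs of whitespace into single spaces.
    return " ".join(raw.split())


def read_book(source_type, raw):
    """Dispatch to whichever reader is registered for this source type."""
    try:
        reader = READERS[source_type]
    except KeyError:
        raise ValueError(f"no reader registered for {source_type!r}") from None
    return reader(raw)
```

Supporting a new corpus (the UVA texts, say) would then mean adding one more decorated function, without touching the code that creates and pushes the repositories.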

At the same time, there are probably lots of tricky aspects to merging code and text repositories. As someone mentioned a while ago (@tfmorris?), we'd need to find a way to dedupe PG/BL texts where we could. So I think this might be something to shoot for eventually, but not on the immediate radar, since the main priority remains getting the texts on GitHub to begin with. But soon(ish) I'd love to talk about ways to merge the projects.

JonathanReeve commented 7 years ago

Marking this as wishlist for the moment.