getodk / aggregate

ODK Aggregate is a Java server that stores, analyzes, and presents survey data collected using ODK Collect. Contribute and make the world a better place! ✨🗄✨
https://docs.opendatakit.org/aggregate-intro/
Other
74 stars 227 forks source link

Undo dependency on Git LFS #299

Open zwets opened 6 years ago

zwets commented 6 years ago

These four jar files are broken in the repository:

Running jar tf on any of these gives:

$ jar tf libs/gwt-google-maps-v3-1.0.1.jar
java.util.zip.ZipException: zip END header not found

This makes it impossible to build aggregate.

ggalmazor commented 6 years ago

Hi, @zwets!

This repo uses Git LFS for this kind of big-ish files. Can you check that you have installed Git LFS and it's enabled?

zwets commented 6 years ago

Aha! Gotcha. I now noticed it being mentioned in the README.md. Despite working with dozens of GitHub repos, I've never needed LFS before so didn't have it installed.

But why add the inconvenience of LFS? These libraries are tiny - even added up they barely exceed one MB! Git can perfectly well cope with those, even if they changed regularly. The days when git couldn't handle binary or large files are long gone, so why the extra complexity?

I work with repositories having binaries files of 100MB each, straight in git. Here's an example. No trouble at all, even over the unreliable and slow networks we have here in Africa.

Maybe retitle this issue to "undo dependency on git LFS"?

ggalmazor commented 6 years ago

Thanks for your feedback, @zwets!

I agree that LFS is a bit of a nuisance. Some months ago we restructured this repo and we had to decide to either throw away all the git history and files up to 2018 (basically squash everything into a commit) or find some way to deal with it. You can check the discussion in #128.

Since we still don't have a perfect understanding of all the codebase, we decided to use LFS to have better traceability on code changes, which could be key to understand the historic whys and hows.

Thanks to LFS, we reduced the size of the repo from 700MB to less than 300MB which is good enough at this point.

zwets commented 6 years ago

Have retitled the issue, as I would consider getting rid of the inconvenience of LFS a useful feature. For one, it makes the path to submitting a PR smoother.

(As would a build instruction that doesn't start by recommending the installation of a IDE, BTW. :-))

ggalmazor commented 6 years ago

Sure, let's discuss this :)

Regarding the IDE, as far as I know, this project has always prescribed one. Doing this is helpful because it sets a common ground to enhance collaboration.

All the JVM based ODK projects prescribe IntelliJ Community Edition or Android Studio, which are free (and great options too IMHO), although you're free to work with whichever IDE of your choosing.

I'd also like to note that coupling with the IDE has improved too since 1.4.15. Before that, this project had a build process that relied on Eclipse (meaning that you were required Eclipse to build it). Now, the build process uses Gradle which you can launch from the command line (no IDE required).

Having said that, I think it would be useful to have instructions for other IDEs (or no IDEs) as well. We want as many people contributing as we can, and the IDE should not be a stopper. It would be great if you could write a new doc under /docs with instructions to work on Aggregate without an IDE or your non-IntelliJ IDE.

ggalmazor commented 6 years ago

So, my two cents regarding the main issue about removing LFS from this repo.

I'm in favor of removing LFS but:

At this moment, I can't really provide any other alternatives, but we could track those in this issue as they surface for future reference.

I'm tagging @yanokwa, @lognaturel, and @dcbriccetti to get their feedback on this too since they're three of the most active members of this part of the community.

yanokwa commented 6 years ago

Great discussion, everyone!

Much of what we do is to make the Aggregate codebase and the rest of the ODK codebases easier to maintain and easier to contribute to. To that end, I don't think we should remove Git LFS and I think we should continue to default to IDE based instructions.

Git LFS makes the repo smaller and allows us to retain history which I think is critical to maintaining Aggregate. To me, it's a pretty low bar to install when compared to what you need to do to get Aggregate built.

You might say that the binaries are small, but we don't use Git LFS because we need to store binaries. We use Git LFS because we need to store a history of those binaries and that size adds up! The Aggregate used to be 700 MB before we made this switch and that's miserable for CI and slow connections.

Long story short, I don't think Git LFS is a nusiance. I think we document it pretty well in the README. Maybe we can make that documentation better?

As to IDEs, I agree with everything @ggalmazor said. I welcome PRs that let folks build Aggregate from whatever environment they are most productive in. But the default should be IntelliJ IDEA because that's what the maintainers use and can support.

zwets commented 6 years ago

Regarding the IDE instructions: what I meant to say was that the README.md starts out by recommending the installation of an IDE, whereas I'd expect "how to build" to precede "how to develop".

What's worse, it says "Aggregate is built using Gradle and Gretty, but we strongly recommend you use IntelliJ IDEA first to ensure everything is working". That instantly put me off. My thought was: "What? That's going back to the days even before ant (yes, I'm that old), when you could only build projects from within an IDE!".

Those days were dreadful: spending hours to just get a project to build (and that excludes installing and finding your way around an unfamiliar IDE), with no predictability or reproducibility. This is precisely why ant came about (and later maven, ivy, gradle, etc.).

A project should before anything build with a one-liner executed from the command line. IDE's are for hacking code, and setup instructions for them are nice to have, but the thought that installing a specific IDE could affect the build even in the slightest, makes me run away in horror. :)

zwets commented 6 years ago

With respect to Git LFS (and the IntelliJ recommendation, in fact) these are my 0.02:

If contributing a patch to a project requires me to install (and configure, and learn) new tools that I'm not going to use for anything else, then the effort involved must be small relative to the actual work that goes into the patch, or I'll just pass on the opportunity. Doubly so when these tools aren't in common use, because (a) there's a reason for them not being popular (e.g. buggy, complicated, redundant), (b) there are downsides to using uncommon tools (disinvested learning time), and (c) the project using them is likely to have other quirks and non-standardness - implying more wasted time.

For ease of entry of contributors, the main thing is to be as standard as possible. You want people to be able to just clone and build the first time around.

Git LFS isn't a big one in terms of effort or time, but I can't know that ahead of time. Also, as I have never encountered it, even in huge repos, there's the quirkiness factor: why are they doinig this? So, let's say I estimate my "retooling time" at one hour. Then if I my patch is less than a day's work, that's 10% wasted, and pretty close to my limit.

IntelliJ is the same story, but with a very different "breakpoint". I've developed Java code on and off for 20 years. I now do Java infrequently, and when I do, I use Eclipse, usually in combination with Maven, but Gradle is fine too. Now, if Aggregate were to require IntelliJ, then my breakpoint for contributing would surely not be lower than at a month's worth of work, probably more. It doesn't require it, but as the project stresses that it strongly recommends IntelliJ even for making the build work, then this makes me suspect that there will be all sorts of IntelliJ assumptions in the organisation of the files, and I'll be spending a lot of time to make it work in my setup. So, the the bar might perhaps be at a minimum of 1-2 weeks work?

It's quite simple: a project that leaves the choice of IDE open (perhaps mentioning in passing that the core team uses IntelliJ, and here-and-here are our setup instructions, but such-and-so it'll work in Eclipse) is more open to contributions, and certainly fishes in a much larger pond of developers. As Eclipse and IntelliJ both have about a 40-45% market share, that's doubling your pond in one go.

And FTR I wrote my PR using nothing but vi (and gradle war of course!) :-)

ggalmazor commented 5 years ago

Hi, @zwets!

I've reviewed the language used by the README.md and other docs, and I'm happy to report that, although we still give directions in order to set up a development environment with IntelliJ Community, the main build&run workflows have been described using exclusively the command line, which, I believe, is the most tool-agnostic option we have right now.

Once #390 is merged, it would be great to add directions to set the dev env with Eclipse as well. Would you be interested in contributing it?