chen870647924 / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0

Guava JAR is HUGE #605

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Google Collections 1.0 is ~600 KB

Guava r09 is 1.1 MB.

Please split the library into smaller pieces (at least two: -core and -extras).

The core should contain only the most frequently used classes and should be 
around 100-300 KB. The extras artifact would contain the rest.

Original issue reported on code.google.com by ceefour666@gmail.com on 14 Apr 2011 at 8:03

GoogleCodeExporter commented 9 years ago
If you're worried about Guava increasing the size of your application, the 
suggestion is that you use ProGuard as part of your build process to create a 
jar that strips out everything you aren't using. See 
http://code.google.com/p/guava-libraries/wiki/UsingProGuardWithGuava

Original comment by cgdec...@gmail.com on 14 Apr 2011 at 8:25

GoogleCodeExporter commented 9 years ago
Thanks for the comment.

While ProGuard may be useful for applications, I think it's not applicable for 
framework/library type projects, like ModeShape.

We're also trying to reduce/eliminate the risk of classpath conflicts.

Here's the discussion: https://github.com/ModeShape/modeshape/pull/69

Original comment by ceefour666@gmail.com on 14 Apr 2011 at 8:31

GoogleCodeExporter commented 9 years ago
Yeah, the page I linked does recommend not doing this for libraries. It should 
be up to the applications using the libraries what they do with them. Is the 
worry that users will avoid your library because of the size of the Guava 
dependency? As long as you avoid @Beta APIs, I think you should be fine 
otherwise.

I know that for one release (r03) there was a separate jar for each package in 
addition to the jar with everything, but they decided not to continue doing 
that.

At any rate, I'm not someone who can make any sort of decision about this... 
maybe someone else will have some comments.

Original comment by cgdec...@gmail.com on 14 Apr 2011 at 9:25

GoogleCodeExporter commented 9 years ago
We considered this very carefully when we started Guava, and we believe that a 
single JAR, with a recommendation of ProGuard to size-sensitive applications, 
is the way to go.  The key realization was that even if we split into 10 
separate packages, most users are still going to use 15% of this package, 5% of 
that one, etc., and would still benefit from ProGuard just as much!  We would 
add a lot of administrative overhead for no real benefit.

We have some more documentation that gives advice on how to depend on Guava 
from a library, which I believe Charles is planning on externalizing for you.

Original comment by kevin...@gmail.com on 19 Apr 2011 at 2:07

GoogleCodeExporter commented 9 years ago
Issue 1087 has been merged into this issue.

Original comment by wasserman.louis on 28 Jul 2012 at 8:31

GoogleCodeExporter commented 9 years ago
It isn't the size of the jar but the visibility of unneeded packages that 
concerns us. We have a large development team, and one way we keep everyone on 
the right track is to control the jars they have access to. EventBus and the io 
classes are unhelpful in our product, but we really like collections. If Guava 
were decomposed a bit more intelligently, you would find more people interested 
in using what they need without worrying about API creep.

Original comment by m...@re-entry.ca on 1 Aug 2012 at 3:45

GoogleCodeExporter commented 9 years ago
Agree with this, esp #6.

Collections is the most useful part of Guava.

Original comment by ceefour666@gmail.com on 1 Aug 2012 at 3:50

GoogleCodeExporter commented 9 years ago
This got discussed in the Hangout with the whole team yesterday 
(http://youtu.be/rkjW-zwZhJQ?t=31m18s), which provides some more discussion.  
(tl;dw: Most Guava users end up using a few features from many different 
packages.  Relatively few users *only* use c.g.c.collect, or *only* c.g.c.base.)

There's also the issue that c.g.c.collect depends on a bunch of the other Guava 
packages, too, especially base, math, and primitives.

If you're *really* that fussy about what's accessible from Guava, ProGuard can 
also strip out everything in Guava except common.collect and the classes (not 
just the packages) that it actually depends on.  I whipped up a ProGuard 
configuration file to do that in about five minutes; it's attached.

Original comment by wasserman.louis on 1 Aug 2012 at 4:14
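The point above, that a shrinker keeps collect plus only the *classes* it depends on rather than whole packages, can be sketched as a reachability walk from the "keep" roots. This is a minimal illustration, not ProGuard's implementation: ProGuard works on bytecode and at member granularity. The class names are real Guava names, but the dependency edges below are hypothetical and chosen only to show the idea.

```java
import java.util.*;

/** Sketch of class-level shrinking: keep only classes reachable from the "keep" roots. */
public class ReachabilityShrink {

    /** Breadth-first walk starting from every class whose name starts with keepPrefix. */
    static Set<String> reachable(Map<String, List<String>> deps, String keepPrefix) {
        Set<String> kept = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>();
        for (String cls : deps.keySet()) {
            if (cls.startsWith(keepPrefix)) {
                queue.add(cls);
            }
        }
        while (!queue.isEmpty()) {
            String cls = queue.poll();
            if (kept.add(cls)) {
                // Follow every class this one refers to; classes with no entry have no edges.
                for (String dep : deps.getOrDefault(cls, List.of())) {
                    queue.add(dep);
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical edges, for illustration only (not Guava's real dependency graph).
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("collect.ImmutableList", List.of("base.Preconditions", "collect.ImmutableCollection"));
        deps.put("collect.ImmutableCollection", List.of("base.Preconditions"));
        deps.put("base.Preconditions", List.of());
        deps.put("base.Splitter", List.of("base.CharMatcher"));
        deps.put("base.CharMatcher", List.of());
        System.out.println(reachable(deps, "collect."));
    }
}
```

With these made-up edges, base.Preconditions survives while base.Splitter and base.CharMatcher are dropped even though they live in the same package, which is the class-versus-package distinction the comment draws.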


GoogleCodeExporter commented 9 years ago
Issue 1087 has been merged into this issue.

Original comment by wasserman.louis on 1 Aug 2012 at 4:16

GoogleCodeExporter commented 9 years ago
If you look closely at what your company needs and doesn't need, I strongly 
doubt that it divides cleanly along package boundaries.

Also, anyone who wants to maintain a sliced-up version of Guava is always 
welcome to do so; I think they'll find out it's a lot more painful than it's 
worth, but I'm not stopping them!

Original comment by kevinb@google.com on 3 Aug 2012 at 4:33

GoogleCodeExporter commented 9 years ago
Issue 1329 has been merged into this issue.

Original comment by cpov...@google.com on 11 Mar 2013 at 3:34

GoogleCodeExporter commented 9 years ago
Hello everyone. 

Well, I did a small proof of concept and managed to create a set of atomic jars 
for Guava 14. Take a look.

    https://github.com/jjzazuet/seeds-libraries

As far as I can tell, I am not experiencing any kind of pain after a 3-hour 
process. The code divided cleanly along package boundaries. A testament to 
good design, I guess :).

Before anyone rage-jumps on me: this is only the raw, unmodified Java source 
code of the core libraries.

Now, if I know Google, they won't be switching to Gradle as a build system *any 
time* soon. This approach works for me and my current project so I could *in 
theory* volunteer to maintain these atomic jars for Guava.

Question is, is anyone still interested in having these atomic jars at all in 
Maven Central?

So, oh mighty Google, art thou interested in such tribute? :P

Thanks again for your time and help.

Original comment by jjzaz...@gmail.com on 11 Mar 2013 at 8:02

GoogleCodeExporter commented 9 years ago
As noted on issue 1329, we've painted ourselves into a corner by releasing the 
monolithic jar: Since any new, non-monolithic jars would be separate artifacts, 
Maven wouldn't know that guava-14.0 and guava-base-15.0 are incompatible with 
one another, and users who inadvertently mix the two are likely to see runtime 
failures. That keeps me from offering an endorsement for the Seeds project 
(aside from the name ;)), but certainly users can accept the risks if they'd 
like, perhaps if they're part of a small dependency graph.

(As you noted, our packages currently have a dependency graph with no loops. 
We've considered changing this, but we haven't gone through with it yet. You're 
safe for at least a while.)

Original comment by cpov...@google.com on 11 Mar 2013 at 8:08

GoogleCodeExporter commented 9 years ago
I would think deployments will be able to sort out their mixed-version jar 
problem just fine. We have to do that anyway when considering which jars are in 
the dependency graph. It is not unusual to exclude old versions of jars in 
transitive dependencies.

In other words, you aren't painted into any kind of corner. Guava can be 
decomposed into smaller, more consumable units. Thank goodness you don't have 
circular dependencies!

Original comment by m...@thebishops.org on 12 Mar 2013 at 2:25

GoogleCodeExporter commented 9 years ago
We do still hear from people who have both guava and the old google-collections 
on their classpath without realizing it. (We find out only because they post 
cryptic errors on StackOverflow and we recognize old versions of classes.) I 
wouldn't be surprised if we hear from someone with both guava and seeds-base 
somewhere down the line. The unfortunate thing in both cases is that tools 
can't identify the incompatibility for us (to the best of my knowledge -- if 
someone knows better, please enlighten me!). Once it's identified, certainly 
users can do as you suggest.

(No promises on the circular dependencies :) There would be some advantages to 
being able to use classes like ImmutableList and FluentIterable from 
common.base.)

Original comment by cpov...@google.com on 12 Mar 2013 at 2:49
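Since, as noted above, the build tools won't flag the overlap, one way to at least surface the symptom at runtime is to ask the class loader for every location that supplies a given class file: `ClassLoader.getResources` returns all matching resources, so more than one URL means the class is on the classpath twice. A minimal sketch; the default probe class (Guava's Preconditions) is just an assumed example, and this only catches identically named classes.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Sketch: report every classpath location that supplies the same class file. */
public class DuplicateOnClasspath {

    /** Converts "com.google.common.base.Preconditions" to "com/google/common/base/Preconditions.class". */
    static String resourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** All URLs the loader can find for the class file; more than one indicates a duplicate. */
    static List<URL> locations(ClassLoader loader, String className) throws IOException {
        return Collections.list(loader.getResources(resourceName(className)));
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = DuplicateOnClasspath.class.getClassLoader();
        String probe = args.length > 0 ? args[0] : "com.google.common.base.Preconditions";
        List<URL> found = locations(loader, probe);
        if (found.size() > 1) {
            System.err.println("WARNING: " + probe + " is provided by " + found.size() + " locations:");
            for (URL url : found) {
                System.err.println("  " + url);
            }
        } else {
            System.out.println(probe + " found in " + found.size() + " location(s)");
        }
    }
}
```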

GoogleCodeExporter commented 9 years ago
I'm looking into conflicting versions detection, Chris.  I know of no specific 
metadata that could make explicit an implicit conflict between different 
artifacts, but I'm following up.

And that's the key issue, as Chris said... it's not about having both 
guava-13.jar and guava-14.jar as dependencies; that's handled in the dependency 
graph analysis. It's that nothing in the Maven metadata signals that 
guava-base-14.0.jar and guava-14.0.jar are mutually exclusive (for the fraction 
of their class file contents that overlap).

So yes, it is definitely not a new problem; other teams have dealt with Maven 
dependencies and the care and feeding of their dependency graphs. But making a 
change that can fail silently, instead of forcing a build-time breakage that 
people have to see and fix, goes a bit against the Guava team's grain. It may 
be worthwhile in some cases, but it really has to be worth the risk of our 
customers pushing erroneous binary packages.

Original comment by cgruber@google.com on 12 Mar 2013 at 2:54

GoogleCodeExporter commented 9 years ago
@cgruber Crazy idea here. If the real deal breaker for end developers is the 
inability to signal incompatible artifacts inside the classpath (e.g. guava-14 
vs guava-base-15) then why not change the package names to something like 
com.google.seeds.base (or something you prefer) so that the compiler raises 
errors for code using the monolithic Guava?

I mean, as an end user/developer I'd certainly grumble a bit about having to fix 
the compiler errors after upgrading to Guava 15, but at least I'd know I have 
the option of choosing atomic packages.

In other words, that would introduce an API breaking change which would signal 
the start of atomic packaging.

Does that make sense?

Thanks for your time.

Original comment by jjzaz...@gmail.com on 12 Mar 2013 at 6:08

GoogleCodeExporter commented 9 years ago
One option would be to make guava-15 just the base classes (as opposed to 
putting those in a guava-base-15 artifact). Clients would need to know to 
include the other jars as they needed them, but we wouldn't be seeing duplicate 
class issues.

Another option would be to make an "empty" guava-15 jar which the other guava 
jars could depend on; the big drawback here of course is that while an empty 
jar is lightweight, it's not free.

Another route that might work would be to have the pom for guava-15 include a 
relocation section rerouting to guava-base; 
http://maven.apache.org/guides/mini/guide-relocation.html has more information 
on this option.

Finally, it's worth noting that the maven-enforcer-plugin does allow a check 
for duplicate classes. However, since most people won't be using this check, it 
doesn't help much from a support point of view.

Original comment by ian.b.ro...@gmail.com on 3 Apr 2013 at 10:03
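For teams that do want a build-time signal, the duplicate-class check mentioned above boils down to listing the `.class` entries of each jar and reporting any entry found in more than one. The sketch below is a standalone illustration of that idea using `java.util.jar.JarFile`, not the enforcer plugin's actual rule; the artifact names in the usage note are hypothetical.

```java
import java.io.IOException;
import java.util.*;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Sketch of a duplicate-class check across a set of jar files. */
public class DuplicateClassCheck {

    /** Lists the .class entries of one jar, e.g. "com/google/common/base/Joiner.class". */
    static Set<String> classEntries(String jarPath) throws IOException {
        Set<String> classes = new TreeSet<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    classes.add(name);
                }
            }
        }
        return classes;
    }

    /** Maps each class entry found in two or more jars to the jars that contain it. */
    static Map<String, List<String>> duplicates(Map<String, Set<String>> classesByJar) {
        Map<String, List<String>> jarsByClass = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : classesByJar.entrySet()) {
            for (String cls : e.getValue()) {
                jarsByClass.computeIfAbsent(cls, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        jarsByClass.values().removeIf(jars -> jars.size() < 2);
        return jarsByClass;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DuplicateClassCheck a.jar b.jar ...
        Map<String, Set<String>> classesByJar = new LinkedHashMap<>();
        for (String path : args) {
            classesByJar.put(path, classEntries(path));
        }
        Map<String, List<String>> dups = duplicates(classesByJar);
        dups.forEach((cls, jars) -> System.out.println(cls + " -> " + jars));
        if (!dups.isEmpty()) {
            System.exit(1); // non-zero exit so a build step can fail on overlap
        }
    }
}
```

Run against a hypothetical guava-14.0.jar and guava-base-15.0.jar, every class the two share would be listed with both jar names, which is exactly the overlap that Maven's version resolution cannot see.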

GoogleCodeExporter commented 9 years ago
Hi guys. 

In case anyone's still interested, I took the plunge and uploaded atomic jars 
for release 14.0.1 to Maven Central. Here are the relevant links: 

http://seeds.tribe7.net
http://search.maven.org/#search%7Cga%7C1%7Cseeds

I gave some thought to the potential classpath conflicts and in the end I 
decided to fork from Guava and rename the package structure. Hopefully this 
will not introduce issues but in any case, let me know.

It pretty much works for me at the moment and hopefully it will for someone 
else.

Anything else, let me know.

Thanks for your time and help!

Original comment by jjzaz...@gmail.com on 15 Jul 2013 at 3:33

GoogleCodeExporter commented 9 years ago
Issue 1594 has been merged into this issue.

Original comment by kak@google.com on 28 Nov 2013 at 1:15

GoogleCodeExporter commented 9 years ago
I can't believe the main reason for not splitting Guava into smaller JARs is 
the risk of a "guava-base-15" artifact not matching "guava-14" during version 
conflict resolution!
There are other examples of libraries that were split over time (Spring and 
Hibernate, for instance), and developers are used to handling such cases. Also, 
some suggestions on how to fix these problems have already been mentioned. The 
best option for me would be to provide an empty "guava" POM (with no JARs) that 
depends on all the other sub-packages (I think that's the opposite of what was 
suggested in #18), so that if I require "guava" all the other artifacts are 
included automatically; otherwise I can just choose the ones I need.

Maintaining our own split of Guava using ProGuard is not viable when you 
consider that we manage a codebase with 200+ external dependencies... if we had 
to follow this advice for every library that we also need to deploy to the 
client (which might be on a slow connection), we would die... especially as 
soon as we need to upgrade one or more of the split-up libraries.

Original comment by mauro...@tiscali.it on 28 Nov 2013 at 1:42

GoogleCodeExporter commented 9 years ago
Let's suppose a lot of people split Guava themselves: you end up with a lot of 
projects that tend to freeze their Guava dependency at an old, custom-shrunk 
version, because upgrading to a new one would potentially be a PITA. Is that 
really what you want?

Now suppose at some point I have the shrunk jar published to my custom Maven 
repo under the original ID (I use Gradle as a build system, but I guess the 
same holds for every build system with dependency management features). The 
dependency manager would resolve Guava dependencies to the shrunk jar even for 
3rd-party dependencies that could need additional classes: that would be a 
problem.
OTOH if I publish the shrunk jar with a different id (a custom 
group-artifact-version) then I could hit some duplicate class issues (the ones 
you want to avoid): that would also be a problem.

So there are no feasibility problems in using ProGuard, yGuard and so on, but 
in this case it would be perceived as a workaround, because it REALLY IS a poor 
workaround.

I think this is an issue where the safer approach is not necessarily the best 
one over time.
Guava is a great library, and it is really a pity continuing to limit its 
adoption on my projects just because it lacks some packaging refinement.

So please consider reopening this issue.

Original comment by davide.cavestro on 28 Nov 2013 at 3:15

GoogleCodeExporter commented 9 years ago
+1 to comments #21 and #22.

Comment #19 talks about a fork created with the sole purpose of better 
supporting developers who embed the library; that should make you question 
whether your decision was correct.

One more question: is Guava going to increase or decrease in size over time? ;-)

Sooner or later you'll have to split it; isn't it better to do it now?

Regards,

Original comment by marcot...@gmail.com on 28 Nov 2013 at 3:36

GoogleCodeExporter commented 9 years ago
@mauromol@tiscali.it:
How much manual splitting is involved? How often does Proguard do the wrong 
thing when pointed at a project and told to include what is necessary? And is 
this something that needs to be done for each of the 200 dependencies? My 
understanding was that only one configuration and one Proguard run was required 
no matter how many dependencies there were.

@davide.cavestro:
I'm unclear on why a custom shrunk version of Guava would be put into a Maven 
repository. Isn't the idea that Proguard is run on the final project output, 
rather than on each of its input libraries?

Original comment by cpov...@google.com on 2 Dec 2013 at 10:00

GoogleCodeExporter commented 9 years ago
I have no doubt that ProGuard is a great obfuscator/shrinker, but, like every 
piece of software, it has some known limitations (see 
http://proguard.sourceforge.net/index.html#manual/limitations.html) and 
possibly some bugs. Also, like every obfuscator, it adds some complications to 
the build system, such as defining entry points, potential issues related to 
reflection and so on.

So, when possible, I prefer using explicit dependency declarations in order to 
keep control over the 3rd-party code our developers may depend on, so that code 
availability is reflected directly in their IDE (instead of stripping, at build 
time, unneeded code that they saw as available while coding). It's simpler and 
safer.
I think shrinking is great when you need to further reduce the size of properly 
packaged libraries (in that case it makes no sense to split them up further or 
even ask someone to do so).
IMHO Guava is currently packaged as a monolith and could be packaged in a better 
fashion. Hence I'm trying to make you aware of these scenarios :-)

Thanks for your consideration (anything you decide)

Original comment by davide.cavestro on 3 Dec 2013 at 8:21

GoogleCodeExporter commented 9 years ago
@cpov...@google.com: I wouldn't even consider applying ProGuard to the whole 
codebase, as it consists of classes that are or may be called in a variety of 
ways (direct invocation, reflection, even remote class loading). When you have a 
complex project (and not a simple HelloWorld application), I do not think it's 
wise to force concepts like compile vs. runtime dependencies just to apply 
workarounds for cases like this; it's too risky and hard to maintain.

This is what I wanted to say: it's not desirable to treat Guava as a special 
case, because there's no reason why Guava is "better" than the other 199 
dependencies to justify such special treatment. I still believe better 
modularization of Guava would be desirable, especially if there are no strong 
reasons against it.

Original comment by mauro...@tiscali.it on 3 Dec 2013 at 11:52

GoogleCodeExporter commented 9 years ago
OK, thanks. Most of the team's knowledge of Proguard is secondhand, so we hear 
some good things and some bad things, and we don't know how to weigh them 
against one another. Additionally, most of that knowledge is with Android, 
which I suspect is more likely to have a single entry point (and perhaps less 
reflection in general) than a typical app.

Hearing the feedback here, my personal main reservation to splitting Guava 
(well, on top of the possibility of conflict between guava-n and guava-base-m) 
is that the bulk of the code is located in c.g.c.collect. Any app that uses 
collect (which, I suspect, is what most apps use) is going to get most of Guava 
along with it. I did some math on this at one point, but it looks like I never 
posted it externally:

"Basically everyone is using something from collect, and collect pulls in 
base+math+primitives. That's about 9000 methods that everyone would be stuck 
with. Splitting out the remaining 3000 into a separate jar is potentially 
helpful for teams right on the edge [of Android limits], of course."

Original comment by cpov...@google.com on 3 Dec 2013 at 1:21
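To put those figures in context: a single Android dex file can reference at most 65,536 methods (the "64K" limit that comes up later in this thread). A back-of-the-envelope sketch using the approximate 9000/3000 method counts quoted above; the numbers are rough, and Guava's share is only part of any app's total budget.

```java
/** Back-of-the-envelope: Guava's share of a single dex file's method-reference limit. */
public class DexBudget {
    static final int DEX_METHOD_LIMIT = 65_536; // method references per dex file

    /** Percentage of the dex limit consumed by the given number of methods. */
    static double percentOfLimit(int methods) {
        return 100.0 * methods / DEX_METHOD_LIMIT;
    }

    public static void main(String[] args) {
        int core = 9_000; // collect + base + math + primitives (approximate, from the comment)
        int rest = 3_000; // everything else (approximate, from the comment)
        System.out.printf("core only: %.1f%%%n", percentOfLimit(core));
        System.out.printf("all of Guava: %.1f%%%n", percentOfLimit(core + rest));
        System.out.printf("saved by splitting: %d methods%n", rest);
    }
}
```

Under these assumptions, splitting would move Guava's footprint from roughly 18% of the limit to roughly 14%, which matches the comment's view that it mainly helps teams right on the edge.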

GoogleCodeExporter commented 9 years ago
Some additional data on the issue.
Here is the weight (disk usage of uncompressed class files) of Guava 15.0's 
potential subpackages:

195K    com/google/common/reflect
341K    com/google/common/cache
465K    com/google/common/util
35K     com/google/common/eventbus
33K     com/google/common/escape
287K    com/google/common/base
222K    com/google/common/io
6.1K    com/google/common/annotations
123K    com/google/common/hash
5.9K    com/google/common/xml
145K    com/google/common/primitives
5.3K    com/google/common/html
110K    com/google/common/net
2.7M    com/google/common/collect
47K     com/google/common/math
4.7M    com/google/common/

I'm also attaching the composition graph obtained by running Stan4j on Guava 
15.0.
Each edge's weight reflects the dependency's strength, which in turn (as I 
understand it) tells how many times a given package refers to another one 
through imports, method calls and so on.
From that graph it seems the "collect" package depends only on "base".

So if the whole guava uncompressed weight is 4.7M, supposing "collect" weight 
is 2.7M and it depends only on "base" (287K), if we package them as collect.jar 
and base.jar then their cumulative weight would be ~3M.

Original comment by davide.cavestro on 3 Dec 2013 at 4:07


GoogleCodeExporter commented 9 years ago
ERRATA: sorry, in my last post I mixed up the data and also left out the 
"math" and "primitives" packages; a client that depends on "collect" would 
really need 2.7M + 47K + 145K + 287K = ~3.2M (collect + math + primitives + 
base), saving ~1.5M.

Original comment by davide.cavestro on 3 Dec 2013 at 4:15
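The corrected arithmetic can be reproduced as a transitive-closure-plus-sum: start from "collect", follow its package dependencies, and add up the sizes from the table. A minimal sketch, assuming only the edges stated in this thread (collect depends on base, math and primitives) and reading 2.7M and 4.7M as 2700K and 4700K, as the comment's own arithmetic does; it uses `List.of`/`Map.of`, so Java 9+.

```java
import java.util.*;

/** Sums uncompressed sizes for a package plus everything it transitively depends on. */
public class PackageFootprint {

    /** Breadth-first closure of the package dependency graph, starting at root. */
    static Set<String> closure(Map<String, List<String>> deps, String root) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(root));
        while (!queue.isEmpty()) {
            String pkg = queue.poll();
            if (seen.add(pkg)) {
                queue.addAll(deps.getOrDefault(pkg, List.of()));
            }
        }
        return seen;
    }

    /** Adds up the sizes (in K) of the given packages; unknown packages count as 0. */
    static int totalKb(Map<String, Integer> sizeKb, Set<String> pkgs) {
        int sum = 0;
        for (String pkg : pkgs) {
            sum += sizeKb.getOrDefault(pkg, 0);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Sizes from the table above, with 2.7M read as 2700K.
        Map<String, Integer> sizeKb = Map.of("collect", 2700, "base", 287, "primitives", 145, "math", 47);
        // Only the edges stated in the thread: collect -> base, math, primitives.
        Map<String, List<String>> deps = Map.of("collect", List.of("base", "math", "primitives"));
        int wholeJarKb = 4700;

        Set<String> needed = closure(deps, "collect");
        int neededKb = totalKb(sizeKb, needed);
        System.out.println(needed + " -> " + neededKb + "K, saving " + (wholeJarKb - neededKb) + "K");
    }
}
```

This prints `[collect, base, math, primitives] -> 3179K, saving 1521K`, i.e. the ~3.2M needed and ~1.5M saved quoted above. Adding more edges to `deps` (for example, if base turned out to pull in another package) would grow the closure accordingly.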

GoogleCodeExporter commented 9 years ago
Well guys. I've now published version 15.0 of my atomic Guava port, in case 
it's useful to anyone.

http://seeds.tribe7.net
http://search.maven.org/#search%7Cga%7C1%7Cseeds

Thank you and happy holidays ;)

Original comment by jjzaz...@gmail.com on 25 Dec 2013 at 4:00

GoogleCodeExporter commented 9 years ago
Well guys. I've now published version 16.0.1 of my atomic Guava port, in case 
it's useful to anyone.

Please consider moving all String-related functionality to a separate package 
'common.base.strings', and all functional base classes to a separate package as 
well. I think these are the two major sources of bloat in Guava's base 
classes.

http://seeds.tribe7.net
http://search.maven.org/#search%7Cga%7C1%7Cseeds

Thank you :)

Original comment by jjzaz...@gmail.com on 23 Mar 2014 at 9:57

GoogleCodeExporter commented 9 years ago
Thanks a lot for the Guava split. I want to use Guava in my Android app and 
have hit Dalvik's 64K method limit.

I want to use the Guava Caches and ListenableFuture, but unfortunately the 
latter is part of seeds-util, that references math, primitives, base, function 
and strings.

This results in 13k method signatures versus 14k for the original Guava package.

Is there any way to further reduce it?

Any hint is highly appreciated... :)

Original comment by m...@dr-lanka.de on 18 Jun 2014 at 7:48

GoogleCodeExporter commented 9 years ago
@32 I think it should be possible to shrink it even more. Last time I checked, 
there were at least two code packages that shared only one class between 
them. I'll see if this is still the case when I update my port to Guava 17 (I 
know, I know, I'll hurry up). :P

Cheers!

Original comment by jjzaz...@gmail.com on 19 Jun 2014 at 3:50

GoogleCodeExporter commented 9 years ago
@33 Is there anything I could help with when you port it?

Original comment by m...@dr-lanka.de on 25 Jun 2014 at 9:08

GoogleCodeExporter commented 9 years ago
Please don't use the Guava issue tracker as a support forum for this other 
unsanctioned project.

Original comment by kevinb@google.com on 25 Jun 2014 at 1:47

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:15

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:09