addthis / stream-lib

Stream summarizer and cardinality estimator.
Apache License 2.0
2.26k stars 556 forks source link

remove fastutil, add guava, add test-slf4j-simple #79

Closed tea-dragon closed 10 years ago

tea-dragon commented 10 years ago

the fastutil addition should never have been merged without addressing the enormous and almost entirely untouched mountain of code that it pulls in. there are several possible solutions, but I assert that the burden is on someone who cares more about QDigest to implement one.

conversely, guava is tiny by comparison, has a variety of useful classes, and is widely enough used that most downstream users will see no increase in dependency count anyway.

tdunning commented 10 years ago

But which version of guava?

I had to pull guava (tears) from t-digest because I couldn't figure out an acceptable version.

On Wed, Aug 20, 2014 at 7:26 AM, Ian Barfield notifications@github.com wrote:

the fastutil addition should have never been merged without addressing the enormous and almost entirely untouched mountain of code that it pulls in. there are several possible solutions, but I assert that the burden is on someone who cares more about QDigest to implement one.

conversely, guava is tiny by comparison, has a variety of useful classes, and is widely enough used that most downstream users will

see no increase in dependency count anyway.

You can merge this Pull Request by running

git pull https://github.com/addthis/stream-lib kill-fastutil

Or view, comment on, or merge it at:

https://github.com/addthis/stream-lib/pull/79 Commit Summary

  • remove fastutil, add guava, add test-slf4j-simple

File Changes

Patch Links:

— Reply to this email directly or view it on GitHub https://github.com/addthis/stream-lib/pull/79.

tdunning commented 10 years ago

Importing version 16 of Java makes it hard to use streamlib with Hadoop (which still is stuck on version 11).

On Wed, Aug 20, 2014 at 8:19 AM, Ted Dunning ted.dunning@gmail.com wrote:

But which version of guava?

I had to pull guava (tears) from t-digest because I couldn't figure out an acceptable version.

On Wed, Aug 20, 2014 at 7:26 AM, Ian Barfield notifications@github.com wrote:

the fastutil addition should have never been merged without addressing the enormous and almost entirely untouched mountain of code that it pulls in. there are several possible solutions, but I assert that the burden is on someone who cares more about QDigest to implement one.

conversely, guava is tiny by comparison, has a variety of useful classes, and is widely enough used that most downstream users will

see no increase in dependency count anyway.

You can merge this Pull Request by running

git pull https://github.com/addthis/stream-lib kill-fastutil

Or view, comment on, or merge it at:

https://github.com/addthis/stream-lib/pull/79 Commit Summary

  • remove fastutil, add guava, add test-slf4j-simple

File Changes

Patch Links:

— Reply to this email directly or view it on GitHub https://github.com/addthis/stream-lib/pull/79.

tea-dragon commented 10 years ago

not sure if that is hilarious or sad. In any event, they can feel free to continue to use 2.7.0 until they upgrade guava. Apparently using ancient versions of things is their jam so i'm sure they won't mind. There is a good chance that forcing versions via dependency management (in one direction or the other) will 'just work', but there is no chance of me investigating that myself.

On Wed, Aug 20, 2014 at 11:21 AM, Ted Dunning notifications@github.com wrote:

Importing version 16 of Java makes it hard to use streamlib with Hadoop (which still is stuck on version 11).

On Wed, Aug 20, 2014 at 8:19 AM, Ted Dunning ted.dunning@gmail.com wrote:

But which version of guava?

I had to pull guava (tears) from t-digest because I couldn't figure out an acceptable version.

On Wed, Aug 20, 2014 at 7:26 AM, Ian Barfield notifications@github.com

wrote:

the fastutil addition should have never been merged without addressing the enormous and almost entirely untouched mountain of code that it pulls in. there are several possible solutions, but I assert that the burden is on someone who cares more about QDigest to implement one.

conversely, guava is tiny by comparison, has a variety of useful classes, and is widely enough used that most downstream users will

see no increase in dependency count anyway.

You can merge this Pull Request by running

git pull https://github.com/addthis/stream-lib kill-fastutil

Or view, comment on, or merge it at:

https://github.com/addthis/stream-lib/pull/79 Commit Summary

  • remove fastutil, add guava, add test-slf4j-simple

File Changes

Patch Links:

— Reply to this email directly or view it on GitHub https://github.com/addthis/stream-lib/pull/79.

— Reply to this email directly or view it on GitHub https://github.com/addthis/stream-lib/pull/79#issuecomment-52793754.

cburroughs commented 10 years ago

I think having a minimal set of depdencies has been good for stream-lib adoption, and tt's gratifying that it's made it's way into the base of a number of depdency sets. If we could kill fastutil without adding guava I think that would be a net improvement. Vendoring a handful of files (like I think we did for one of the digests) is an option.

tea-dragon commented 10 years ago

we could kill fastutil without adding guava, but you would have to be insane not to still consider it a net improvement to do both. I have seen more than a few people or pom file comments complain about the fastutil dependency. I don't believe anyone really cares about guava.

On Wed, Aug 20, 2014 at 11:46 AM, cburroughs notifications@github.com wrote:

I think having a minimal set of depdencies has been good for stream-lib adoption, and tt's gratifying that it's made it's way into the base of a number of depdency sets. If we could kill fastutil without adding guava I think that would be a net improvement. Vendoring a handful of files (like I think we did for one of the digests) is an option.

— Reply to this email directly or view it on GitHub https://github.com/addthis/stream-lib/pull/79#issuecomment-52797581.

cburroughs commented 10 years ago

In addition to the comments in this ticket we asked about guava in the mailing list before https://groups.google.com/forum/#!searchin/stream-lib-user/guava/stream-lib-user/HHqaRtYCywU/FRsgNUNGei8J

cykl commented 10 years ago

I don't believe anyone really cares about guava

The issue with fastutil is its weight. It is seldom used, so it is an extra cost for almost every user of stream-lib.

The issue with guava is version management and classpath hell. Guava is so much used that you have to be very careful about the version and API you use: nor too recent, nor deprecated, nor unstable. Even if the Guava team is very conservative, I have see a lot of version conflicts with Guava... Some of them cannot be solved.

I am quite sure you don't want to prevent stream-lib usage from Hadoop or other large projects so you should run integration test with several versions of Guava...

tea-dragon commented 10 years ago

I have no plans to try to delete stream-lib 2.7.0 from maven central. If a large project is already using a wide array of hideously out of date dependencies, then they are one hundred percent free to add stream-lib to that list if it.

"version management and classpath hell" sounds like a hadoop problem more than anything else.

stream-lib is in need of serious improvements in a wide variety of areas. Anything that discourages or slows development should be enemy number one. Otherwise we'll probably end up with just as many options available to hadoop and friends as we have now and would anyway -- 2.7.0 and a small bug fix once every three months.

On Wed, Aug 20, 2014 at 12:09 PM, Clément MATHIEU notifications@github.com wrote:

I don't believe anyone really cares about guava

The issue with fastutil is its weight. It is seldom used, so it is an extra cost for almost every user of stream-lib.

The issue with guava is version management and classpath hell. Guava is so much used that you have to be very careful about the version and API you use: nor too recent, nor deprecated, nor unstable. Even if the Guava team is very conservative, I have see a lot of version conflicts with Guava... Some of them cannot be solved.

I am quite sure you don't want to prevent stream-lib usage from Hadoop or other large projects so you should run integration test with several versions of Guava...

— Reply to this email directly or view it on GitHub https://github.com/addthis/stream-lib/pull/79#issuecomment-52800892.

abramsm commented 10 years ago

We've received great contributions to stream-lib from people working on projects that depend on the open source echo system. I hope for future contributions and I want those people and projects to be able to benefit from future improvements to stream-lib. My suggestion is that we investigate incorporating the minimum set of guava files into this project (as Chris suggested) or consider providing multiple build artifacts of future stream-lib releases that use a variety of tested guava versions. I'd like it to be as easy as possible for people to use stream-lib in their own environments.

Matt

On Wed, Aug 20, 2014 at 12:24 PM, Ian Barfield notifications@github.com wrote:

I have no plans to try to delete stream-lib 2.7.0 from maven central. If a large project is already using a wide array of hideously out of date dependencies, then they are one hundred percent free to add stream-lib to that list if it.

"version management and classpath hell" sounds like a hadoop problem more than anything else.

stream-lib is in need of serious improvements in a wide variety of areas. Anything that discourages or slows development should be enemy number one. Otherwise we'll probably end up with just as many options available to hadoop and friends as we have now and would anyway -- 2.7.0 and a small bug fix once every three months.

On Wed, Aug 20, 2014 at 12:09 PM, Clément MATHIEU < notifications@github.com> wrote:

I don't believe anyone really cares about guava

The issue with fastutil is its weight. It is seldom used, so it is an extra cost for almost every user of stream-lib.

The issue with guava is version management and classpath hell. Guava is so much used that you have to be very careful about the version and API you use: nor too recent, nor deprecated, nor unstable. Even if the Guava team is very conservative, I have see a lot of version conflicts with Guava... Some of them cannot be solved.

I am quite sure you don't want to prevent stream-lib usage from Hadoop or other large projects so you should run integration test with several versions of Guava...

— Reply to this email directly or view it on GitHub https://github.com/addthis/stream-lib/pull/79#issuecomment-52800892.

— Reply to this email directly or view it on GitHub https://github.com/addthis/stream-lib/pull/79#issuecomment-52803136.

cykl commented 10 years ago

Classpath hell is not hadoop specific, it is an issue for any large project/ecosystem and for any widely used library.

Fell free to add guava, just be aware that you could end up we lot of complaint from Hadoop users at least. Last time I checked streaming algorithms were about bit manipulation and "main issues" of stream-lib were its design, readability and not so great performance. Guava might help or could motivate people to write a clean implementation of HLL+ from scratch in case of troubles.

AFAIK Spark depends on stream-lib now. Make sure you are compatible with their version of guava.

tea-dragon commented 10 years ago

In all honesty, all I care about is the existence of incrementally more optimal algorithms and implementations thereof. To that end, someone being motivated to write new implementations of anything sounds great to me. I haven't done any real comparisons, but I have no qualms about linking to the existence of this project, for example.

That said, I have clearly overlooked the feelings of some of the stream-lib users. I am content to contribute better programs in less contested ways. I have pushed one last commit that revokes guava's non-test status and addresses a mistakenly leaked test dependency. I will note that the project I linked to earlier also uses fastutil so it is possible that it is worth considering keeping it with plans to expand usage in the future. My personal opinion is that fastutil's implementations are disappointing and that the classes should at least be filtered, but do as you will.

mspiegel commented 10 years ago

It looks like the scope of the guava dependency in the pull request is "test". I don't understand why there is so much resistance to adding guava. Should we keep the scope as "test" and add optional as true? Will that address the issues?

tdunning commented 10 years ago

A test dependency doesn't have nearly the problems of a non-test dependency. Guava just got moved to test scope.

On Wed, Aug 20, 2014 at 10:31 AM, mspiegel notifications@github.com wrote:

It looks like the scope of the guava dependency in the pull request is "test". I don't understand why there is so much resistance to adding guava. Should we keep the scope as "test" and add optional as true? Will that address the issues?

— Reply to this email directly or view it on GitHub https://github.com/addthis/stream-lib/pull/79#issuecomment-52812318.