Closed GoogleCodeExporter closed 9 years ago
You have my +1 (AND MY AXE), but I can't help but wonder if there was some
reason it was left out previously. (To be fair, Collections2.transform is
rarer than Iterables.transform or Lists.transform, so it might just not have
been requested.)
Original comment by wasserman.louis
on 25 Apr 2012 at 5:47
I've done some due diligence searching the list, issues and checking source
code comments, but couldn't find any evidence of a reason for that... nor can I
think of any technical difficulty. It really seems like an oversight, as you
say these methods are much less common to use than those for collections of a
specific type.
For one thing, the argument to Iterables.transform() is even more general than
Collection; it doesn't have to be a List, Map, or even any kind of Collection,
it can be just some crazy non-collection object that implements the iterator()
method. Still the API rewards use with a serializable transformed iterable.
Original comment by opin...@google.com
on 26 Apr 2012 at 2:35
I thought that Kevin had some reservations about making many of our collections
(especially views) serializable, but I don't remember the details. I know that
the team a lot of effort into serialization a while back, and the goal may be
to avoid that, but I thought that most of the effort was in establishing
*fixed* serialized forms, which, in the end, we moved away from.
Original comment by cpov...@google.com
on 30 Apr 2012 at 5:00
I'd like to withdraw my +1. (Also, Iterables.transform is not, in fact,
serializable; nor is Maps.transformValues, Iterables.filter, or frankly any of
the functional transformation methods except Lists.transform.)
Original comment by wasserman.louis
on 1 May 2012 at 3:21
@4: Here's a more complete list of methods that return a Serializable
collection:
- Lists.asList(first, [second,] rest)
- Lists.transform()
- All the unmodifiable*() methods for all collections and all maps
It seems Lists.transform() is the only one that takes a function argument; but
this ain't coherent criteria either, things like Lists.reverse() return a
non-Serializable object even though that is a simple view over the input,
exactly like the Unmodifiable views.
Also, if the logic was that there's something special in List (or other "real"
Collection which excludes maps), then filtering a List should also return a
Serializable. But this doesn't happen because we don't have Lists.filter(), we
have to use the general filter(Collection...), which IMHO is important missing
functionality; a proper List->List filter would be very useful (but that's for
another issue#).
In summary, I can't find any rationale for the decision of whether to return a
Serializable collection, for all the methods that wrap/transform/filter/derive
a collection from another. Even if there is some rule, this rule is obscure and
it's also clearly incompatible with the Java Collections Framework which design
is very simple: return a Serializable output whenever possible, mandatory when
the input is Serializable.
Original comment by opin...@google.com
on 1 May 2012 at 3:57
IIRC, I think our default rule is "return a Serializable output when someone
asks for that feature, and we agree that it's a good idea."
(FYI, Lists.filter was rejected in issue 505.)
Original comment by wasserman.louis
on 1 May 2012 at 4:02
We have not been at all systematic about this, it's true. By contrast, the JDK
does seem to be in pretty good shape. The only obvious discrepancy I see at
first glance is:
SerializableTester.reserializeAndAssert(new java.util.TreeSet().tailSet("")); // pass
SerializableTester.reserializeAndAssert(new java.util.ArrayList().subList(0, 0)); // fail
On one hand, thorough serialization support would take more time than is
probably justified. On the other, we could address methods as they are
requested. On the third hand, that just makes our inconsistency worse. But
we're already inconsistent -- do we care if it's "worse?"
Original comment by cpov...@google.com
on 1 May 2012 at 4:11
@6: In that case, I am asking for this feature, now more formally below. It's
fine if the answer is No, as long as there are any reasons for that, so far
everything I heard in this thread was "I wonder if there was some reason",
"Kevin had some reservations" [that nobody remembers or can find in
lists/issues], etc.
- Request: All methods (that take a collection/map input and return a derived
collection output) should return a Serializable collection/map, unless there's
any reason for this to be impractical or undesirable in specific methods. The
burden of justification for exception should be on the decision to NOT be
Serializable.
- Trivial implementation: add "implements Serializable" where missing. All
these methods return internal, non-public impls, so Guava has control of that.
Most work would go in unit test and javadoc updates. I would happily volunteer
to do all work.
- There are no compatibility consequences, because Guava doesn't care about
serialization compatibility across different JVMs. There are no costs in code
size, implementation / maintenance complexity, performance tradeoffs, anything.
- Current behavior is inefficient; it forces me to add code like
"newArrayList(...)", which has a O(N) cost of allocating/copying all elements,
just to make some output serializable e.g. to use it as a parameter or return
value of a remote method.
- Current design is incompatible with the design of the Java SE Collections
API, a foundation of the Java platform. Result is very counter-intuitive, makes
Guava incoherent with the framework that it extends, and incoherent even with
itself.
Original comment by opin...@google.com
on 1 May 2012 at 5:05
@7 (I wrote @8 before reading this) - playing the Devil's advocate, the reason
for List.subList() not promising a Serializable output is explained in
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4782922: "This was
intentional, and is entirely appropriate for "views." If you serialize a small
view (say three elements), the entire backing List (say one million elements)
gets serialized. If you deserilize the view, the entire million element list
gets reconstructed at great cost, even though 99.9967% of it is forever
inacessible."
Now this is a decent criteria to decide this matter. This means that methods
like reverse(), unmodifiable...(), and transform().., which are injective
functions (so the output will retain all elements from the input), should
return a Serializable output. But non-injective methods like filter() should
not do that because of the potential inefficiency described by Sun's bug. We
could argue that even the latter methods could return a Serializable, if the
method is lazy; but the lazy/eager behavior adds a new and complex factor to
the problem and I'd agree that the simple and safe (if conservative) solution
is avoiding Serializable here.
Original comment by opin...@google.com
on 1 May 2012 at 5:20
"- Trivial implementation: add "implements Serializable" where missing. All
these methods return internal, non-public impls, so Guava has control of that.
Most work would go in unit test and javadoc updates. I would happily volunteer
to do all work."
I think there might be issues with GWT serialization that make this a much less
trivial thing than you think.
"- Current behavior is inefficient; it forces me to add code like
"newArrayList(...)", which has a O(N) cost of allocating/copying all elements,
just to make some output serializable e.g. to use it as a parameter or return
value of a remote method."
I think that was actually the point -- to make those costs up-front. Most
users of transform() and the like don't use Serializable functions, and
serializing transformed collections has a lot of room for unexpected and weird
behavior. Forcing users to make copies of those collections makes the costs
and semantics obvious and up-front. I think there's a decent argument to be
made for _removing_ the serializability of Lists.transform.
Original comment by wasserman.louis
on 1 May 2012 at 5:28
@10 I think Guava should make a reasonable effort to serve a more general
audience. GWT supports Java SE collections too, with caveats like declaring the
element type. So we'll also need helpers like readObject()/writeObject(); no
big deal, though not in the single-liner league anymore. The public Guava
collection/map classes already do that, as most custom collections should.
On efficiency: disagree; having those output collections non-serializable
doesn't simply "make costs up-front" - it creates costs that wouldn't exist
otherwise. A serializable output would just be consumed by the serializer
without any temporary allocation or copying. Unless there's some problem
specific to GWT that I wouldn't know. Considering regular serialization, the
methods I'm now asking to review (see my comment @9) are no different than
similar cases from JavaSE. No new "unexpected and weird" stuff in
Maps.unmodifiableBimap() that already doesn't exist in
Collections.unmodifiableMap(). Those don't even need read/writeObject(), just
do a shallow serialization like any other Java object.
Original comment by opin...@google.com
on 1 May 2012 at 6:33
Maps.unmodifiableBimap is already serializable.
Let me clarify: I'm concerned that there are reasons specific to transform()
methods for serializing their results to be a bad idea. Example: you'll
frequently see users saying Iterables.transform(peopleSet, getNameFunction) --
they're mapping objects into a single field of those objects. Something
"smaller," not bigger.
Users who could serialize such collections would be serializing much more than
they expected -- the whole objects, not the result objects -- and forcing an
explicit copy makes the difference clear to those users.
Additionally, serializing a transformed collection can surprise users when the
elements of the original collection are *not* serializable, and the elements of
the result list *are* -- but since it's a view of the original collection, the
collection (of serializable elements) is not serializable. We know for a fact
users have gotten confused by this (in the case of Lists.transform) in the past
-- see
http://stackoverflow.com/questions/9418032/java-serialization-problems-while-usi
ng-guava-lists-transform.
Original comment by wasserman.louis
on 1 May 2012 at 7:35
@12: The transform problem is a good argument. It's not perfectly equivalent to
Sun's scenario (you can even, easily, have transforms that produce a bigger
output than the input)... but the JDK collections already decides the tradeoff
in the conservative side / least-surprise rule, so OK for me.
Well, the last standing part of my RFE is that the API should be consistent and
intuitive. So I'd vote to remove the Serializable from Lists.transform().
Original comment by opin...@google.com
on 1 May 2012 at 8:18
Chris said "We have not been at all systematic about this, it's true."
In the olden days, we were actually trying pretty hard to be systematic about
it. We just realized it had been THE thing delaying the release of Google
Collections 1.0 for quite a while, and we gave up on really perfecting the
serialization support.
Here's the spreadsheet I was using at the time:
https://docs.google.com/spreadsheet/ccc?key=0AoFn3TZKLWTUcGQ4ZEFReUhiZGV4UTQ2Yk5
0NzhRWWc
I'm saddened to learn that we accidentally made Lists.transform serializable. :)
Original comment by kevinb@google.com
on 1 May 2012 at 9:32
Oh man. I had no idea we were so thorough back then.
I have to ask -- is there anything we could do to remove serializability for
Lists.transform, while controlling compatibility issues?
Original comment by wasserman.louis
on 1 May 2012 at 10:15
It's not worth it. We should just let it be a wart.
I think we can close this issue now?
(p.s. very amused by the "tag" I accidentally added in my last update.)
Original comment by kevinb@google.com
on 1 May 2012 at 10:55
OK for me to close. Unfortunately Lists.transform() documents that it returns a
Serializable list; but maybe this doc could be updated to discourage reliance
on that, explaining the risk of transferring unwanted data, and warning that a
future update may not be serializable. This would be a more "managed wart" even
if the change is never made due to backwards compatibility.
Original comment by opin...@google.com
on 2 May 2012 at 12:10
Marking as WontFix.
I vote we leave serialization in, but add some lines to the documentation
telling people that it's a bad idea to use it. Lists.transform isn't @Beta --
that means we're committed to keeping functionality for two years before
deleting it.
It might be worth deleting it in the longer term, but certainly not right now.
Original comment by wasserman.louis
on 2 May 2012 at 12:13
I'll whip this doc change up right now.
Original comment by kevinb@google.com
on 2 May 2012 at 12:15
This issue has been migrated to GitHub.
It can be found at https://github.com/google/guava/issues/<id>
Original comment by cgdecker@google.com
on 1 Nov 2014 at 4:14
Original comment by cgdecker@google.com
on 3 Nov 2014 at 9:08
Original issue reported on code.google.com by
opin...@google.com
on 25 Apr 2012 at 5:42