DaveAKing / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0
0 stars 0 forks source link

Some auxiliary collections are not serializable #978

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Following the discussion in
https://groups.google.com/forum/?fromgroups#!topic/guava-discuss/V2puosAWGkg

It seems that the following collections should be marked as Serializable:

- Collections2.FilteredCollection
- Collections2.TransformedCollection

Other cases such as Lists.Trabnsforming*List are already serializable, so the 
different behavior of the Collections2.* aux classes seems to be a bug.

Original issue reported on code.google.com by opin...@google.com on 25 Apr 2012 at 5:42

GoogleCodeExporter commented 9 years ago
You have my +1 (AND MY AXE), but I can't help but wonder if there was some 
reason it was left out previously.  (To be fair, Collections2.transform is 
rarer than Iterables.transform or Lists.transform, so it might just not have 
been requested.)

Original comment by wasserman.louis on 25 Apr 2012 at 5:47

GoogleCodeExporter commented 9 years ago
I've done some due diligence searching the list, issues and checking source 
code comments, but couldn't find any evidence of a reason for that... nor can I 
think of any technical difficulty. It really seems like an oversight, as you 
say these methods are much less common to use than those for collections of a 
specific type.

For one thing, the argument to Iterables.transform() is even more general than 
Collection; it doesn't have to be a List, Map, or even any kind of Collection, 
it can be just some crazy non-collection object that implements the iterator() 
method. Still the API rewards use with a serializable transformed iterable.

Original comment by opin...@google.com on 26 Apr 2012 at 2:35

GoogleCodeExporter commented 9 years ago
I thought that Kevin had some reservations about making many of our collections 
(especially views) serializable, but I don't remember the details.  I know that 
the team a lot of effort into serialization a while back, and the goal may be 
to avoid that, but I thought that most of the effort was in establishing 
*fixed* serialized forms, which, in the end, we moved away from.

Original comment by cpov...@google.com on 30 Apr 2012 at 5:00

GoogleCodeExporter commented 9 years ago
I'd like to withdraw my +1.  (Also, Iterables.transform is not, in fact, 
serializable; nor is Maps.transformValues, Iterables.filter, or frankly any of 
the functional transformation methods except Lists.transform.)

Original comment by wasserman.louis on 1 May 2012 at 3:21

GoogleCodeExporter commented 9 years ago
@4: Here's a more complete list of methods that return a Serializable 
collection:

- Lists.asList(first, [second,] rest)
- Lists.transform()
- All the unmodifiable*() methods for all collections and all maps

It seems Lists.transform() is the only one that takes a function argument; but 
this  ain't coherent criteria either, things like Lists.reverse() return a 
non-Serializable object even though that is a simple view over the input, 
exactly like the Unmodifiable views.

Also, if the logic was that there's something special in List (or other "real" 
Collection which excludes maps), then filtering a List should also return a 
Serializable. But this doesn't happen because we don't have Lists.filter(), we 
have to use the general filter(Collection...), which IMHO is important missing 
functionality; a proper List->List filter would be very useful (but that's for 
another issue#).

In summary, I can't find any rationale for the decision of whether to return a 
Serializable collection, for all the methods that wrap/transform/filter/derive 
a collection from another. Even if there is some rule, this rule is obscure and 
it's also clearly incompatible with the Java Collections Framework which design 
is very simple: return a Serializable output whenever possible, mandatory when 
the input is Serializable.

Original comment by opin...@google.com on 1 May 2012 at 3:57

GoogleCodeExporter commented 9 years ago
IIRC, I think our default rule is "return a Serializable output when someone 
asks for that feature, and we agree that it's a good idea."

(FYI, Lists.filter was rejected in issue 505.)

Original comment by wasserman.louis on 1 May 2012 at 4:02

GoogleCodeExporter commented 9 years ago
We have not been at all systematic about this, it's true.  By contrast, the JDK 
does seem to be in pretty good shape.  The only obvious discrepancy I see at 
first glance is:

    SerializableTester.reserializeAndAssert(new java.util.TreeSet().tailSet("")); // pass
    SerializableTester.reserializeAndAssert(new java.util.ArrayList().subList(0, 0)); // fail

On one hand, thorough serialization support would take more time than is 
probably justified.  On the other, we could address methods as they are 
requested.  On the third hand, that just makes our inconsistency worse.  But 
we're already inconsistent -- do we care if it's "worse?"

Original comment by cpov...@google.com on 1 May 2012 at 4:11

GoogleCodeExporter commented 9 years ago
@6: In that case, I am asking for this feature, now more formally below. It's 
fine if the answer is No, as long as there are any reasons for that, so far 
everything I heard in this thread was "I wonder if there was some reason", 
"Kevin had some reservations" [that nobody remembers or can find in 
lists/issues], etc.

- Request: All methods (that take a collection/map input and return a derived 
collection output) should return a Serializable collection/map, unless there's 
any reason for this to be impractical or undesirable in specific methods. The 
burden of justification for exception should be on the decision to NOT be 
Serializable.

- Trivial implementation: add "implements Serializable" where missing. All 
these methods return internal, non-public impls, so Guava has control of that. 
Most work would go in unit test and javadoc updates. I would happily volunteer 
to do all work.

- There are no compatibility consequences, because Guava doesn't care about 
serialization compatibility across different JVMs.  There are no costs in code 
size, implementation / maintenance complexity, performance tradeoffs, anything.

- Current behavior is inefficient; it forces me to add code like 
"newArrayList(...)", which has a O(N) cost of allocating/copying all elements, 
just to make some output serializable e.g. to use it as a parameter or return 
value of a remote method.

- Current design is incompatible with the design of the Java SE Collections 
API, a foundation of the Java platform. Result is very counter-intuitive, makes 
Guava incoherent with the framework that it extends, and incoherent even with 
itself.

Original comment by opin...@google.com on 1 May 2012 at 5:05

GoogleCodeExporter commented 9 years ago
@7 (I wrote @8 before reading this) - playing the Devil's advocate, the reason 
for List.subList() not promising a Serializable output is explained in 
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4782922: "This was 
intentional, and is entirely appropriate for "views."  If you serialize a small 
view (say three elements), the entire backing List (say one million elements) 
gets serialized.  If you deserilize the view, the entire million element list 
gets reconstructed at great cost, even though 99.9967% of it is forever 
inacessible."

Now this is a decent criteria to decide this matter. This means that methods 
like reverse(), unmodifiable...(), and transform().., which are injective 
functions (so the output will retain all elements from the input), should 
return a Serializable output.  But non-injective methods like filter() should 
not do that because of the potential inefficiency described by Sun's bug. We 
could argue that even the latter methods could return a Serializable, if the 
method is lazy; but the lazy/eager behavior adds a new and complex factor to 
the problem and I'd agree that the simple and safe (if conservative) solution 
is avoiding Serializable here.

Original comment by opin...@google.com on 1 May 2012 at 5:20

GoogleCodeExporter commented 9 years ago
"- Trivial implementation: add "implements Serializable" where missing. All 
these methods return internal, non-public impls, so Guava has control of that. 
Most work would go in unit test and javadoc updates. I would happily volunteer 
to do all work."

I think there might be issues with GWT serialization that make this a much less 
trivial thing than you think.

"- Current behavior is inefficient; it forces me to add code like 
"newArrayList(...)", which has a O(N) cost of allocating/copying all elements, 
just to make some output serializable e.g. to use it as a parameter or return 
value of a remote method."
I think that was actually the point -- to make those costs up-front.  Most 
users of transform() and the like don't use Serializable functions, and 
serializing transformed collections has a lot of room for unexpected and weird 
behavior.  Forcing users to make copies of those collections makes the costs 
and semantics obvious and up-front.  I think there's a decent argument to be 
made for _removing_ the serializability of Lists.transform.

Original comment by wasserman.louis on 1 May 2012 at 5:28

GoogleCodeExporter commented 9 years ago
@10 I think Guava should make a reasonable effort to serve a more general 
audience. GWT supports Java SE collections too, with caveats like declaring the 
element type. So we'll also need helpers like readObject()/writeObject(); no 
big deal, though not in the single-liner league anymore. The public Guava 
collection/map classes already do that, as most custom collections should.

On efficiency: disagree; having those output collections non-serializable 
doesn't simply "make costs up-front" - it creates costs that wouldn't exist 
otherwise. A serializable output would just be consumed by the serializer 
without any temporary allocation or copying.  Unless there's some problem 
specific to GWT that I wouldn't know. Considering regular serialization, the 
methods I'm now asking to review (see my comment @9) are no different than 
similar cases from JavaSE. No new "unexpected and weird" stuff in 
Maps.unmodifiableBimap() that already doesn't exist in 
Collections.unmodifiableMap(). Those don't even need read/writeObject(), just 
do a shallow serialization like any other Java object.

Original comment by opin...@google.com on 1 May 2012 at 6:33

GoogleCodeExporter commented 9 years ago
Maps.unmodifiableBimap is already serializable.

Let me clarify: I'm concerned that there are reasons specific to transform() 
methods for serializing their results to be a bad idea.  Example: you'll 
frequently see users saying Iterables.transform(peopleSet, getNameFunction) -- 
they're mapping objects into a single field of those objects.  Something 
"smaller," not bigger.

Users who could serialize such collections would be serializing much more than 
they expected -- the whole objects, not the result objects -- and forcing an 
explicit copy makes the difference clear to those users.

Additionally, serializing a transformed collection can surprise users when the 
elements of the original collection are *not* serializable, and the elements of 
the result list *are* -- but since it's a view of the original collection, the 
collection (of serializable elements) is not serializable.  We know for a fact 
users have gotten confused by this (in the case of Lists.transform) in the past 
-- see 
http://stackoverflow.com/questions/9418032/java-serialization-problems-while-usi
ng-guava-lists-transform.

Original comment by wasserman.louis on 1 May 2012 at 7:35

GoogleCodeExporter commented 9 years ago
@12: The transform problem is a good argument. It's not perfectly equivalent to 
Sun's scenario (you can even, easily, have transforms that produce a bigger 
output than the input)... but the JDK collections already decides the tradeoff 
in the conservative side / least-surprise rule, so OK for me.

Well, the last standing part of my RFE is that the API should be consistent and 
intuitive. So I'd vote to remove the Serializable from Lists.transform().

Original comment by opin...@google.com on 1 May 2012 at 8:18

GoogleCodeExporter commented 9 years ago
Chris said "We have not been at all systematic about this, it's true."

In the olden days, we were actually trying pretty hard to be systematic about 
it. We just realized it had been THE thing delaying the release of Google 
Collections 1.0 for quite a while, and we gave up on really perfecting the 
serialization support.

Here's the spreadsheet I was using at the time:

https://docs.google.com/spreadsheet/ccc?key=0AoFn3TZKLWTUcGQ4ZEFReUhiZGV4UTQ2Yk5
0NzhRWWc

I'm saddened to learn that we accidentally made Lists.transform serializable. :)

Original comment by kevinb@google.com on 1 May 2012 at 9:32

GoogleCodeExporter commented 9 years ago
Oh man.  I had no idea we were so thorough back then.

I have to ask -- is there anything we could do to remove serializability for 
Lists.transform, while controlling compatibility issues?

Original comment by wasserman.louis on 1 May 2012 at 10:15

GoogleCodeExporter commented 9 years ago
It's not worth it. We should just let it be a wart.

I think we can close this issue now?

(p.s. very amused by the "tag" I accidentally added in my last update.)

Original comment by kevinb@google.com on 1 May 2012 at 10:55

GoogleCodeExporter commented 9 years ago
OK for me to close. Unfortunately Lists.transform() documents that it returns a 
Serializable list; but maybe this doc could be updated to discourage reliance 
on that, explaining the risk of transferring unwanted data, and warning that a 
future update may not be serializable. This would be a more "managed wart" even 
if the change is never made due to backwards compatibility.

Original comment by opin...@google.com on 2 May 2012 at 12:10

GoogleCodeExporter commented 9 years ago
Marking as WontFix.

I vote we leave serialization in, but add some lines to the documentation 
telling people that it's a bad idea to use it.  Lists.transform isn't @Beta -- 
that means we're committed to keeping functionality for two years before 
deleting it.

It might be worth deleting it in the longer term, but certainly not right now.

Original comment by wasserman.louis on 2 May 2012 at 12:13

GoogleCodeExporter commented 9 years ago
I'll whip this doc change up right now.

Original comment by kevinb@google.com on 2 May 2012 at 12:15

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:14

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:08