dnrajugade / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0
0 stars 0 forks source link

Incompatible change to Byte/LineProcessor #1187

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
We're considering an incompatible change to ByteProcessor and LineProcessor and 
the methods that use them. They currently have generics and a getResult() 
method, but in practice, most usages have a field that is accumulated to in the 
process method and returned by the getResult() method. Instead, a final 
variable in the calling method could be accumulated to in the process method, 
with no need for getResult() at all.

Is anyone using ByteProcessor or LineProcessor in a way that would make this 
change a problem for them?

Original issue reported on code.google.com by cgdecker@google.com on 2 Nov 2012 at 4:43

GoogleCodeExporter commented 9 years ago
This sounds a lot like the difference between the following in Spring's JDBC 
result set handling, which covers most of the possibilities. Worth seeing that 
for prior art of line-by-line processing?

* Guava Byte/LineProcessor - boolean processLine(String), T getResult() - you 
get a callback for each row/segment of data and are expected to have a returned 
accumulator
* RowCallbackHandler - void processRow(ResultSet) - you get a callback for each 
row and do whatever you want with accumulating, you implement getResult() if 
you want it. Someone else controls looping. [1]
* RowMapper - T mapRow(ResultSet, int rowNum) - you get a callback for each row 
and return an object for that row. Effectively getResult() returns a List<T> of 
the mapped objects. [2]
* ResultSetExtractor - T extractData(ResultSet) - generalised version of the 
others. Your implementation does everything: controls looping, gathers and 
returns results. [3]

The other very general option is for the row callback to be provided with a 
clientObject/accumulator, which maybe was created in the processor as well. 
(This is very foldLeft.) You start needing a few generic types for the type of 
the accumulator, maybe something else for the ultimately returned type (think 
ImmutableList.Builder accumulator -> ImmutableList, for example) and a few 
methods like createAccumulator, postProcess(Accumulator) (to build the 
ImmutableList, for example). This provides a lot of flexibility but is likely 
much more of a change than is warranted.

Are you talking about just removing getResult()? For my part I'd be happy to 
take the change, but in such a small interface it seems not likely to be that 
big a deal to fix.

[1] - 
http://static.springsource.org/spring/docs/3.0.x/javadoc-api/org/springframework
/jdbc/core/RowCallbackHandler.html
[2] - 
http://static.springsource.org/spring/docs/3.0.x/javadoc-api/org/springframework
/jdbc/core/RowMapper.html
[3] - 
http://static.springsource.org/spring/docs/3.0.x/javadoc-api/org/springframework
/jdbc/core/ResultSetExtractor.html

Original comment by joe.j.kearney on 2 Nov 2012 at 7:02

GoogleCodeExporter commented 9 years ago
> Are you talking about just removing getResult()? For my part I'd be happy to 
take the change, but in such a small interface it seems not likely to be that 
big a deal to fix.

Yep, basically just changing the interfaces to:

interface ByteProcessor {
  boolean processBytes(byte[] buf, int off, int len) throws IOException;
}

interface LineProcessor {
  boolean processLine(String line) throws IOException;
}

Anything else can be built on top of those pretty easily, so I don't think I'd 
want to go down the road of providing narrower functionality like RowMapper 
unless it was clear that mapping individual lines to objects was the most 
common use case.

Original comment by cgdecker@google.com on 2 Nov 2012 at 7:13

GoogleCodeExporter commented 9 years ago
Proposed change is quite clean when accumulator T is of type Iterable<E> but we 
use LineProcessor with Files.readLines(File file, Charset charset, 
LineProcessor<T> callback) method which will be harder to implement for you (if 
possible) and for use if you change the interface. 

I mean, in our case, getResult method builds some more complex object based on 
few private instance variables some of which change state and, since they're 
implementation detail of concrete LineProcessor subclass, they should not leak 
outside as suggested in this issue. This code:

    CacheBuilder.newBuilder().build(new CacheLoader<String, AnnouncementsData>() {
      @Override
      public AnnouncementsData load(final String indexVersion) throws Exception {
        return Files.readLines(new File(getAnnouncementsPath(indexVersion)),
            Charsets.UTF_8, new AnnouncementsLineProcessor());
      }

plus AnnouncementsLineProcessor class definition with 
ImmutableList.Builder<String>, String and SomeObj private variables (plus 
boolean flag which is used for some optimization and is enteirly an 
implementation detail) would turn into this:

    CacheBuilder.newBuilder().build(new CacheLoader<String, AnnouncementsData>() {
      @Override
      public AnnouncementsData load(final String indexVersion) throws Exception {
        // these were AnnouncementsLineProcessor instance variables
        final ImmutableList.Builder<String> a = ImmutableList.builder();
        final StringBuilder b = new StringBuilder();
        // assignment won't work - c is final, so I have to use some wrapper here
        final SomeObjWrapper c;
        // this flag leaks out of LineProcessor impl and can't be simple boolean since is final
        final AtomicBoolean checked = AtomicBoolean(false); 
        Files.readLines(new File(getAnnouncementsPath(indexVersion)),
            Charsets.UTF_8, new LineProcessor() {
              @Override
              public boolean processLine(final String input) throws IOException {
                /* must be inline */
              } });
        return new AnnouncementsData(a.build(), b.toString(), c.get());
      }

I am not saying it's not possible for me to change my LineProcessor class, but 
having getResult method has pros - like hiding implementation concrete 
LineProcessor details and Filles.readLines method - in exchange for one more 
method in API. Correct me if I'm wrong but I guess it's a discussion of 
advantages and disadvantages of few different line-by-line processing 
philosophies just as joe.j.kearney wrote.

Original comment by xaerx...@gmail.com on 12 Nov 2012 at 12:51

GoogleCodeExporter commented 9 years ago
@xaerxess: Keep in mind that you could keep your AnnouncementsLineProcessor 
pretty much as it is now, just with getResult() not being part of the 
interface. You'd just need to declare it as a variable before calling 
readLines, then call getResult() on it yourself afterwards.

Original comment by cgdec...@gmail.com on 12 Nov 2012 at 1:57

GoogleCodeExporter commented 9 years ago
Yeah, I guess I've overlooked this solution... Then it would be cleaner API but 
slightly more dirty usage:

    CacheBuilder.newBuilder().build(new CacheLoader<String, AnnouncementsData>() {
      @Override
      public AnnouncementsData load(final String indexVersion) throws Exception {
        final AnnouncementsLineProcessor processor = new AnnouncementsLineProcessor(); // cannot use LineProcessor here
        Files.readLines(new File(getAnnouncementsPath(indexVersion)), Charsets.UTF_8, processor);
        return processor.getResult();
      }

but seems reasonable for me now.

Probably it's not the best place to ask but would it be a good idea (just for 
my code) to create marker interface extending LineProcessor with getResult 
method (or also extending Supplier<T> maybe?) if you changed the API and I 
wanted to program to an interface instead of class?

Original comment by xaerx...@gmail.com on 12 Nov 2012 at 7:18

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:13

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:18

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:08