google-code-export / protostuff

Automatically exported from code.google.com/p/protostuff
Apache License 2.0

ProtostuffIOUtil.parseListFrom() exhausts the InputStream #132

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
In our application, we use protostuff to write/read multiple messages to/from 
an OutputStream/InputStream.

Since we need to serialize some lists, I wanted to use 
ProtostuffIOUtil.writeListTo() and ProtostuffIOUtil.parseListFrom(). But I get 
an EOFException when calling ProtostuffIOUtil.mergeDelimitedFrom() after 
ProtostuffIOUtil.parseListFrom(). It seems like mergeDelimitedFrom() exhausts 
the InputStream.

I attached a unit test that shows the problem.

Here is the exception I get when running the unit test:

java.io.EOFException: mergeDelimitedFrom
    at com.dyuproject.protostuff.IOUtil.mergeDelimitedFrom(IOUtil.java:91)
    at com.dyuproject.protostuff.ProtostuffIOUtil.mergeDelimitedFrom(ProtostuffIOUtil.java:151)
    at com.dyuproject.protostuff.WriteListToWithWriteDelimitedToTest.testWriteListToThenWriteDelimitedTo(WriteListToWithWriteDelimitedToTest.java:62)

Original issue reported on code.google.com by nev...@gmail.com on 28 Aug 2012 at 9:58


GoogleCodeExporter commented 9 years ago
I want to update the second sentence (fix a typo, add details), but I can't 
seem to find a way to edit the issue. It should be:

"Since we need to serialize some lists, I wanted to use 
ProtostuffIOUtil.writeListTo() and ProtostuffIOUtil.parseListFrom(). But I get 
an EOFException when calling ProtostuffIOUtil.mergeDelimitedFrom() after 
ProtostuffIOUtil.parseListFrom(). It seems that parseListFrom() exhausts the 
InputStream (the "pos" field of the ByteArrayInputStream is set to its "count" 
field, which means there is nothing more to read)."
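A "pos == count" state is exactly what any reader that fills its own buffer from the stream leaves behind. Assuming parseListFrom() reads through such an internal buffer (not confirmed in this thread), the effect can be reproduced with plain java.io and no protostuff at all. The class name below is hypothetical:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class OverReadDemo {
    // Asks a buffered wrapper for a single byte, then reports how many bytes
    // the underlying ByteArrayInputStream still has (its count - pos).
    static int remainingAfterOneBufferedRead(byte[] data) {
        ByteArrayInputStream raw = new ByteArrayInputStream(data);
        InputStream buffered = new BufferedInputStream(raw, 8192);
        try {
            buffered.read(); // the caller only wants one byte...
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.available(); // ...but the wrapper filled its buffer from raw
    }
}
```

With 10 bytes of input, `remainingAfterOneBufferedRead(new byte[10])` returns 0: a second reader created on `raw` would hit end-of-stream immediately, the same kind of failure as the EOFException above.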

Original comment by nev...@gmail.com on 28 Aug 2012 at 12:00

GoogleCodeExporter commented 9 years ago
Yes, that is the expected behavior.  
Both writeListTo and parseListFrom do not use prefix delimiters but instead 
use suffix delimiters for streaming operations.

The purpose of writing the size of the list (number of elements) is to help the 
deserializer when constructing the array list with the exact size (an 
optimization to avoid dynamic growth).
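The size-prefix optimization can be sketched with the standard library alone. This is an illustration under assumptions: the class name is hypothetical, int values stand in for messages, and DataOutputStream stands in for protostuff's encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CountPrefixedList {
    // Writes the element count first, then each element.
    static byte[] write(List<Integer> values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(values.size());
            for (int v : values) {
                out.writeInt(v);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the count first so the ArrayList is created with its exact final
    // capacity and never has to grow while elements are added.
    static List<Integer> read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int size = in.readInt();
            List<Integer> result = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                result.add(in.readInt());
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that the count only sizes the list; it says nothing about how many *bytes* the list occupies, so it does not by itself let a reader stop at the list's end.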

The best approach is to re-use the ProtostuffOutput and CodedInput, and use a 
tail-delimiter to write your messages.

Here's an example:

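        // Writer side. This has to live in the com.dyuproject.protostuff
        // package, because output.tail and output.sink are package-private.
        // Repeat this for every message you want to write:
        int fooType = 1;
        output.tail = output.sink.writeVarInt32(fooType, output, output.tail);
        schema.writeTo(output, foo);
        output.tail = output.sink.writeByte((byte)WireFormat.WIRETYPE_TAIL_DELIMITER, output, 
                output.tail);

        // marker for end of stream (a varint, to match readUInt32() below)
        output.tail = output.sink.writeVarInt32(0, output, output.tail);

        // Reader side. Create the CodedInput once and re-use it for every
        // message. Constructor arguments assumed: the stream, the byte[] of
        // the LinkedBuffer you re-use, and decodeNestedMessageAsGroup = true.
        CodedInput input = new CodedInput(in, buffer.buffer, true);
        boolean done = false;
        while (!done)
        {
            int type = input.readUInt32();
            switch(type)
            {
                case 0:
                    // all messages are read
                    done = true;
                    break;
                case 1:
                    Foo parsed = schema.newMessage();
                    schema.mergeFrom(input, parsed);
                    // do something with parsed
                    break;
                default:
                    throw new IllegalStateException("Unknown message type: " + type);
            }
        }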
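Without access to the package-private internals, the same framing idea (a type tag, the payload, a tail delimiter, and type 0 as the end-of-stream marker) can be sketched with java.io alone. This is an illustration, not protostuff's wire format: the hypothetical TailDelimitedStream class uses a single 0 byte as the tail delimiter, so payloads must not contain 0 bytes and type tags must fit in 1..255. Because the reader pulls one byte at a time from the shared stream, it never consumes anything past the end marker.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stdlib-only illustration; not protostuff's wire format.
class TailDelimitedStream {
    static final int END_OF_STREAM = 0;  // type tag that ends the stream
    static final int TAIL_DELIMITER = 0; // byte that ends each payload

    // One record = type tag (1..255), UTF-8 payload (no 0 bytes), delimiter.
    static void writeRecord(ByteArrayOutputStream out, int type, String payload) {
        if (type < 1 || type > 255) {
            throw new IllegalArgumentException("type must be in 1..255");
        }
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        out.write(type);
        out.write(bytes, 0, bytes.length);
        out.write(TAIL_DELIMITER);
    }

    static void writeEnd(ByteArrayOutputStream out) {
        out.write(END_OF_STREAM);
    }

    // Reads records until the end marker and returns them as "type:payload".
    // Bytes after the end marker are left in the stream for the next reader.
    static List<String> readAll(InputStream in) {
        List<String> records = new ArrayList<>();
        try {
            while (true) {
                int type = in.read();
                if (type == -1) {
                    throw new EOFException("missing end-of-stream marker");
                }
                if (type == END_OF_STREAM) {
                    return records;
                }
                ByteArrayOutputStream payload = new ByteArrayOutputStream();
                int b;
                while ((b = in.read()) != TAIL_DELIMITER) {
                    if (b == -1) {
                        throw new EOFException("missing tail delimiter");
                    }
                    payload.write(b);
                }
                records.add(type + ":" + new String(payload.toByteArray(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```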

Original comment by david.yu...@gmail.com on 29 Aug 2012 at 8:10

GoogleCodeExporter commented 9 years ago
Mmmh. I'm not sure I understand how I can re-use the ProtostuffOutput and 
CodedInput. It seems like ProtostuffIOUtil.writeListTo() and 
ProtostuffIOUtil.parseListFrom() create those as local variables, but I don't 
have access to them.

I guess what I really need is some kind of writeDelimitedListTo() and 
parseDelimitedListFrom(). writeDelimitedListTo() would write the size of the 
list, the size of the messages, and the messages. parseDelimitedListFrom() 
would create an ArrayList (using the size of the list), then read all the 
messages (limiting the buffer to the size of the messages).

writeListTo() returns the size of the messages. Could I write that to my 
OutputStream? The problem is that the size of the messages would be written 
*after* the messages themselves :/
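The proposed writeDelimitedListTo()/parseDelimitedListFrom() pair could look roughly like this. The two method names come from the comment above; the DelimitedLists class, the String elements, and the fixed 4-byte headers are assumptions for illustration, not protostuff API. Serializing the elements into a temporary buffer first also sidesteps the ordering problem: the byte length is known before anything reaches the real stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers; Strings stand in for protostuff messages.
class DelimitedLists {
    // Layout: [element count][byte length of elements][elements]
    static void writeDelimitedListTo(OutputStream out, List<String> items) {
        try {
            // Serialize the elements first so their byte length is known.
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            DataOutputStream bodyOut = new DataOutputStream(body);
            for (String item : items) {
                bodyOut.writeUTF(item);
            }
            bodyOut.flush();

            DataOutputStream header = new DataOutputStream(out);
            header.writeInt(items.size()); // number of elements
            header.writeInt(body.size());  // byte length of the elements
            header.flush();
            body.writeTo(out);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static List<String> parseDelimitedListFrom(InputStream in) {
        try {
            DataInputStream header = new DataInputStream(in);
            int size = header.readInt();
            int length = header.readInt();
            // Consume exactly the list's bytes; later data stays in the stream.
            byte[] body = new byte[length];
            header.readFully(body);
            DataInputStream bodyIn = new DataInputStream(new ByteArrayInputStream(body));
            List<String> result = new ArrayList<>(size); // presized from the count
            for (int i = 0; i < size; i++) {
                result.add(bodyIn.readUTF());
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The cost of this design is one extra copy of the serialized list, traded for a stream that any later reader can pick up exactly where the list ends.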

Original comment by nev...@gmail.com on 29 Aug 2012 at 9:29

GoogleCodeExporter commented 9 years ago
I've been using these utility methods for now: https://gist.github.com/3509121

When I change my unit test to use these utility methods, the unit test passes:

    public void testWriteListToThenWriteDelimitedTo() throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

//        ProtostuffIOUtil.writeListTo(out, bars, Bar.getSchema(), buf());
        CollectionSerializationUtils.writeListTo(out, bars, Bar.getSchema());
        ProtostuffIOUtil.writeDelimitedTo(out, baz, Baz.getSchema(), buf());

        ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());

//        List<Bar> parsedBars = ProtostuffIOUtil.parseListFrom(in, Bar.getSchema());
        List<Bar> parsedBars = CollectionSerializationUtils.parseListFrom(in, Bar.getSchema());
        Baz parsedBaz = Baz.getSchema().newMessage();
        ProtostuffIOUtil.mergeDelimitedFrom(in, parsedBaz, Baz.getSchema(), buf());

        assertEquals(bars, parsedBars);
        assertEquals(baz, parsedBaz);
    }

It seems to work, but I'm not sure if it's the best approach. Calling 
writeDelimitedTo() multiple times might also be bad for performance?

Original comment by nev...@gmail.com on 29 Aug 2012 at 9:39

GoogleCodeExporter commented 9 years ago
If you are streaming and want the best performance, use tail-delimiters and 
re-use the CodedInput and ProtostuffOutput. (Simply create a utility class 
inside the protostuff package to access the package-private members.)

Otherwise, calling writeDelimitedTo multiple times will be sufficient for your 
needs.
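Calling writeDelimitedTo() repeatedly works because each message carries its own length prefix, so a reader consumes exactly one message and leaves the rest of the stream untouched. Assuming a protobuf-style base-128 varint length prefix (an assumption here; the thread does not show the byte layout), the framing amounts to this stdlib-only sketch, with the hypothetical VarintFraming class and raw byte arrays standing in for serialized messages:

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical stdlib-only sketch of varint length-prefixed framing.
class VarintFraming {
    // Writes the message length as a base-128 varint, then the message bytes.
    static void writeDelimited(OutputStream out, byte[] message) {
        try {
            int value = message.length;
            while ((value & ~0x7F) != 0) {
                out.write((value & 0x7F) | 0x80); // low 7 bits + continuation bit
                value >>>= 7;
            }
            out.write(value);
            out.write(message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one length-prefixed message and nothing beyond it.
    static byte[] readDelimited(InputStream in) {
        try {
            int length = 0;
            for (int shift = 0; ; shift += 7) {
                if (shift > 28) {
                    throw new IOException("malformed varint length prefix");
                }
                int b = in.read();
                if (b == -1) {
                    throw new EOFException("truncated length prefix");
                }
                length |= (b & 0x7F) << shift;
                if ((b & 0x80) == 0) {
                    break;
                }
            }
            byte[] message = new byte[length];
            new DataInputStream(in).readFully(message);
            return message;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, a 5-byte message costs 1 prefix byte and a 300-byte message costs 2, so the per-message overhead stays small even when many messages are written back to back.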

Original comment by david.yu...@gmail.com on 29 Aug 2012 at 10:10