alexeyxo / protobuf-swift

Google ProtocolBuffers for Apple Swift
http://protobuf.io/#swift
Apache License 2.0
937 stars 138 forks source link

Problem parsing large messages (i.e. 1MB) #151

Open gegles opened 8 years ago

gegles commented 8 years ago

Version of protoc (protoc --version)

3.0

Version of ProtocolBuffers.framework

ProtoBuf3.0-Swift3.0 branch (but most likely all the branches)

.proto file to reproduce

message Data {
    uint64              offset              = 1;
    bool                last                = 2;
    bytes               data                = 3;
}

Description

If this input.read is not able to read the full rSize of the message but only a subset, the function still returns a message with only the partial data.

At a minimum, this should give an error or a way for users to detect the issue, at best, a mechanism to progressively build the message (in multiple reads) should be provided.

Let me know I am completely off here or if more details are needed.

alexeyxo commented 8 years ago

I need some details about this. How do you use this method? Do you have UnitTest or example?

gegles commented 8 years ago

I will try to give you more details or spend time building a UT for it.

In the meantime, understand that the problem is very localized to the input.read call I am referring to.

Since you do not check/use the return value (the actual number of bytes read or 0 (EOF) or -1 (error)), you basically assume input.read will always successfully read the full "maxLength" number of bytes. That is not always the case as explained in the InputStream.read() doc.

alexeyxo commented 8 years ago

Can you try last version of protobuf-swift repository?

gegles commented 8 years ago

Thanks for bring the codebase up to speed with Swift 3.0, I've already moved to it.

I haven't yet tried with large messages, I will soon, but I don't see how anything could be different, since the return value of inputStream.read() is still not being checked.

alexeyxo commented 8 years ago

You talk about "length-delimited encoding/decoding"... If your file/stream has a correct format, then all will be ok. Please look test case:

    func testDelimitedEncodingEncoding()
    {
         do {
            let str =  (NSBundle(forClass:TestUtilities.self).resourcePath! as NSString).stringByAppendingPathComponent("delimitedFile.dat")
            let stream = NSOutputStream(toFileAtPath: str, append: false)
            stream?.open()
            for i in 1...100 {
                let mes = try PBProtoPoint.Builder().setLatitude(1.0).setLongitude(Float(i)).build()
                try mes.writeDelimitedToOutputStream(stream!)
            }
            stream?.close()

            let input = NSInputStream(fileAtPath: str)
            input?.open()
            let message = try PBProtoPoint.parseArrayDelimitedFromInputStream(input!)
            XCTAssertTrue(message[1].longitude == 2.0, "")
        }
        catch
        {
            XCTFail("Fail testDelimitedEncodingEncoding")
        }
    }

I compare return value with "!= 1", then return nil(== Invalid ProtocolBuffers stream) value.

gegles commented 8 years ago

Alex,

Maybe my understanding is still off, but I do think there is a clear problem here. I think your test is only succeeding because your file-based InputStream is able to keep up and always read the "maxLength" requested fully...

Let me try to describe it better.

I have a very simple "Data" message (simplified here):

message DataMessage {
    bytes               data                = 1;
}

The code I use to read/receive this message is like this:

    func processMessage() {
        do {
            if let msg = try DataMessage.parseDelimitedFrom(inputStream: inputStream!) {
                onMessage?(msg)
            }
            print("No more message for now")
        } catch {
            print("Parsing error: \(error)")
        }
    }

Important: My InputStream is a TCP connection here (Not a filesystem-based IO)

When the message/data size is relatively small (1K to 30K), things work just fine....

When the size gets bigger, here is what happens,

1) A Message is still produced by parseDelimitedFrom but the data is clearly not fully there.

2) If I modify your AbstractMessage.mergeDelimitedFrom() function like this:

    //Delimited Encoding/Decoding
    public func mergeDelimitedFrom(inputStream: InputStream) throws -> Self? {
        var firstByte:UInt8 = 0
        if inputStream.read(&firstByte, maxLength: 1) != 1 {
            return nil
        }
        let rSize = try CodedInputStream.readRawVarint32(firstByte: firstByte, inputStream: inputStream)
        let data  = Data(bytes: [0],count: Int(rSize))
        let pointer = UnsafeMutablePointer<UInt8>((data as NSData).bytes)

    // Original
    // inputStream.read(pointer, maxLength: Int(rSize))

        // Now just printing the return value if different then rSize
    let actuallyReadSize = inputStream.read(pointer, maxLength: Int(rSize))

    if (actuallyReadSize != Int(rSize)) {
        print("DANGER: actuallyReadSize(\(actuallyReadSize)) != rSize(\(rSize))")
    }
        return try mergeFrom(data: data)
    }

I do see the following printout: DANGER: actuallyReadSize(400729) != rSize(1048594)

3) Subsequent message parsing is screwed up as the current inputStream is still receiving data from the previous message.....

Yes, clearly from this, you do the proper checking "!= 1" for the "firstByte" reading. No issue with that.

The problem is that you do no such check on the second inputStream.read() and there is no guarantee that the full data can be read (in one shot)..... My understanding is that InputStream.read() is a non-blocking call that will only return what it has read (up to maxLength).

Let me know if this makes it clearer or if I am still missing something. I will see how I can come up with a solution, but I don't know your codebase too well, so it may take some time.

Cheers. G.

gegles commented 8 years ago

@alexeyxo could you just confirm to me that you too consider this as a bug?

gegles commented 7 years ago

FYI, I still "know" that there is a bug here, but for now, instead of decoding the message directly from the socket InputStream, I first make sure I read all the data for the message (into a Data buffer) and only then use CodedInputStream....

thandang commented 7 years ago

Hi @gegles How can you read data into Data buffer?

Kind regards,

thandang commented 7 years ago

I had tried something like `let rawInput:NSInputStream = NSInputStream(data: dataResponse) rawInput.open()

        var buffer = [UInt8](count: dataResponse.length, repeatedValue: 0)

        while rawInput.hasBytesAvailable {
            let len = rawInput.read(&buffer, maxLength: buffer.count)
            if len > 0 {
                let outData = NSData(bytes: &buffer, length: buffer.count)
                let codedInputStream = CodedInputStream(data: outData)

                let lengthRaw = try codedInputStream.readRawVarint32()
                let lengthThrough = try codedInputStream.pushLimit(lengthRaw)
                protobufMessage = try Com.Celertech.Baseserver.Communication.Protobuf.ProtobufMessage.parseFromCodedInputStream(codedInputStream)
                codedInputStream.popLimit(lengthThrough)
            }
        }`

But still like it doesn't work. The protobufMessage sill be nil

Do you have any idea? Cheers.