capnproto / capnproto-java

Cap'n Proto in pure Java
Other
391 stars 86 forks source link

C++ implementation is more strict when reading StructPointers #122

Closed mdindoffer closed 2 years ago

mdindoffer commented 2 years ago

I have this scrambled message:

00 00 00 00 07 00 00 00
00 00 00 00 AF 00 00 00
00 00 00 00 05 00 01 00
CA 59 82 59 70 10 A7 C0
76 A0 D8 6E 0F 6D C5 B8
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
1A 00 01 00 00 00 00 00

When I try to parse it with the C++ implementation (0.9.1) I get an exception:

$ capnp convert --verbose binary:binary < capnp-bad.bin 
*** WARNING ***
The input data does not appear to be the type that you specified.  I'll try
to parse it anyway, but if it doesn't look right, please verify that you
have the right type.  Use --quiet to suppress this warning.
*** END WARNING ***
����*** ERROR CONVERTING PREVIOUS MESSAGE ***
The following error occurred while converting the message above.
This probably means the input data is invalid/corrupted.
Exception description: expected boundsCheck(segment, ptr, ref->structRef.wordSize()); Message contained out-of-bounds struct pointer.
Code location: capnp/layout.c++:2182
*** END ERROR ***

However, capnp-java happily parses the message producing garbage data.

Looking at readStructPointer it seems to me that the java implementation only checks for the global readLimit set in ReaderOptions to prevent DoS attacks, but lacks some additional bound checks that the C++ implementation has.

Anyway, these two implementations are not binary-compatible.

mdindoffer commented 2 years ago

@dwrensha This is causing problems for us in production because garbage messages get interpreted as real data. Could you please take a look?

dwrensha commented 2 years ago

capnproto-java deliberately omits some of the bounds checks of capnproto-c++ because ByteBuffer already has built-in bounds checking. It's not surprising that this causes capnproto-java to accept some messages that capnproto-c++ rejects.

Is this causing problems for code that's using capnproto or just for code that's fuzzing capnproto? Can you describe in more depth how it's causing problems?

mdindoffer commented 2 years ago

Is this causing problems for code that's using capnproto or just for code that's fuzzing capnproto?

The former.

We have a capnproto client that uses the reference C++ implementation to send data to our Java-based server. I don't want to share our private protocol schema publicly, so please bear with me and my vague description here. We have a parent struct that contains a union of "stuff" - a couple of groups and Bool fields.

The client tried to send us a message with one of these Bool fields se to true. Unfortunately some kind of memory corruption on the client's side produced the scrambled message in the OP. If it were parsed by the C++ library, it would have been rejected on delivery, because it is an invalid message.

Since capnproto-java doesn't have these bound checks implemented it was not rejected. The message was parsed using a chain of if-else statements like this:

if (parentStructMsg.isMessageGroupTypeA()) {
    MessageGroupTypeA.Reader messageTypeA = parentStructMsg.getMessageGroupTypeA();
    //do stuff
} else {
    if (parentStructMsg.isBoolMsg1()) {
        boolean bmsg1 = parentStructMsg.getBoolMsg1();
        //do other stuff
    } else if (parentStructMsg.isBoolMsg2()) {
        boolean bmsg2 = parentStructMsg.getBoolMsg2();
        //do other stuff
    }
}

The isMessageGroupTypeA call returned true and the packet was deserialized as a message of type messageGroupTypeA leading to all kinds of problems on server side (because the payload of the group made no sense from business perspective). Basically random noise got interpreted as real data.

Is this of any help?

paxel commented 2 years ago

a struct in a union. you probably have to call (parentStructMsg.isMessageGroupTypeA() && parentStructMsg.hasMessageGroupTypeA()) the is will check the data field, and has checks the pointer. if one of them is corrupt you will notice.

you probably should also add an Void unset @0;

but in all honesty. the customer should fix his memory issue. garbage in garbage out.

cheers 🍻

mdindoffer @.***> schrieb am Do., 27. Jan. 2022, 19:12:

Is this causing problems for code that's using capnproto or just for code that's fuzzing capnproto?

The former.

We have a capnproto client that uses the reference C++ implementation to send data to our Java-based server. I don't want to share our private protocol schema publicly, so please bear with me and my vague description here. We have a parent struct that contains a union of "stuff" - a couple of groups and Bool fields.

The client tried to send us a message with one of these Bool fields se to true. Unfortunately some kind of memory corruption on the client's side produced the scrambled message in the OP. If it were parsed by the C++ library, it would have been rejected on delivery, because it is an invalid message.

Since capnproto-java doesn't have these bound checks implemented it was not rejected. The message was parsed using a chain of if-else statements like this:

if (parentStructMsg.isMessageGroupTypeA()) { MessageGroupTypeA.Reader messageTypeA = parentStructMsg.getMessageGroupTypeA(); //do stuff } else { if (parentStructMsg.isBoolMsg1()) { boolean bmsg1 = parentStructMsg.getBoolMsg1(); //do other stuff } else if (parentStructMsg.isBoolMsg2()) { boolean bmsg2 = parentStructMsg.getBoolMsg2(); //do other stuff } }

The isMessageGroupTypeA call returned true and the packet was deserialized as a message of type messageGroupTypeA leading to all kinds of problems on server side (because the payload of the group made no sense from business perspective). Basically random noise got interpreted as real data.

Is this of any help?

— Reply to this email directly, view it on GitHub https://github.com/capnproto/capnproto-java/issues/122#issuecomment-1023508067, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFLHGBPZWPUHD4YOGRC4NDUYGDJLANCNFSM5MSLQVYQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you are subscribed to this thread.Message ID: @.***>

paxel commented 2 years ago

it's important that the memory is zeroed out. Java makes sure that this is the case, not so sure about c++

Patrick Zimmer @.***> schrieb am Do., 27. Jan. 2022, 19:53:

a struct in a union. you probably have to call (parentStructMsg.isMessageGroupTypeA() && parentStructMsg.hasMessageGroupTypeA()) the is will check the data field, and has checks the pointer. if one of them is corrupt you will notice.

you probably should also add an Void unset @0;

but in all honesty. the customer should fix his memory issue. garbage in garbage out.

cheers 🍻

mdindoffer @.***> schrieb am Do., 27. Jan. 2022, 19:12:

Is this causing problems for code that's using capnproto or just for code that's fuzzing capnproto?

The former.

We have a capnproto client that uses the reference C++ implementation to send data to our Java-based server. I don't want to share our private protocol schema publicly, so please bear with me and my vague description here. We have a parent struct that contains a union of "stuff" - a couple of groups and Bool fields.

The client tried to send us a message with one of these Bool fields se to true. Unfortunately some kind of memory corruption on the client's side produced the scrambled message in the OP. If it were parsed by the C++ library, it would have been rejected on delivery, because it is an invalid message.

Since capnproto-java doesn't have these bound checks implemented it was not rejected. The message was parsed using a chain of if-else statements like this:

if (parentStructMsg.isMessageGroupTypeA()) { MessageGroupTypeA.Reader messageTypeA = parentStructMsg.getMessageGroupTypeA(); //do stuff } else { if (parentStructMsg.isBoolMsg1()) { boolean bmsg1 = parentStructMsg.getBoolMsg1(); //do other stuff } else if (parentStructMsg.isBoolMsg2()) { boolean bmsg2 = parentStructMsg.getBoolMsg2(); //do other stuff } }

The isMessageGroupTypeA call returned true and the packet was deserialized as a message of type messageGroupTypeA leading to all kinds of problems on server side (because the payload of the group made no sense from business perspective). Basically random noise got interpreted as real data.

Is this of any help?

— Reply to this email directly, view it on GitHub https://github.com/capnproto/capnproto-java/issues/122#issuecomment-1023508067, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFLHGBPZWPUHD4YOGRC4NDUYGDJLANCNFSM5MSLQVYQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you are subscribed to this thread.Message ID: @.***>

mdindoffer commented 2 years ago

@paxel

a struct in a union.

It's a union of groups and Bools inside of a struct. Not the other way around.

(parentStructMsg.isMessageGroupTypeA() && parentStructMsg.hasMessageGroupTypeA())

There is no hasMessageGroupTypeA method in the generated Reader, only the isMessageGroupTypeA.

but in all honesty. the customer should fix his memory issue.

With all due respect - that's irrelevant. We need to be better protected in case something like this happens again. I've never seen a protocol that parses a malformed message without warnings and calls it expected behaviour, have you?

paxel commented 2 years ago

I am sorry. you're absolutely right.

mdindoffer @.***> schrieb am Do., 27. Jan. 2022, 21:22:

@paxel https://github.com/paxel

a struct in a union.

It's a union of groups and Bools inside of a struct. Not the other way around.

(parentStructMsg.isMessageGroupTypeA() && parentStructMsg.hasMessageGroupTypeA())

There is no hasMessageGroupTypeA method in the generated Reader, only the isMessageGroupTypeA.

but in all honesty. the customer should fix his memory issue.

With all due respect - that's irrelevant. We need to be better protected in case something like this happens again. I've never seen a protocol that parses a malformed message without warnings and calls it expected behaviour, have you?

— Reply to this email directly, view it on GitHub https://github.com/capnproto/capnproto-java/issues/122#issuecomment-1023607141, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFLHGBTAU6DP6DPYQ7AYCTUYGSQZANCNFSM5MSLQVYQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

dwrensha commented 2 years ago

There's no way to reliably detect garbage data in all cases. I expect that if the incoming message was the following (one byte different from the example that @mdindoffer provided), then both capnproto-c++ and capnproto-java would accept it, and it would still cause problems for your server:

00 00 00 00 07 00 00 00
00 00 00 00 0A 00 00 00
00 00 00 00 05 00 01 00
CA 59 82 59 70 10 A7 C0
76 A0 D8 6E 0F 6D C5 B8
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
1A 00 01 00 00 00 00 00

I agree that more validation could be beneficial and would make such problems less likely, but it does come at the cost of slightly more complicated code and extra cpu cycles.

So: adding the extra validation seems potentially reasonable to me, but not high-urgency.