danielgtaylor / python-betterproto

Clean, modern, Python 3.6+ code generator & library for Protobuf 3 and async gRPC
MIT License
1.51k stars 214 forks source link

Parsing a protobuf FromString raises KeyError: 'uint32' #82

Closed Gobot1234 closed 4 years ago

Gobot1234 commented 4 years ago

Hello, I am trying to parse a protobuf that was received from a websocket.

@dataclass
class CMsgClientLogon(betterproto.Message):
    protocol_version: int = betterproto.uint32_field(1)
    deprecated_obfustucated_private_ip: int = betterproto.uint32_field(2)
    cell_id: int = betterproto.uint32_field(3)
    last_session_id: int = betterproto.uint32_field(4)
    client_package_version: int = betterproto.uint32_field(5)
    client_language: str = betterproto.string_field(6)
    client_os_type: int = betterproto.uint32_field(7)
    should_remember_password: bool = betterproto.bool_field(8)
    wine_version: str = betterproto.string_field(9)
    deprecated_10: int = betterproto.uint32_field(10)
    obfuscated_private_ip: "CMsgIPAddress" = betterproto.message_field(11)
    deprecated_public_ip: int = betterproto.uint32_field(20)
    qos_level: int = betterproto.uint32_field(21)
    client_supplied_steam_id: float = betterproto.fixed64_field(22)
    public_ip: "CMsgIPAddress" = betterproto.message_field(23)
    machine_id: bytes = betterproto.bytes_field(30)
    launcher_type: int = betterproto.uint32_field(31)
    ui_mode: int = betterproto.uint32_field(32)
    chat_mode: int = betterproto.uint32_field(33)
    steam2_auth_ticket: bytes = betterproto.bytes_field(41)
    email_address: str = betterproto.string_field(42)
    rtime32_account_creation: float = betterproto.fixed32_field(43)
    account_name: str = betterproto.string_field(50)
    password: str = betterproto.string_field(51)
    game_server_token: str = betterproto.string_field(52)
    login_key: str = betterproto.string_field(60)
    was_converted_deprecated_msg: bool = betterproto.bool_field(70)
    anon_user_target_account_name: str = betterproto.string_field(80)
    resolved_user_steam_id: float = betterproto.fixed64_field(81)
    eresult_sentryfile: int = betterproto.int32_field(82)
    sha_sentryfile: bytes = betterproto.bytes_field(83)
    auth_code: str = betterproto.string_field(84)
    otp_type: int = betterproto.int32_field(85)
    otp_value: int = betterproto.uint32_field(86)
    otp_identifier: str = betterproto.string_field(87)
    steam2_ticket_request: bool = betterproto.bool_field(88)
    sony_psn_ticket: bytes = betterproto.bytes_field(90)
    sony_psn_service_id: str = betterproto.string_field(91)
    create_new_psn_linked_account_if_needed: bool = betterproto.bool_field(92)
    sony_psn_name: str = betterproto.string_field(93)
    game_server_app_id: int = betterproto.int32_field(94)
    steamguard_dont_remember_computer: bool = betterproto.bool_field(95)
    machine_name: str = betterproto.string_field(96)
    machine_name_userchosen: str = betterproto.string_field(97)
    country_override: str = betterproto.string_field(98)
    is_steam_box: bool = betterproto.bool_field(99)
    client_instance_id: int = betterproto.uint64_field(100)
    two_factor_code: str = betterproto.string_field(101)
    supports_rate_limit_response: bool = betterproto.bool_field(102)
    web_logon_nonce: str = betterproto.string_field(103)
    priority_reason: int = betterproto.int32_field(104)
    embedded_client_secret: "CMsgClientSecret" = betterproto.message_field(105)

>>> CMsgClientLogon.FromString(b"\x8a\x15\x00\x80\t\x00\x00\x00\t\xc2L'\x11\x01\x00\x10\x01\x08\xac\x80\x048\xc4\xfa\xff\xff\x0f\xa8\x01\x02\x80\x02\x04\x88\x02\x02\x92\x03\ngobot12342\xa0\x06\x00\xba\x068gvvcXnEbE1YAAAAAAAAAAAAAAAAAAAAAAwDUJifU4Ce9czLf/NN85VZV")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/James/PycharmProjects/steam.py/venv/lib/python3.7/site-packages/betterproto/__init__.py", line 773, in FromString
    return cls().parse(data)
  File "/Users/James/PycharmProjects/steam.py/venv/lib/python3.7/site-packages/betterproto/__init__.py", line 754, in parse
    parsed.wire_type, meta, field, parsed.value
  File "/Users/James/PycharmProjects/steam.py/venv/lib/python3.7/site-packages/betterproto/__init__.py", line 695, in _postprocess_single
    fmt = _pack_fmt(meta.proto_type)
  File "/Users/James/PycharmProjects/steam.py/venv/lib/python3.7/site-packages/betterproto/__init__.py", line 282, in _pack_fmt
    }[proto_type]
KeyError: 'uint32'

I'm not too sure what causes this issue, not too sure if it is the dataclass that I am using or a library error so I thought I better ask.

Version Info

Python Version: 3.7.5 Betterproto: 1.2.5

Thanks 👍

Gobot1234 commented 4 years ago

Doing the same with the response also raises a similar error KeyError: 'int32' which sort of makes me think that it's a proto error.

nat-n commented 4 years ago

Thanks for reporting this. To investigate further it would be helpful if you could also provide a sample payload as JSON that produces the problem, and maybe also the proto definition (ideally a more minimal example) to help compose a failing test case similar to the others we've collected: https://github.com/danielgtaylor/python-betterproto/tree/master/betterproto/tests#test-case-directory-structure

In the mean time, is it possible that this is a protobuf versioning problem? Have any fields changed order or type?

Gobot1234 commented 4 years ago

I don't think I can provide a JSON as the messages are just pulled from chrome's websocket viewer. I'm rather new to this proto thing so if you have any clue how I could get a JSON payload that would be great.

The protobufs that these were compiled from can be found at https://github.com/SteamDatabase/Protobufs/blob/master/steam/steammessages_clientserver_login.proto

I will try and recompile them, and see if that changes anything though.

Thanks.

Gobot1234 commented 4 years ago

Recompiling didn't change anything.

boukeversteegh commented 4 years ago

Error seems to happen here:

https://github.com/danielgtaylor/python-betterproto/blob/eec24e4ee85c581d5185c258d2ea2cbb4d39be20/betterproto/__init__.py#L278-L287

⚠ KeyError: 'uint32'


The Bytes from the stacktrace should serve well enough as test-case data

b"\x8a\x15\x00\x80\t\x00\x00\x00\t\xc2L'\x11\x01\x00\x10\x01\x08\xac\x80\x048\xc4\xfa\xff\xff\x0f\xa8\x01\x02\x80\x02\x04\x88\x02\x02\x92\x03\ngobot12342\xa0\x06\x00\xba\x068gvvcXnEbE1YAAAAAAAAAAAAAAAAAAAAAAwDUJifU4Ce9czLf/NN85VZV"

At first glance, it seems a simple fix, just add the TYPE to the array with correct binary value, but who knows :-)

Gobot1234 commented 4 years ago

I'll try that and see what happens

Gobot1234 commented 4 years ago

That appeared to fix it

nat-n commented 4 years ago

I had a go at debugging this and it looks like what's happening is that the parse_fields function is deducing from the input data that the field's wire type should be 64bit value (specifically it produces ParsedField(number=1, wire_type=1, value=b"\xc2L'\x11\x01\x00\x10\x01", raw=b"\t\xc2L'\x11\x01\x00\x10\x01") ) despite the fact that the field is declared as uint32.

How this discrepancy arises or what to make of it, I don't know... I guess it means there's a mismatch between the number of bytes to be unpacked and the method of unpacking to apply?

Whether it would be valid to proceed to unpack the result as an int32 anyway is also not clear to me, but the case could clearly be handled better than raising a KeyError.

As for the root cause, the google grpcio implementation also chokes on this example with:

RuntimeWarning: Unexpected end-group tag: Not all data was converted

Which makes me suspect there is actually either some kind of data corruption or more likely a breaking change in the protobuf between the proto file in the repo and that data sample.

boukeversteegh commented 4 years ago

Great work investigating!

How this discrepancy arises or what to make of it, I don't know... I guess it means there's a mismatch between the number of bytes to be unpacked and the method of unpacking to apply?

The docs also lists specifically which types are interchangeable (for example, when you upgrade a message type from int32 to int64). From this list it is shown that int32 cannot be upgraded to fixed64 or vise-versa.

Then google grpcio also doesn't handle this case correctly.

It seems rather that the sending server has updated its format, and the client needs to update as well. I've tried to look through https://github.com/SteamDatabase/Protobufs/blame/master/steam/steammessages_clientserver_login.proto for any fields that had its type changed, but couldn't find any (its a big file!).

If we can pinpoint a commit on the source proto that shows an incompatible change was made, we know for sure this is not a bug in protobuf/betterproto.

👉 Do we know which message and field is actually causing the error?

nat-n commented 4 years ago

@boukeversteegh I forgot to report which field it was before I cleaned up my debug setup, but as the ParsedField object shows it's a uint32 field number 1 of its containing object, and wasn't the first or last part of the message so one of:

Gobot1234 commented 4 years ago

I think that the header on the message is messing with it. This therefore isn't an issue with the library.