MarshalX / atproto

The AT Protocol (🦋 Bluesky) SDK for Python 🐍
https://atproto.blue
MIT License

ValueError: Failed to read CAR header. Invalid uvarint #438

Open adonoho opened 6 hours ago

adonoho commented 6 hours ago
Traceback (most recent call last):
  File "/Users/awd/miniforge3/envs/bsky/lib/python3.10/site-packages/atproto_firehose/client.py", line 221, in _process_message_frame
    await self._on_message_callback(frame)
  File "/Users/awd/Projects/PhenomML/BSky2GBQ/main.py", line 91, in async_on_message
    blocks = atproto.CAR.from_bytes(message.blocks).blocks
  File "/Users/awd/miniforge3/envs/bsky/lib/python3.10/site-packages/atproto_core/car/car.py", line 51, in from_bytes
    header, blocks = libipld.decode_car(data)
ValueError: Failed to read CAR header. Invalid uvarint
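For context on what this message means: a CARv1 payload starts with an unsigned varint that gives the byte length of the DAG-CBOR header that follows. "Invalid uvarint" suggests the first bytes of `message.blocks` are not a well-formed varint, for example because the payload is empty or truncated. The sketch below is a minimal pure-Python illustration of that first parsing step, assuming the multiformats unsigned-varint rules (LEB128, at most 9 bytes). It is not libipld's actual code, and libipld may apply additional or different checks; `read_uvarint` and `split_car_header` are hypothetical helper names.

```python
MAX_UVARINT_BYTES = 9  # length limit from the multiformats unsigned-varint spec


def read_uvarint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode an unsigned LEB128 varint at `offset`; return (value, bytes_used)."""
    value = 0
    for n in range(MAX_UVARINT_BYTES):
        pos = offset + n
        if pos >= len(data):
            # Empty or truncated input: the varint never terminated.
            raise ValueError("Invalid uvarint: input ended mid-varint")
        byte = data[pos]
        value |= (byte & 0x7F) << (7 * n)  # low 7 bits carry the payload
        if byte & 0x80 == 0:  # high bit clear marks the last byte
            return value, n + 1
    raise ValueError("Invalid uvarint: longer than 9 bytes")


def split_car_header(data: bytes) -> tuple[bytes, bytes]:
    """Split a CARv1 payload into (header_bytes, rest) using the length prefix."""
    length, used = read_uvarint(data)
    end = used + length
    if end > len(data):
        raise ValueError("CAR header length exceeds payload size")
    return data[used:end], data[end:]
```

Under these assumptions, an empty `blocks` value or one made only of continuation bytes (high bit set) fails at the very first step, which is consistent with the error being raised before any block is read.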

Happy to help and dig deeper.

Anon, Andrew

adonoho commented 5 hours ago

# <== marks the line that raised this ValueError.

async def async_on_message(message, test_function, handler):
    message = hose.parse_subscribe_repos_message(message)
    if isinstance(message,
                  atproto.models.ComAtprotoSyncSubscribeRepos.Commit):
        blocks = atproto.CAR.from_bytes(message.blocks).blocks  # <== This is the line that creates the above trace.
        for op in message.ops:
            uri = atproto.AtUri.from_str(f"at://{message.repo}/{op.path}")
            raw = blocks.get(op.cid)
            if raw:
                record = get_or_create(raw, strict=False)
                if record.py_type is not None:
                    rdict = record.model_dump()
                    item = {
                        "repo": message.repo,
                        "revision": message.rev,
                        "sequence": message.seq,
                        "timestamp": message.time,
                        "action": op.action,
                        "cid": str(op.cid),
                        "path": op.path,
                        "collection": uri.collection,
                        "record": rdict
                    }
                    if test_function(item):
                        await IOLoop.current().run_in_executor(None, handler, item)
MarshalX commented 5 hours ago

Hi! Today seems to be the day when everyone is trying "devil" payloads over the network: specially crafted, malformed messages that trigger errors like this one. It would help if you could catch the value of `.blocks`, write it to a file, and share it. Otherwise, just catch the `ValueError` and skip this commit.
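One way to follow both suggestions at once (save the bad payload, then skip the commit) is a small wrapper around the CAR decode. This is a minimal sketch: `decode_or_dump` and the `bad_cars` directory are hypothetical names, and the decoder is passed in as a callable so the helper assumes nothing about the SDK beyond it raising `ValueError` on bad input, as the traceback shows.

```python
import hashlib
import logging
from pathlib import Path
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


def decode_or_dump(raw: bytes, decode: Callable[[bytes], T], dump_dir: Path) -> Optional[T]:
    """Try to decode `raw`; on ValueError, save the bytes to disk and return None."""
    try:
        return decode(raw)
    except ValueError as exc:
        dump_dir.mkdir(parents=True, exist_ok=True)
        # Name the file after a hash of the payload so a repeated bad payload
        # overwrites the same file instead of piling up duplicates.
        path = dump_dir / f"bad-car-{hashlib.sha256(raw).hexdigest()[:16]}.car"
        path.write_bytes(raw)
        log.warning("Skipping commit with undecodable CAR (%s); saved to %s", exc, path)
        return None
```

In the handler above, the marked line could become `car = decode_or_dump(message.blocks, atproto.CAR.from_bytes, Path("bad_cars"))`, followed by an early `return` when `car` is `None` and `blocks = car.blocks` otherwise. The saved `.car` files are then what you would attach to the issue.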

adonoho commented 3 hours ago

OK, I'll try to get to this after dinner.