MarshalX / atproto

The AT Protocol (🦋 Bluesky) SDK for Python 🐍
https://atproto.blue
MIT License

Segmentation fault (core dumped) in CAR reading from Firehose #437

Open mockthebear opened 1 week ago

mockthebear commented 1 week ago

I was consuming data from the firehose, specifically at cursor 3720770000, and I consistently get a core dump every time.

I traced the crash to: .venv/lib/python3.12/site-packages/atproto_firehose/client.py: frame = Frame.from_bytes(data)

If I print the data, I get:

b':\xa2eroots\x81\xd8*X%\x00\x01q\x12 \xe2\x17h\xdaQ\xedJp2\x88\xd9\xf9 xr8\xee\x81yB\x08\xec_c\x918\xc6\xe9\x1e\xe6KDgversion\x01\x81\x01\x01q\x12 x\xb7\x05~\xd0>\xe02\xc9\xa17\x8b:\xb9\x1d\xc8fe"&\xc5\x96fP\x1d\xc8\xcf\xca\xce\xe6|b\xa2ae\x81\xa4akX app.bsky.feed.post/3lb4rbqitunycap\x00at\xf6av\xd8*X%\x00\x01q\x12 tD(%\x02\xe0\xf8\x90\xc7\x08\xd9\x06\xb38C+CK\xfd@\xf6\xd9\xad]\'\tA\x92\x8cul\xc8al\xf6\xd1\x01\x01q\x12 r\x9b\x1c{\xfd\xf2\x8f\x85\x00\x16\xce\xd3>\xaaW\x1a\x8f|\xbd\x14\xd5\x00Y=\x85 \xfbd\xf3\xd6\xc5\xd4\xa2ae\x81\xa4akX app.bsky.feed.like/3lb4h5oxwf32sap\x00at\xd8*X%\x00\x01q\x12 D`\xa9\xed\xd9\xd2\x8d\xc6\x15-c\x18G8\x07\x1a\xda{\xf4n]\xa5U\xb2\xacP\x87\xc4NX[Uav\xd8*X%\x00\x01q\x12 B\x0c\x0eGqa\xb20 |\x88N\x90\x1a\x14-aY\xde\x05p\xc6\x1f\x18[C4\xab-\x1a\x00nal\xd8*X%\x00\x01q\x12 \xa2(\rF_\x00\xd3\xcc\xc72\x1ah3\xd6\xf2OZ|\x7fi\x8c\xc16I\x8e\x9fWg#*\x9c\xc2\xad\x8e\x06\x01q\x12 tD(%\x02\xe0\xf8\x90\xc7\x08\xd9\x06\xb38C+CK\xfd@\xf6\xd9\xad]\'\tA\x92\x8cul\xc8\xa5devil\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81

<From now on, we get about 403101 \x81 characters repeated>

x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x81\x80dtextuevilness level 100000e$typerapp.bsky.feed.postelangs\x81benicreatedAtx\x182024-11-17T06:21:27.500Z\xe0\x01\x01q\x12 \xe2\x17h\xdaQ\xedJp2\x88\xd9\xf9 xr8\xee\x81yB\x08\xec_c\x918\xc6\xe9\x1e\xe6KD\xa6cdidx did:plc:t423oqsrtl5gtgdfxvye5gcocrevm3lb4rbqitunyccsigX@\x11L\x10\xa6\xce\xac\xc5];\xb4\xc1m\x81\xa1\x0c\x0b\x01\xc3N\x9a\xef!\x1apB\'\x17\xbc\xfaM\xc4\x16n^\xe7Mk\xc0\xb8\xb5S\xa4\x08S\xb4\xb1=\xac\n~\x87\xf4QYd\ro\x11\x8b\x8e\xab\xd1\x19\xa7ddata\xd8*X%\x00\x01q\x12 r\x9b\x1c{\xfd\xf2\x8f\x85\x00\x16\xce\xd3>\xaaW\x1a\x8f|\xbd\x14\xd5\x00Y=\x85 \xfbd\xf3\xd6\xc5\xd4dprev\xf6gversion\x03\x88\x05\x01q\x12 D`\xa9\xed\xd9\xd2\x8d\xc6\x15-c\x18G8\x07\x1a\xda{\xf4n]\xa5U\xb2\xacP\x87\xc4NX[U\xa2ae\x86\xa4akX app.bsky.feed.like/3lb4h5tvywtusap\x00at\xd8*X%\x00\x01q\x12 l\x1aD8x\xcan\x11\x10\x19\xf6\x01\xcc*\xb1\xf8f{\x84\x9e\x8e\xca\xbe\x86\n\xc3\xee\x002v*&av\xd8*X%\x00\x01q\x12 41{a\x8e_hb*\x94\xe3n\x13\x7f\xe1T\x02\xf2\xab\x85\x82\x95\xe7"\xc3\xab\x0e\x19\xabv\xf3[\xa4akRpost/3lb4cestkrc2gap\x0eat\xd8*X%\x00\x01q\x12 9\x92\x83\xbb\xf7e\x99@_\xea\x1d\x93"\x1fp\x86z\xa3\xb3\x8db[81\x8aE^\xd6\x8bp\x95lav\xd8*X%\x00\x01q\x12 \x10\xea\xc7\x94\xbc{OX,J\xd3e\x9f\xbfR-\xa4\xd5\xa4\x8a\x8a\xeb\xa5\x93\x9e\xab\x94\xf0\x80aV4\xa4akIdmfuuxk2kap\x17at\xd8*X%\x00\x01q\x12 \xb8\x7f\xc39\x82F\xa0Ud1\xa8EVZa\xb0O\xf9z\x07O\xe9\x08\xb8\x84&\xc4q\xa0\n\x89\xeaav\xd8*X%\x00\x01q\x12 
\x17\xd4\x17\x93\xb1\xbfb\x03>AN\x13<+bL\xd6]\xd5\xd8\x0fI\xf2\x98\r\x85\xfe/\xd4\xee\x02\xa1\xa4akIiydio6s2sap\x17at\xf6av\xd8*X%\x00\x01q\x12 /\x152\x01l\xa5\x96\xa1^G\x02\xddS\xc6\xf9p#\x19\x91*\n2\x0f\xa4g\xe1\x14iK\x18.\x8b\xa4akIjnrfiik2sap\x17at\xd8*X%\x00\x01q\x12 h\x93V\x8bH\xb2\x8c\x1e\xd3G_\xbd\xdf"%VB\xb5\xa5@\x8c\x05(M!MbF`\x1a\xafZav\xd8*X%\x00\x01q\x12 \xf5\xaf.g\xe0e\xd4K\xb4"w\xb2\xef)7\xf3\xef~1\x9f\xb0*\xbbR|\xd7\x9f\xfc\xe5e\x88\xcf\xa4akIrbd6sg54cap\x17at\xd8*X%\x00\x01q\x12 x\xb7\x05~\xd0>\xe02\xc9\xa17\x8b:\xb9\x1d\xc8fe"&\xc5\x96fP\x1d\xc8\xcf\xca\xce\xe6|bav\xd8*X%\x00\x01q\x12 %\x11\x81\x87\xb8u[\x16::O\xa5\x9cO\xdc\xb1O\x1d_\xd9U+V\xbd\x8bl\xcc\xf4D\xdf\xeb\xcdal\xf6'

There is no validation of malformed or oversized data, which causes the crash.
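The run of \x81 bytes in the dump stands out: in CBOR, 0x81 is the header byte for an array with exactly one element, so a long run of them can encode arrays nested hundreds of thousands of levels deep. Assuming the crash is related to that nesting (not confirmed at this point in the thread), a rough pre-check could flag such payloads before decoding. The function names and the threshold below are hypothetical, and the scan is a byte-level heuristic rather than a CBOR parser, so it can produce false positives.

```python
def longest_run(data: bytes, value: int = 0x81) -> int:
    """Return the length of the longest run of `value` bytes in `data`."""
    best = 0
    current = 0
    for byte in data:
        current = current + 1 if byte == value else 0
        if current > best:
            best = current
    return best


def looks_deeply_nested(data: bytes, max_run: int = 1000) -> bool:
    """Heuristic: flag payloads with a suspiciously long run of 0x81 bytes.

    `max_run` is an arbitrary, hypothetical threshold, not a library setting.
    """
    return longest_run(data) > max_run


# Synthetic payload shaped like the dump above (not the real frame).
sample = b"\xa5devil" + b"\x81" * 5000 + b"\x80dtext"
print(looks_deeply_nested(sample))
```

A check like this would only be a stopgap in the consumer; it does not replace a depth limit in the decoder itself.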

MarshalX commented 1 week ago

Super interesting. It probably fails on the https://github.com/MarshalX/python-libipld side.

Could you please export the problematic bytes to a file and prepare a reproducible example, something like with open(f.bin.....) as f: Frame.from_bytes(f.read())....

It will help a lot! Thank you
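One way to capture the offending bytes is to write every raw frame to disk before parsing it, so that after a crash the file left behind is the frame that triggered it. This is a minimal sketch using only the standard library; `dump_frame`, `load_frame`, and the file name are hypothetical helpers, not part of the SDK.

```python
from pathlib import Path


def dump_frame(data: bytes, path: str = "last_frame.bin") -> Path:
    """Write the raw frame bytes to disk before any parsing happens."""
    target = Path(path)
    target.write_bytes(data)
    return target


def load_frame(path: str = "last_frame.bin") -> bytes:
    """Read the raw bytes back, e.g. in a standalone reproduction script."""
    return Path(path).read_bytes()
```

Writing in binary mode avoids the round trip through a printed bytes literal, which needs something like ast.literal_eval to undo.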

mockthebear commented 1 week ago

Does this help?

from typing import Union
from atproto import CAR
import ast

def read_binary_dump(file_path: str) -> Union[bytes, bytearray]:
    with open(file_path, "r") as file:
        human_readable_string = file.read().strip()

    binary_data = ast.literal_eval(human_readable_string)

    if isinstance(binary_data, (bytes, bytearray)):
        return binary_data
    else:
        raise ValueError("The parsed data is not bytes or bytearray.")

file_path = "crashy.txt"
data: Union[bytes, bytearray] = read_binary_dump(file_path)

CAR.from_bytes(data)

crashy.txt

# python3 rd.py
Segmentation fault (core dumped)

MarshalX commented 1 week ago

@mockthebear thank you! Super strange. I ran the repos firehose from cursor 3720770000 and nothing happened locally for 3+ minutes. How long should I wait to reach the problematic frame?

As for your example, it gives errors, but ast and literal_eval scare me, so I just saved the data in a non-human-readable format like this:

import ast

# Parse the bytes literal stored as text, then write the raw bytes to data.bin.
with open('crashy.txt', 'r') as src:
    binary_data = ast.literal_eval(src.read().strip())

with open('data.bin', 'wb') as dst:
    dst.write(binary_data)

Now it no longer segfaults, but gives proper errors about wrong varints, etc.
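For context on the varint errors: the CAR format prefixes the header and each block with an unsigned LEB128 varint giving its length. A minimal decoder sketch follows; `decode_varint` is a hypothetical helper written for illustration, not the function the library uses.

```python
from typing import Tuple


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode an unsigned LEB128 varint; return (value, bytes_consumed)."""
    value = 0
    shift = 0
    consumed = 0
    for byte in data[offset:]:
        consumed += 1
        # The low 7 bits carry data; the high bit marks a continuation.
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, consumed
        shift += 7
    raise ValueError("truncated varint")


# The dump above starts with b':' (0x3a), i.e. a 58-byte header.
print(decode_varint(b":\xa2eroots"))
# The first block length in the dump is encoded as b'\x81\x01'.
print(decode_varint(b"\x81\x01"))
```

If the saved bytes are altered in any way, these length prefixes stop lining up with the data that follows, which is one way a parser ends up reporting invalid varints.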

This is the code I use to reproduce with the cursor (please run it locally):

from atproto import models, FirehoseSubscribeReposClient, firehose_models, parse_subscribe_repos_message

client = FirehoseSubscribeReposClient(models.ComAtprotoSyncSubscribeRepos.Params(cursor=3720770000))

def on_message_handler(message: firehose_models.MessageFrame) -> None:
    _ = parse_subscribe_repos_message(message)
    print(message.header)

client.start(on_message_handler)

MarshalX commented 1 week ago

Okay, found it thanks to Discord.

@DavidBuchanan314 is exploiting the issue reported in https://github.com/MarshalX/python-libipld/issues/9

mockthebear commented 1 week ago

Usually a few seconds; it never gets through the xxx7000 to xxx8000 range.

Sorry about that. I'm not very experienced in Python, so I asked GPT to parse that output for me x.x

I see, I see. Maybe it happens because the machine I'm running it on has only 1 GB of RAM left?

The code I'm running is this one, and it crashes on this line: https://github.com/MarshalX/bluesky-feed-generator/blob/main/server/data_stream.py#L19

With a small change to your code, I get the crash:

from atproto import models, FirehoseSubscribeReposClient, firehose_models, parse_subscribe_repos_message, CAR

client = FirehoseSubscribeReposClient(models.ComAtprotoSyncSubscribeRepos.Params(cursor=3720770000))

def on_message_handler(message: firehose_models.MessageFrame) -> None:
    commit = parse_subscribe_repos_message(message)
    print(message.header)
    CAR.from_bytes(commit.blocks)

client.start(on_message_handler)

MessageFrameHeader(op=1, t='#commit')
MessageFrameHeader(op=1, t='#commit')
MessageFrameHeader(op=1, t='#commit')
MessageFrameHeader(op=1, t='#commit')
MessageFrameHeader(op=1, t='#commit')
MessageFrameHeader(op=1, t='#commit')
MessageFrameHeader(op=1, t='#commit')
MessageFrameHeader(op=1, t='#commit')
Segmentation fault (core dumped)

It takes about 9889 'MessageFrameHeader' lines to reach the problematic message.

MarshalX commented 1 week ago

Reproduced, thank you! It's a known edge case, but this is the first time someone has exploited it on the network.

mockthebear commented 1 week ago

Apparently the post came from https://bsky.app/profile/david.dev.retr0.id

Li-WeiCheng commented 1 week ago

Hi,

I encountered the same issue.

Currently, my approach is to skip the data/cursor after the segmentation fault occurs.

Is there a better solution for this?

Thanks :)
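The skip-after-crash workaround can be automated with a small checkpoint file: record the sequence number just before parsing a message and mark it done afterwards, so a restarted process can tell which message was in flight when it died. This is a sketch under assumptions: `CursorCheckpoint` is a hypothetical helper, and how the sequence number is read from a message and mapped back to a cursor is left to the caller.

```python
import json
from pathlib import Path
from typing import Optional


class CursorCheckpoint:
    """Remember which sequence number was being processed when a crash hit."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def begin(self, seq: int) -> None:
        # Written before parsing, so the record survives if the process dies.
        self.path.write_text(json.dumps({"seq": seq, "done": False}))

    def finish(self, seq: int) -> None:
        # Written after parsing succeeded.
        self.path.write_text(json.dumps({"seq": seq, "done": True}))

    def interrupted_seq(self) -> Optional[int]:
        """Return the seq that was in flight at the last crash, if any."""
        if not self.path.exists():
            return None
        state = json.loads(self.path.read_text())
        return None if state["done"] else state["seq"]
```

On startup, a consumer could call interrupted_seq() and skip that message when it comes around again. Writing a file per message is slow at firehose volume, so a real consumer would likely want something cheaper.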

mockthebear commented 1 week ago

That's the same workaround I used, and it's the only one so far, until someone fixes it. Let's just hope nobody abuses this bug again XD
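Until a fix lands, another way to keep a consumer alive is to run the risky parse in a child process, so a segmentation fault kills only the child. The sketch below uses the standard subprocess module; `run_isolated`, `killed_by_signal`, and `PARSE_CAR` are hypothetical names, and the child program simply calls CAR.from_bytes as in the reproduction script earlier in this thread.

```python
import subprocess
import sys

# Child program: parse stdin as a CAR file (same call as the repro script).
PARSE_CAR = (
    "import sys; from atproto import CAR; "
    "CAR.from_bytes(sys.stdin.buffer.read())"
)


def run_isolated(code: str, payload: bytes, timeout: float = 30.0) -> int:
    """Run `code` in a fresh interpreter with `payload` on stdin.

    Returns the child's exit code.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        input=payload,
        capture_output=True,
        timeout=timeout,
    )
    return proc.returncode


def killed_by_signal(returncode: int) -> bool:
    """On POSIX, subprocess reports death by signal N as return code -N."""
    return returncode < 0
```

A handler could call killed_by_signal(run_isolated(PARSE_CAR, commit.blocks)) and skip the message when it returns True. Starting an interpreter per message is expensive, so this fits better as a pre-check for suspicious frames than as the default path.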

pai911 commented 1 week ago

I encountered the same issue here. Would upgrading the https://github.com/MarshalX/python-libipld version in use help?

MarshalX commented 1 week ago

@Li-WeiCheng hi, no better solution yet :( @pai911 we need to fix python-libipld first

MarshalX commented 1 week ago

I've finished the fix and will release it soon: https://github.com/MarshalX/python-libipld/pull/51