The JSON deserialization seems to be the problem. The stack overflow seems to happen here.
Note: the bot already existed and has some pending updates, so Telegram sends the same JSON every time. It's a JSON list of updates; nothing looks suspicious to me.
This is an excerpt of the response from Telegram that crashed the JSON parser (ids scrambled):
{
  "ok": true,
  "result": [
    {
      "update_id": 77345345072,
      "poll": {
        "id": "5345345261554",
        "question": "What would you like to order, master?",
        "options": [
          {
            "text": "The Amici for some unhealthy snack",
            "voter_count": 1
          },
          {
            "text": "The Beef Club for distinguished Ladies and Gentlemen",
            "voter_count": 0
          },
          {
            "text": "Kaishi Combo for the extraordinarily exotic",
            "voter_count": 1
          }
        ],
        "total_voter_count": 2,
        "is_closed": false,
        "is_anonymous": true,
        "type": "regular",
        "allows_multiple_answers": false
      }
    },
    {
      "update_id": 7343450073,
      "poll": {
        "id": "544757133453451554",
        "question": "What would you like to order, master?",
        "options": [
          {
            "text": "The Amici for some unhealthy snack",
            "voter_count": 1
          },
          {
            "text": "The Beef Club for distinguished Ladies and Gentlemen",
            "voter_count": 0
          },
          {
            "text": "Kaishi Combo for the extraordinarily exotic",
            "voter_count": 0
          }
        ],
        "total_voter_count": 1,
        "is_closed": false,
        "is_anonymous": true,
        "type": "regular",
        "allows_multiple_answers": false
      }
    },
    {
      "update_id": 7734345074,
      "message": {
        "message_id": 34345,
        "from": {
          "id": 2934534548,
          "is_bot": false,
          "first_name": "xxxx"
        },
        "chat": {
          "id": -13453457001,
          "title": "bot test group",
          "type": "supergroup"
        },
        "date": 1634535345,
        "text": "Nice"
      }
    },
  ]
}
...and then some more updates; about 500 lines when the JSON is formatted.
The example async_get_me.rs works just fine. The example reply_to_message_updates.rs also overflows.
It also overflows for the following response (ids scrambled):
{
  "ok": true,
  "result": [
    {
      "update_id": 234523452345,
      "message": {
        "message_id": 234532452345,
        "from": {
          "id": 23453452435,
          "is_bot": false,
          "first_name": "Johannes",
          "username": "cvyxcvxycv",
          "language_code": "en"
        },
        "chat": {
          "id": 234523452435,
          "first_name": "Johannes",
          "username": "cvyxcvxycv",
          "type": "private"
        },
        "date": 234523452345,
        "text": "hi"
      }
    }
  ]
}
@johannesvollmer yes, this issue is happening only on Windows. We need to investigate this.
I haven't seen this happening on *nix.
cc @EdJoPaTo
I tried to add the following test to the async_telegram_api_impls.rs test module (ids scrambled):
#[test]
fn parse_message_success() {
    let response_string = r#"
        {
            "ok": true,
            "result": [
                {
                    "update_id": 234523452345,
                    "message": {
                        "message_id": 234532452345,
                        "from": {
                            "id": 23453452435,
                            "is_bot": false,
                            "first_name": "XXX",
                            "username": "cvyxcvxycv",
                            "language_code": "en"
                        },
                        "chat": {
                            "id": 234523452435,
                            "first_name": "XXX",
                            "username": "cvyxcvxycv",
                            "type": "private"
                        },
                        "date": 234523452345,
                        "text": "hi"
                    }
                }
            ]
        }
    "#;

    let json_result: MethodResponse<Vec<Update>> = {
        serde_json::from_str(response_string).unwrap()
    };
}
I cannot even get it to run lol
I got it to run, but the test did not fail. Running the example still overflows though, exactly at the serde_json call.
Have you tried creating a bot, sending it messages on Telegram before ever starting it, and then running the bot with those expected updates?
Maybe #8 is related here, as it was a Windows-only problem too? Would be interesting to find out the cause.
I wanted to present it to my colleagues tomorrow. Any ideas what I can try next?
Calling serde_json::from_str(message) succeeds within a test, but fails while running the example. The test passes with the same message string as the example; how can this be? Maybe some kind of global state is different when running in testing mode than in example mode?
This is what I tried:
dbg!(&message);
dbg!(std::any::type_name::<T>()); // gets printed as expected.
let json_result: Result<T, serde_json::Error> = serde_json::from_str(&message);
dbg!(json_result.is_ok()); // never gets printed.
Interesting news! When dividing serde_json::from_str into two separate stages, it works. Replace

let json_result: Result<T, serde_json::Error> = serde_json::from_str(&message);

with

let json_result: Result<T, serde_json::Error> = serde_json::Value::from_str(&message).and_then(serde_json::from_value);

Of course, this is barely a workaround; performance will be inferior due to the increased number of allocations and the two stages.
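For reference, here is a minimal, self-contained sketch of that two-stage variant. The helper function and its name are hypothetical; only the serde_json calls are real. Note that serde_json::Value::from_str needs std::str::FromStr in scope.

use serde::de::DeserializeOwned;
use std::str::FromStr; // required for serde_json::Value::from_str

// Two-stage parse: first into a heap-allocated serde_json::Value tree,
// then from that Value into the target type T.
fn decode_two_stage<T: DeserializeOwned>(message: &str) -> Result<T, serde_json::Error> {
    serde_json::Value::from_str(message).and_then(serde_json::from_value)
}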
@EdJoPaTo yes seems like the same problem
After reading https://github.com/rust-lang/rust/issues/80289, I now have a theory: std::mem::size_of::<Update>() is 30384 bytes. This means a single Update instance is about 30 kB! I presume this type is allocated a lot, maybe even recursively, in the JSON deserialization function.

If this theory turns out to be true, I see three solutions:
- use an enum in struct Update and in struct Message (assuming most fields will never be set at the same time),
- use Option<Box<Message>> in struct Update,
- or box some fields in the Message type (may have performance implications).

The following observations make me think the stack overflow theory is actually legitimate:
- Boxing all Message fields in struct Update finally made the example run on my device.
- The type Update, formerly 30384 bytes, is now 9872 bytes (roughly 10 kB). Boxing all the fields in Update makes it about 120 bytes.
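To illustrate why boxing shrinks the outer type, here is a toy example with a made-up payload type; it is not the real Update or Message definition:

// Toy demonstration: a boxed field costs only a pointer in the containing
// struct, while an inline field contributes its full size. The 4 KiB payload
// is invented purely for illustration.
struct BigPayload([u8; 4096]);

struct Inline {
    payload: Option<BigPayload>, // stored inline: Inline itself is larger than 4 KiB
}

struct Boxed {
    payload: Option<Box<BigPayload>>, // behind a pointer: Boxed stays pointer-sized
}

fn main() {
    println!("Inline: {} bytes", std::mem::size_of::<Inline>()); // 4097 bytes
    println!("Boxed:  {} bytes", std::mem::size_of::<Boxed>());  // 8 bytes on 64-bit targets
}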
However, I think making these fields an enum instead of boxing them would be the cleanest solution.
I have a fork with the quick-fix changes; it seems to work for now. If the enum solution turns out to take too long, this could serve as a temporary workaround. What do you think? Any ideas how we can verify the stack overflow theory?
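One way to test the theory directly (a sketch, not code from the library): run the exact same serde_json call on worker threads with explicitly chosen stack sizes. If it aborts with "thread ... has overflowed its stack" on a small stack but completes on a generous one, stack usage is the culprit.

use frankenstein::{MethodResponse, Update}; // import path assumed; adjust to where the crate exports these
use std::thread;

// Parse the captured getUpdates payload on a thread with the given stack size.
// `response_string` stands in for the JSON response that crashes the example.
fn parse_with_stack(stack_bytes: usize, response_string: &'static str) -> bool {
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(move || {
            serde_json::from_str::<MethodResponse<Vec<Update>>>(response_string).is_ok()
        })
        .expect("failed to spawn worker thread")
        .join()
        .expect("worker thread panicked")
}

// e.g. compare parse_with_stack(256 * 1024, RESPONSE) with parse_with_stack(32 * 1024 * 1024, RESPONSE)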
That's interesting!
Boxing everything seems like horrible code to me. And the storage is not gone, it's just elsewhere (on the heap), which only shifts the problem. Designing the structs differently to improve their memory usage (wherever it is stored) is the best solution to me. This should solve this problem and result in more effective RAM usage on all platforms.
Using an enum for Update seems like an interesting solution here. Due to it being used a lot, changes here should have the most impact overall, even if that requires custom (de)serialization code. Taking a look at teloxide, they did exactly that: https://github.com/teloxide/teloxide-core/blob/master/src/types/update.rs (their code is MIT licensed).
Not sure if we could just use the teloxide-core crate for the types in general :thinking:
Even if that requires custom (de)serialization code.
I think we might get away without custom de/serialization, using externally tagged enums and struct flattening. That way, we might just do the following change (field renaming omitted for brevity):
#[derive(Serialize, Deserialize)]
struct Update {
    update_id: u32,
    #[serde(flatten)]
    content: UpdateContent,
}

#[derive(Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
enum UpdateContent {
    Message(Message),
    EditedMessage(Message),
    Poll(Poll),
    // ...
}
If I'm not mistaken, this should result in the following JSON:

{
  "update_id": 32,
  "message": {
    "message_id": 32,
    ...
  }
}

Which is the JSON we currently have, isn't it?
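A quick self-contained check of that assumption; the types below are heavily simplified stand-ins (only a message_id and a poll id), not the real frankenstein structs:

use serde::{Deserialize, Serialize};

// Simplified stand-in types, just enough to exercise the flattened enum.
#[derive(Debug, Serialize, Deserialize)]
struct Message {
    message_id: u64,
}

#[derive(Debug, Serialize, Deserialize)]
struct Poll {
    id: String,
}

#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
enum UpdateContent {
    Message(Message),
    EditedMessage(Message),
    Poll(Poll),
}

#[derive(Debug, Serialize, Deserialize)]
struct Update {
    update_id: u32,
    #[serde(flatten)]
    content: UpdateContent,
}

#[test]
fn flattened_enum_accepts_current_json() {
    let json = r#"{ "update_id": 32, "message": { "message_id": 32 } }"#;
    let update: Update = serde_json::from_str(json).expect("should deserialize");
    assert_eq!(update.update_id, 32);
    assert!(matches!(update.content, UpdateContent::Message(_)));
}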
Again (as I'm unsure how to interpret the official Telegram documentation), this will only work if all of these states never occur at the same time; otherwise an enum obviously won't make sense.
Again (as I'm unsure how to interpret the official Telegram documentation), this will only work if all of these states never occur at the same time; otherwise an enum obviously won't make sense.
The official documentation states exactly that: https://core.telegram.org/bots/api#update
Haha, I didn't find that one. So, any objections to the enum approach? @ayrat555
For memory representation this would be optimal, since the enum will only be about as large as its largest variant, which is probably the Message type, which would probably also be refactored to contain an enum. So, in the end, memory usage should go down by a lot.
no objections.
@johannesvollmer great investigation skills. good work!
Not sure if we could just use the teloxide-core crate for the types in general 🤔
@EdJoPaTo I'd rather keep teloxide and Frankenstein independent.
@EdJoPaTo I'd rather keep teloxide and Frankenstein independent.
Their types look useful. Using the Bot API types in two projects would simplify a lot of work, and they will be the same for every Telegram library. But yeah, teloxide-core seems to do more than just the types.
Also, I noticed that the allowed_updates functionality could be made type-safe. Right now, one might pass an invalid string to the function. Look at teloxide for inspiration: https://github.com/teloxide/teloxide-core/blob/master/src/types/allowed_update.rs
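A rough sketch of what that could look like (the variant list follows the update types in the Bot API documentation; the type name and derives are assumptions, not frankenstein's actual API). The setter for allowed_updates could then take a slice of AllowedUpdate instead of raw strings:

use serde::{Deserialize, Serialize};

// Type-safe replacement for the stringly typed allowed_updates values.
// Serialized as snake_case so it matches the wire format, e.g. "edited_message".
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
enum AllowedUpdate {
    Message,
    EditedMessage,
    ChannelPost,
    EditedChannelPost,
    InlineQuery,
    ChosenInlineResult,
    CallbackQuery,
    ShippingQuery,
    PreCheckoutQuery,
    Poll,
    PollAnswer,
    MyChatMember,
    ChatMember,
    ChatJoinRequest,
}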
Go ahead and create some pull requests. I will happily review them :)
In the Message documentation, I can't find any statement as to whether some of the fields may be present at the same time. Nevertheless, teloxide has an enum for some of the fields.
There's probably no guarantee that two of those fields won't be present at the same time in the future. Do we want to take the risk?
I think as a first step we can just migrate Update to an enum.
By the way, @johannesvollmer, are you working on this?
I can try to pick this up this week (most probably on the weekend) if you're not.
I had the change done this morning; it actually was a task of only five minutes, but I feel like some more tests should be added, though.
see #62
Merged. Awesome, thanks!
Hi! I tried running the example async_reply_to_message_updates.rs. It crashed with a stack overflow error. I was able to pinpoint the overflowing call in the example:
https://github.com/ayrat555/frankenstein/blob/1217d29c3b6c8f9866d0c21cbba42517e62ce52e/examples/async_reply_to_message_updates.rs#L18
Running frankenstein = { version = "0.13.0", features = ["async-http-client"] }, rustc 1.60.0 (7737e0b5c 2022-04-04), Win 10. I will investigate further. Maybe the error is within the example, or in my project setup.