awslabs / aws-sdk-rust

AWS SDK for the Rust Programming Language
https://awslabs.github.io/aws-sdk-rust/
Apache License 2.0

stack overflow inside smithy #1082

Closed levkk closed 5 months ago

levkk commented 5 months ago

Describe the bug

I've been getting a lot of stack overflow errors inside smithy after upgrading to the latest version. I was on 0.24 before. Backtrace below:

Backtrace

Process 85326 stopped
* thread #6, name = 'rocket-worker-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x170633a90)
    frame #0: 0x00000001024ef418 cloud2`aws_sdk_ec2::protocol_serde::shape_instance_network_interface::de_instance_network_interface::hffcba714c9277ac8(decoder=0x000000010458c0bc) at shape_instance_network_interface.rs:3
   1    // Code generated by software.amazon.smithy.rust.codegen.smithy-rs. DO NOT EDIT.
   2    #[allow(clippy::needless_question_mark)]
-> 3    pub fn de_instance_network_interface(
   4        decoder: &mut ::aws_smithy_xml::decode::ScopedDecoder,
   5    ) -> Result<crate::types::InstanceNetworkInterface, ::aws_smithy_xml::decode::XmlDecodeError> {
   6        #[allow(unused_mut)]
   7        let mut builder = crate::types::InstanceNetworkInterface::builder();
(lldb) bt
* thread #6, name = 'rocket-worker-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x170633a90)
  * frame #0: 0x00000001024ef418 cloud2`aws_sdk_ec2::protocol_serde::shape_instance_network_interface::de_instance_network_interface::hffcba714c9277ac8(decoder=0x000000010458c0bc) at shape_instance_network_interface.rs:3
    frame #1: 0x0000000102578b9c cloud2`aws_sdk_ec2::protocol_serde::shape_instance_network_interface_list::de_instance_network_interface_list::hdf51b289bad7cf29(decoder=0x000000017063db68) at shape_instance_network_interface_list.rs:10:21
    frame #2: 0x0000000102507808 cloud2`aws_sdk_ec2::protocol_serde::shape_instance::de_instance::h73a7e77f7e15d636(decoder=0x0000000170677e60) at shape_instance.rs:391:25
    frame #3: 0x00000001025758ac cloud2`aws_sdk_ec2::protocol_serde::shape_instance_list::de_instance_list::ha7b51db454d460e8(decoder=0x00000001706799f0) at shape_instance_list.rs:10:21
    frame #4: 0x0000000102497590 cloud2`aws_sdk_ec2::protocol_serde::shape_run_instances::de_run_instances::h0addf5d58c0bb4a6(inp=(data_ptr = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<RunInstancesResponse xmlns=\"http://ec2.amazonaws.com/doc/2016-11-15/\">\n    <requestId>2b2f4091-aed7-4173-a0d7-5e3c49e89e19</requestId>\n    <reservationId>r-0fa1cfca07d5b8d92</reservationId>\n    <ownerId>315646385872</ownerId>\n    <groupSet/>\n    <instancesSet>\n        <item>\n            <instanceId>i-0802021f8ceac8b76</instanceId>\n            <imageId>ami-0044b19e690d5dcbf</imageId>\n            <instanceState>\n                <code>16</code>\n                <name>running</name>\n            </instanceState>\n            <privateDnsName>ip-10-0-0-129.us-west-2.compute.internal</privateDnsName>\n            <dnsName>ec2-52-43-67-232.us-west-2.compute.amazonaws.com</dnsName>\n            <reason></reason>\n            <keyName>root2023</keyName>\n            <amiLaunchIndex>0</amiLaunchIndex>\n            <productCodes/>\n            <instanceType>c5.large</instanceType>\n            <launchTime>2024-02-28T21:26:25.000Z</launchTime>\n            <placement>\n                <availabi"..., length = 10144), builder=RunInstancesOutputBuilder @ 0x000000017067abc8) at shape_run_instances.rs:64:25
    frame #5: 0x0000000102496fd0 cloud2`aws_sdk_ec2::protocol_serde::shape_run_instances::de_run_instances_http_response::h0b379e0ef240528e(_response_status=200, _response_headers=0x00000001218214e8, _response_body=(data_ptr = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<RunInstancesResponse xmlns=\"http://ec2.amazonaws.com/doc/2016-11-15/\">\n    <requestId>2b2f4091-aed7-4173-a0d7-5e3c49e89e19</requestId>\n    <reservationId>r-0fa1cfca07d5b8d92</reservationId>\n    <ownerId>315646385872</ownerId>\n    <groupSet/>\n    <instancesSet>\n        <item>\n            <instanceId>i-0802021f8ceac8b76</instanceId>\n            <imageId>ami-0044b19e690d5dcbf</imageId>\n            <instanceState>\n                <code>16</code>\n                <name>running</name>\n            </instanceState>\n            <privateDnsName>ip-10-0-0-129.us-west-2.compute.internal</privateDnsName>\n            <dnsName>ec2-52-43-67-232.us-west-2.compute.amazonaws.com</dnsName>\n            <reason></reason>\n            <keyName>my_root_key</keyName>\n            <amiLaunchIndex>0</amiLaunchIndex>\n            <productCodes/>\n            <instanceType>c5.large</instanceType>\n            <launchTime>2024-02-28T21:26:25.000Z</launchTime>\n            <placement>\n                <availabi"..., length = 10144)) at shape_run_instances.rs:25:18
    frame #6: 0x00000001024c50b4 cloud2`_$LT$aws_sdk_ec2..operation..run_instances..RunInstancesResponseDeserializer$u20$as$u20$aws_smithy_runtime_api..client..ser_de..DeserializeResponse$GT$::deserialize_nonstreaming::h2fa2077915681c8a(self=0x000060000002b3c0, response=0x00000001218214e8) at run_instances.rs:159:13
    frame #7: 0x0000000102886c14 cloud2`_$LT$aws_smithy_runtime_api..client..ser_de..SharedResponseDeserializer$u20$as$u20$aws_smithy_runtime_api..client..ser_de..DeserializeResponse$GT$::deserialize_nonstreaming::h02f577259154fe5a(self=0x0000600000229d68, response=0x00000001218214e8) at ser_de.rs:95:9
    frame #8: 0x0000000102525e88 cloud2`aws_smithy_runtime::client::orchestrator::try_attempt::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::hf51e98e730639ed7((null)=<unavailable>) at orchestrator.rs:422:21
    frame #9: 0x000000010234a51c cloud2`core::result::Result$LT$T$C$E$GT$::and_then::ha89c66ee6fb03660(self=Result<(), aws_smithy_runtime_api::client::orchestrator::OrchestratorError<aws_smithy_runtime_api::client::interceptors::context::Error>> @ 0x000000017067bc88, op={closure_env#0} @ 0x000000017067c1f0) at result.rs:1320:22
    frame #10: 0x00000001025236b8 cloud2`aws_smithy_runtime::client::orchestrator::try_attempt::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h4bde04726b01e053((null)=0x00000001708354d8) at orchestrator.rs:415:21
    frame #11: 0x0000000102435aa0 cloud2`_$LT$tracing..instrument..Instrumented$LT$T$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h1661c4bb3d985206(self=Pin<&mut tracing::instrument::Instrumented<aws_smithy_runtime::client::orchestrator::try_attempt::{async_fn#0}::{async_block#0}::{async_block_env#0}>> @ 0x000000017067c630, cx=0x00000001708354d8) at instrument.rs:321:9
    frame #12: 0x0000000102520964 cloud2`aws_smithy_runtime::client::orchestrator::try_attempt::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h3892d8c3f22c8e80((null)=0x00000001708354d8) at orchestrator.rs:427:6
    frame #13: 0x00000001025159f8 cloud2`aws_smithy_runtime::client::orchestrator::try_attempt::_$u7b$$u7b$closure$u7d$$u7d$::h1573feb5fdcae9be((null)=0x00000001708354d8) at orchestrator.rs:343:1
    frame #14: 0x000000010253bb20 cloud2`aws_smithy_runtime::client::orchestrator::try_op::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h51e5e99b9ff3d4ad((null)=0x00000001708354d8) at orchestrator.rs:307:67
    frame #15: 0x0000000102512514 cloud2`_$LT$aws_smithy_runtime..client..timeout..MaybeTimeoutFuture$LT$InnerFuture$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h996539aa2707b24d(self=(pointer = 0x0000000121821750), cx=0x00000001708354d8) at timeout.rs:82:68
    frame #16: 0x000000010253463c cloud2`aws_smithy_runtime::client::orchestrator::try_op::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h0531926973f03229((null)=0x00000001708354d8) at orchestrator.rs:312:10
    frame #17: 0x000000010252b3b4 cloud2`aws_smithy_runtime::client::orchestrator::try_op::_$u7b$$u7b$closure$u7d$$u7d$::h1e7db0cae444c3d9((null)=0x00000001708354d8) at orchestrator.rs:210:1
    frame #18: 0x000000010252a8bc cloud2`aws_smithy_runtime::client::orchestrator::invoke_with_stop_point::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::hfdcfafedaf575e5a((null)=0x00000001708354d8) at orchestrator.rs:159:72
    frame #19: 0x00000001025122ec cloud2`_$LT$aws_smithy_runtime..client..timeout..MaybeTimeoutFuture$LT$InnerFuture$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h66e9e475f8cf54f6(self=(pointer = 0x0000000121821230), cx=0x00000001708354d8) at timeout.rs:82:68
    frame #20: 0x000000010252a118 cloud2`aws_smithy_runtime::client::orchestrator::invoke_with_stop_point::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h6597981cafd4c752((null)=0x00000001708354d8) at orchestrator.rs:169:10
    frame #21: 0x0000000102435f84 cloud2`_$LT$tracing..instrument..Instrumented$LT$T$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h72c9be5779063a7e(self=Pin<&mut tracing::instrument::Instrumented<aws_smithy_runtime::client::orchestrator::invoke_with_stop_point::{async_fn#0}::{async_block_env#0}>> @ 0x0000000170699690, cx=0x00000001708354d8) at instrument.rs:321:9
    frame #22: 0x0000000102528c94 cloud2`aws_smithy_runtime::client::orchestrator::invoke_with_stop_point::_$u7b$$u7b$closure$u7d$$u7d$::h31d7c6d735c06f2f((null)=0x00000001708354d8) at orchestrator.rs:172:6
    frame #23: 0x00000001024c3c00 cloud2`aws_sdk_ec2::operation::run_instances::RunInstances::orchestrate_with_stop_point::_$u7b$$u7b$closure$u7d$$u7d$::h34f325f837ece76e((null)=0x00000001708354d8) at run_instances.rs:53:135
    frame #24: 0x00000001024c34f0 cloud2`aws_sdk_ec2::operation::run_instances::RunInstances::orchestrate::_$u7b$$u7b$closure$u7d$$u7d$::hda2beaf3d5122fbc((null)=0x00000001708354d8) at run_instances.rs:31:14
    frame #25: 0x0000000102550250 cloud2`aws_sdk_ec2::operation::run_instances::builders::RunInstancesFluentBuilder::send::_$u7b$$u7b$closure$u7d$$u7d$::h118a0dea5cdcadd5((null)=0x00000001708354d8) at builders.rs:107:93

Platform

Tested and reproduced on macOS Ventura (Apple M1 Max) and Ubuntu 22.04 (AMD).

Expected Behavior

Not to segfault in a supposedly memory-safe language.

Current Behavior

I'm getting a segfault.

Reproduction Steps

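use aws_config::meta::region::RegionProviderChain;
use aws_sdk_ec2::Client;

// Inside an async fn: build a client pinned to us-west-2.
let region_provider = RegionProviderChain::first_try("us-west-2");
let shared_config = aws_config::from_env().region(region_provider).load().await;
let client = Client::new(&shared_config);

// compose a run_instances() call on `client` and call send().await?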

Possible Solution

No response

Additional Information/Context

No response

Version

cargo tree | grep aws
├── aws-config v1.1.7
│   ├── aws-credential-types v1.1.7
│   │   ├── aws-smithy-async v1.1.7
│   │   ├── aws-smithy-runtime-api v1.1.7
│   │   │   ├── aws-smithy-async v1.1.7 (*)
│   │   │   ├── aws-smithy-types v1.1.7
│   │   ├── aws-smithy-types v1.1.7 (*)
│   ├── aws-runtime v1.1.7
│   │   ├── aws-credential-types v1.1.7 (*)
│   │   ├── aws-sigv4 v1.1.7
│   │   │   ├── aws-credential-types v1.1.7 (*)
│   │   │   ├── aws-smithy-eventstream v0.60.4
│   │   │   │   ├── aws-smithy-types v1.1.7 (*)
│   │   │   ├── aws-smithy-http v0.60.6
│   │   │   │   ├── aws-smithy-eventstream v0.60.4 (*)
│   │   │   │   ├── aws-smithy-runtime-api v1.1.7 (*)
│   │   │   │   ├── aws-smithy-types v1.1.7 (*)
│   │   │   ├── aws-smithy-runtime-api v1.1.7 (*)
│   │   │   ├── aws-smithy-types v1.1.7 (*)
│   │   ├── aws-smithy-async v1.1.7 (*)
│   │   ├── aws-smithy-eventstream v0.60.4 (*)
│   │   ├── aws-smithy-http v0.60.6 (*)
│   │   ├── aws-smithy-runtime-api v1.1.7 (*)
│   │   ├── aws-smithy-types v1.1.7 (*)
│   │   ├── aws-types v1.1.7
│   │   │   ├── aws-credential-types v1.1.7 (*)
│   │   │   ├── aws-smithy-async v1.1.7 (*)
│   │   │   ├── aws-smithy-runtime-api v1.1.7 (*)
│   │   │   ├── aws-smithy-types v1.1.7 (*)
│   ├── aws-sdk-sso v1.15.0
│   │   ├── aws-credential-types v1.1.7 (*)
│   │   ├── aws-runtime v1.1.7 (*)
│   │   ├── aws-smithy-async v1.1.7 (*)
│   │   ├── aws-smithy-http v0.60.6 (*)
│   │   ├── aws-smithy-json v0.60.6
│   │   │   └── aws-smithy-types v1.1.7 (*)
│   │   ├── aws-smithy-runtime v1.1.7
│   │   │   ├── aws-smithy-async v1.1.7 (*)
│   │   │   ├── aws-smithy-http v0.60.6 (*)
│   │   │   ├── aws-smithy-runtime-api v1.1.7 (*)
│   │   │   ├── aws-smithy-types v1.1.7 (*)
│   │   ├── aws-smithy-runtime-api v1.1.7 (*)
│   │   ├── aws-smithy-types v1.1.7 (*)
│   │   ├── aws-types v1.1.7 (*)
│   ├── aws-sdk-ssooidc v1.15.0
│   │   ├── aws-credential-types v1.1.7 (*)
│   │   ├── aws-runtime v1.1.7 (*)
│   │   ├── aws-smithy-async v1.1.7 (*)
│   │   ├── aws-smithy-http v0.60.6 (*)
│   │   ├── aws-smithy-json v0.60.6 (*)
│   │   ├── aws-smithy-runtime v1.1.7 (*)
│   │   ├── aws-smithy-runtime-api v1.1.7 (*)
│   │   ├── aws-smithy-types v1.1.7 (*)
│   │   ├── aws-types v1.1.7 (*)
│   ├── aws-sdk-sts v1.15.0
│   │   ├── aws-credential-types v1.1.7 (*)
│   │   ├── aws-runtime v1.1.7 (*)
│   │   ├── aws-smithy-async v1.1.7 (*)
│   │   ├── aws-smithy-http v0.60.6 (*)
│   │   ├── aws-smithy-json v0.60.6 (*)
│   │   ├── aws-smithy-query v0.60.6
│   │   │   ├── aws-smithy-types v1.1.7 (*)
│   │   ├── aws-smithy-runtime v1.1.7 (*)
│   │   ├── aws-smithy-runtime-api v1.1.7 (*)
│   │   ├── aws-smithy-types v1.1.7 (*)
│   │   ├── aws-smithy-xml v0.60.6
│   │   ├── aws-types v1.1.7 (*)
│   ├── aws-smithy-async v1.1.7 (*)
│   ├── aws-smithy-http v0.60.6 (*)
│   ├── aws-smithy-json v0.60.6 (*)
│   ├── aws-smithy-runtime v1.1.7 (*)
│   ├── aws-smithy-runtime-api v1.1.7 (*)
│   ├── aws-smithy-types v1.1.7 (*)
│   ├── aws-types v1.1.7 (*)
├── aws-credential-types v1.1.7 (*)
├── aws-sdk-autoscaling v1.16.0
│   ├── aws-credential-types v1.1.7 (*)
│   ├── aws-runtime v1.1.7 (*)
│   ├── aws-smithy-async v1.1.7 (*)
│   ├── aws-smithy-http v0.60.6 (*)
│   ├── aws-smithy-json v0.60.6 (*)
│   ├── aws-smithy-query v0.60.6 (*)
│   ├── aws-smithy-runtime v1.1.7 (*)
│   ├── aws-smithy-runtime-api v1.1.7 (*)
│   ├── aws-smithy-types v1.1.7 (*)
│   ├── aws-smithy-xml v0.60.6 (*)
│   ├── aws-types v1.1.7 (*)
├── aws-sdk-ec2 v1.22.0
│   ├── aws-credential-types v1.1.7 (*)
│   ├── aws-runtime v1.1.7 (*)
│   ├── aws-smithy-async v1.1.7 (*)
│   ├── aws-smithy-http v0.60.6 (*)
│   ├── aws-smithy-json v0.60.6 (*)
│   ├── aws-smithy-query v0.60.6 (*)
│   ├── aws-smithy-runtime v1.1.7 (*)
│   ├── aws-smithy-runtime-api v1.1.7 (*)
│   ├── aws-smithy-types v1.1.7 (*)
│   ├── aws-smithy-xml v0.60.6 (*)
│   ├── aws-types v1.1.7 (*)
├── aws-sdk-route53 v1.16.0
│   ├── aws-credential-types v1.1.7 (*)
│   ├── aws-runtime v1.1.7 (*)
│   ├── aws-smithy-async v1.1.7 (*)
│   ├── aws-smithy-http v0.60.6 (*)
│   ├── aws-smithy-json v0.60.6 (*)
│   ├── aws-smithy-runtime v1.1.7 (*)
│   ├── aws-smithy-runtime-api v1.1.7 (*)
│   ├── aws-smithy-types v1.1.7 (*)
│   ├── aws-smithy-xml v0.60.6 (*)
│   ├── aws-types v1.1.7 (*)
├── aws-sdk-s3 v1.17.0
│   ├── aws-credential-types v1.1.7 (*)
│   ├── aws-runtime v1.1.7 (*)
│   ├── aws-sigv4 v1.1.7 (*)
│   ├── aws-smithy-async v1.1.7 (*)
│   ├── aws-smithy-checksums v0.60.6
│   │   ├── aws-smithy-http v0.60.6 (*)
│   │   ├── aws-smithy-types v1.1.7 (*)
│   ├── aws-smithy-eventstream v0.60.4 (*)
│   ├── aws-smithy-http v0.60.6 (*)
│   ├── aws-smithy-json v0.60.6 (*)
│   ├── aws-smithy-runtime v1.1.7 (*)
│   ├── aws-smithy-runtime-api v1.1.7 (*)
│   ├── aws-smithy-types v1.1.7 (*)
│   ├── aws-smithy-xml v0.60.6 (*)
│   ├── aws-types v1.1.7 (*)

Environment details (OS name and version, etc.)

Mac OS Ventura M1 (also reproduced on Ubuntu 22.04 AMD)

Logs

aws_smithy_runtime::client::identity::cache::lazy] identity cache miss occurred; added new identity (took 105µs) new_expiration=2024-02-28T21:41:48.709235Z valid_for=899.999799s partition=IdentityCachePartition(0)

thread 'rocket-worker-thread' has overflowed its stack
fatal runtime error: stack overflow
sh: line 1: 70987 Abort trap: 6           cargo run
[Finished running. Exit status: 134]
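
A note on reading the last two log lines: shells commonly report a process killed by a signal as `128 + signal`, so status 134 corresponds to signal 6 (the "Abort trap: 6" printed by `sh`). That fits the Rust runtime detecting the overflow and aborting on purpose rather than the process dying from an unhandled invalid access. A minimal sketch of the arithmetic, with `signal_from_status` being a hypothetical helper name:

```rust
/// Splits a shell-style exit status into the signal number that ended the
/// process, assuming the common `128 + signal` convention.
fn signal_from_status(status: i32) -> Option<i32> {
    if status > 128 && status < 256 {
        Some(status - 128)
    } else {
        None
    }
}

fn main() {
    // 134 is the exit status reported in the log above.
    match signal_from_status(134) {
        Some(sig) => println!("terminated by signal {sig}"),
        None => println!("no signal encoded in this status"),
    }
}
```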

levkk commented 5 months ago

Downgrading back to 0.24 fixed the segfault.

ysaito1001 commented 5 months ago

Hi, thank you for reporting this. Hmm, it is strange that the snippet runs into a segfault with the latest aws_sdk_ec2.

Can you run any of the ec2 examples without the segfault? I'm trying to see whether the segfault is specific to the run_instances operation or occurs with any operation in aws_sdk_ec2.

rcoh commented 5 months ago

What Rust version is this? Anything else weird about your environment? Are you spawning a thread with a reduced stack size?

25 stack frames is not a lot...

Is it possible to see the size of each frame with GDB?
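
Short of measuring each frame in a debugger, a rough estimate can be read off the backtrace already posted, because several arguments are pointers to values that appear to live on the same thread's stack. The sketch below only subtracts those addresses. Which frame each value actually lives in is an assumption (a pointer argument normally points into some caller's frame), and `cx` is assumed to sit near the top of the worker thread's stack, so treat the gaps as orders of magnitude rather than exact frame sizes.

```rust
/// Stack-looking addresses copied from the lldb backtrace, highest first.
/// The labels say where each value was printed, not where it is stored.
const ADDRS: [(&str, u64); 6] = [
    ("cx (frames #10-#25)", 0x0000_0001_7083_54d8),
    ("self (frame #21)", 0x0000_0001_7069_9690),
    ("self (frame #11)", 0x0000_0001_7067_c630),
    ("decoder (frame #2)", 0x0000_0001_7067_7e60),
    ("decoder (frame #1)", 0x0000_0001_7063_db68),
    ("faulting address", 0x0000_0001_7063_3a90),
];

/// Returns the byte distance between each consecutive pair of addresses.
fn gaps(addrs: &[(&str, u64)]) -> Vec<u64> {
    addrs.windows(2).map(|w| w[0].1 - w[1].1).collect()
}

fn main() {
    for (pair, gap) in ADDRS.windows(2).zip(gaps(&ADDRS)) {
        println!("{:>20} -> {:<20} {:>9} bytes", pair[0].0, pair[1].0, gap);
    }
    let total = ADDRS[0].1 - ADDRS[ADDRS.len() - 1].1;
    println!(
        "total span: {} bytes ({:.2} MiB)",
        total,
        total as f64 / (1024.0 * 1024.0)
    );
}
```

Under those assumptions, the two `decoder` pointers are 238,328 bytes apart, which hints at one or two very large deserializer frames, and the whole span from `cx` down to the faulting address is 2,103,880 bytes, slightly more than 2 MiB.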

levkk commented 5 months ago

I now think I might be a victim of a small stack here. Tokio allocates 2 MB of stack per worker thread by default, and my app is getting pretty hefty. I'm not sure why the SDK upgrade triggered the stack overflow; maybe it added a few more levels of abstraction and that finally tipped the stack usage over the limit. In any case, I increased my stack size to 4 MB and I'm not seeing this issue anymore.

I'll reopen this issue if the stack size fix doesn't end up working long term.

Thank you.
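
For readers who land here with the same symptom, the workaround amounts to running the deep call chain on a thread with a larger stack. With Tokio that is the runtime builder's `thread_stack_size` setting; how the original application applied it is not shown in this thread. The sketch below illustrates the same idea with only the standard library. `deep` and `run_with_stack` are hypothetical names, and the 64 KiB buffer is a stand-in for large generated deserializer frames, not a measurement of the SDK.

```rust
use std::thread;

/// Hypothetical stand-in for a deep call chain with large frames.
/// Each level keeps a 64 KiB buffer alive on the stack.
fn deep(levels: u32) -> u64 {
    let buf = [levels as u8; 64 * 1024];
    // black_box keeps the optimizer from removing the buffer.
    let first = std::hint::black_box(&buf)[0] as u64;
    if levels == 0 {
        first
    } else {
        first + deep(levels - 1)
    }
}

/// Runs `f` on a new thread whose stack is `bytes` large and returns its result.
fn run_with_stack<T, F>(bytes: usize, f: F) -> T
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    thread::Builder::new()
        .stack_size(bytes)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}

fn main() {
    // 36 frames of 64 KiB need about 2.25 MiB of buffers alone, which would
    // not fit in a 2 MiB stack; a 4 MiB stack leaves headroom.
    let sum = run_with_stack(4 * 1024 * 1024, || deep(35));
    println!("sum = {sum}");
}
```

A larger stack hides the problem rather than removing it, so it is still worth finding which frames are large, for example by boxing big futures or values in application code.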
