awslabs / aws-sdk-rust

AWS SDK for the Rust Programming Language
https://awslabs.github.io/aws-sdk-rust/
Apache License 2.0
2.96k stars 242 forks

Compile time #113

Open tschuett opened 3 years ago

tschuett commented 3 years ago

I just tried the aws-sdk-ec2 in my workspace. rusoto_ec2 was always the second to last crate and took forever (66s) to build. aws-sdk-ec2 seems to be even worse.

>  cargo +nightly build -Z timings

aws-sdk-ec2 v0.0.8-alpha | 95.8s
rusoto_ec2 v0.46.0 | 66.5s

Is compile time something you are going to consider in the future?

rcoh commented 3 years ago

yeah this is definitely on our radar. Thanks for flagging it.

Other folks: please 👍🏻 this issue to prioritize compile times if they become an issue for you.

tschuett commented 3 years ago

You could add a demo example that depends on all SDKs for compile-time measurement.

jdisanti commented 3 years ago

We kind of get that right now through the CI in smithy-rs, where one of the actions compiles every single SDK as a single Cargo workspace. For example, the cargo test AWS SDK step currently shows 14 minutes for all of the supported services. This data gets muddied every time we add a new service, though. It would be nice to have it split out by service so that we could, for example, compare EC2's compile time across builds. It would also be nice to track with cargo llvm-lines for a measurement that is less affected by differences in the CI hosts.

webern commented 2 years ago

This is also my biggest complaint in using the SDK. One idea that a team member here had is that EC2 functionality could be opted-into using Cargo features. Often you will be using only a few calls from the API, but you have to compile all of it.

Another idea would be to use dynamically dispatched trait objects instead of trait-bound generics. I don't know how much that would help, but I mention it because Rust for Rustaceans mentions faster compile times as a benefit of using dynamic dispatch.
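To make the trade-off concrete, here is a minimal sketch (the trait and types are invented for illustration, not taken from the SDK): a generic function is monomorphized once per concrete type it is called with, while a `dyn Trait` function is compiled exactly once and called through a vtable.

```rust
trait Describe {
    fn describe(&self) -> String;
}

struct Instance;
struct Volume;

impl Describe for Instance {
    fn describe(&self) -> String { "instance".to_string() }
}
impl Describe for Volume {
    fn describe(&self) -> String { "volume".to_string() }
}

// Monomorphized: the compiler emits one copy per concrete `T` used.
fn show_generic<T: Describe>(item: &T) -> String {
    item.describe()
}

// Dynamically dispatched: a single compiled copy, called via a vtable.
fn show_dyn(item: &dyn Describe) -> String {
    item.describe()
}

fn main() {
    assert_eq!(show_generic(&Instance), "instance");
    assert_eq!(show_dyn(&Volume), "volume");
    println!("ok");
}
```

The runtime cost of the vtable call is usually negligible next to a network round trip, which is why it is an attractive lever for an HTTP client SDK.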

Velfi commented 2 years ago

I've investigated this a teeny bit and at least part of the problem seems to stem from the fact that EC2 has a significant number (118) of paginated APIs. The paginators return impl tokio_stream::Stream which means we're doing a good amount of monomorphization. We'll investigate feature gating pagination or switching to dynamic dispatch.
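The same trade-off can be shown with std's `Iterator` in place of `tokio_stream::Stream` (a simplified stand-in, not the SDK's actual paginator code): `impl Iterator` is a distinct concrete type per returning function, while a boxed iterator erases the type so only one copy is compiled.

```rust
// `impl Iterator`: the closure's anonymous type is baked into the
// return type, so downstream code is monomorphized against it.
fn pages_impl() -> impl Iterator<Item = u32> {
    (0..3).map(|p| p * 10)
}

// Boxed: the concrete type is erased behind `dyn Iterator`, so
// consumers compile against a single erased type.
fn pages_boxed() -> Box<dyn Iterator<Item = u32>> {
    Box::new((0..3).map(|p| p * 10))
}

fn main() {
    assert_eq!(pages_impl().collect::<Vec<_>>(), vec![0, 10, 20]);
    assert_eq!(pages_boxed().collect::<Vec<_>>(), vec![0, 10, 20]);
    println!("ok");
}
```

Multiplied by 118 paginated APIs, the difference between "one anonymous type per paginator" and "one shared erased type" plausibly adds up.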

blaenk commented 1 year ago

For my purposes I actually switched off of using this SDK to instead using a thin wrapper around the AWS CLI and shaved off many many minutes of build time, especially when you consider the separate rebuilds involved for e.g. tests/clippy/release.

Obviously this is not an option, nor ideal, in most cases where the SDK is used; I'm simply sharing the difference it made for me.

I would love to use the SDK instead but for my light purposes it wasn't worth the cost of many minutes of build time.
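For anyone curious what the thin-wrapper approach looks like, here is a minimal sketch using only the standard library. The `aws` invocation in the comment is the real target; the test call uses `echo` as a stand-in so the snippet runs anywhere.

```rust
use std::process::Command;

// Shell out to a CLI and capture stdout, instead of linking an SDK.
fn run_cli(program: &str, args: &[&str]) -> std::io::Result<String> {
    let out = Command::new(program).args(args).output()?;
    Ok(String::from_utf8_lossy(&out.stdout).into_owned())
}

fn main() {
    // In real use this would be something like:
    //   run_cli("aws", &["ec2", "describe-instances", "--output", "json"])
    // followed by parsing the JSON output.
    let out = run_cli("echo", &["hello"]).expect("failed to spawn process");
    assert_eq!(out.trim(), "hello");
    println!("ok");
}
```

You give up type safety and pay a process-spawn cost per call, but the SDK crates disappear from the build graph entirely.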

I sincerely believe that it would be a great idea to get in touch with the Rust compiler optimization team about the compile times incurred by these crates because I think it would serve as a great case study and benchmark on the impact of e.g. monomorphization and optimizing that would benefit the broader Rust ecosystem.

rukai commented 1 year ago

We'll investigate feature gating pagination or switching to dynamic dispatch

Did anything come of these investigations?

For a short-term fix I am considering forking the generated ec2 crate and removing all the code that is unused by my project. Edit: I attempted stripping out the bits I didn't need, but I've given up because of how entangled the generated code is.

johnm commented 9 months ago

I think we were all expecting at least some progress on this before this crate went 1.0. :-(

Any plans to even start addressing this issue?

webern commented 9 months ago

I attempted stripping out the bits I didn't need, but I've given up because of how entangled the generated code is.

I think what is needed is for the code generation to include a bunch of feature gates on EC2 API functionality so that we can enable the things we need in Cargo.toml.
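One hypothetical shape for this from the consumer's side (the feature names below do not exist in the published crate; they are purely illustrative of the idea):

```toml
# Hypothetical Cargo.toml fragment -- aws-sdk-ec2 does not currently
# expose per-operation features; this is only a sketch of the proposal.
[dependencies.aws-sdk-ec2]
version = "1"
default-features = false
features = ["describe-instances", "run-instances"]
```

The codegen would then wrap each operation module, its input/output types, and its serializers in the corresponding `#[cfg(feature = "...")]` gate.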

gahooa commented 9 months ago

Compile times of an async CLI application went from 5 to 15 seconds just by referencing aws-sdk-ec2 and calling two different areas of its functionality.

I was also expecting some progress on this before the stable release; please address this ASAP.

Jack-Kilrain commented 7 months ago

Just to chime in here on the build times: I'm looking into build failures (well, more like Jenkins dying during builds) and noticed a couple of things that line up with the behaviour described here, in more detail.

I've been tinkering with more verbose compiler logging/tracing for each stage and noticed some pretty interesting results for the AWS SDKs in particular, which are miles more expensive than any other crate in shotover. The important results are as follows.

[2024-01-09T01:45:01.991Z]    Compiling aws-sdk-ec2 v1.3.0
[2024-01-09T01:45:01.991Z] time:   0.001; rss:   41MB ->   44MB (   +3MB)   parse_crate
[2024-01-09T01:45:01.991Z] time:   0.000; rss:   44MB ->   49MB (   +4MB)   crate_injection
[2024-01-09T01:45:11.935Z] time:   9.141; rss:   49MB -> 1016MB ( +967MB)   expand_crate
[2024-01-09T01:45:11.935Z] time:   9.142; rss:   49MB -> 1016MB ( +967MB)   macro_expand_crate
[2024-01-09T01:45:11.935Z] time:   0.258; rss: 1016MB -> 1015MB (   -1MB)   AST_validation
[2024-01-09T01:45:11.935Z] time:   0.027; rss: 1015MB -> 1018MB (   +4MB)   finalize_imports
[2024-01-09T01:45:11.935Z] time:   0.108; rss: 1018MB -> 1018MB (   +0MB)   compute_effective_visibilities
[2024-01-09T01:45:11.935Z] time:   0.116; rss: 1018MB -> 1019MB (   +1MB)   finalize_macro_resolutions
[2024-01-09T01:45:15.203Z] time:   3.400; rss: 1019MB -> 1282MB ( +264MB)   late_resolve_crate
[2024-01-09T01:45:15.203Z] time:   0.163; rss: 1282MB -> 1284MB (   +1MB)   resolve_check_unused
[2024-01-09T01:45:15.459Z] time:   0.297; rss: 1284MB -> 1284MB (   +0MB)   resolve_postprocess
[2024-01-09T01:45:15.459Z] time:   4.117; rss: 1015MB -> 1284MB ( +269MB)   resolve_crate
[2024-01-09T01:45:15.714Z] time:   0.163; rss: 1284MB -> 1282MB (   -2MB)   write_dep_info
[2024-01-09T01:45:15.970Z] time:   0.165; rss: 1282MB -> 1282MB (   +0MB)   complete_gated_feature_checking
[2024-01-09T01:45:24.052Z] time:   0.497; rss: 1940MB -> 1901MB (  -39MB)   drop_ast
[2024-01-09T01:45:24.052Z] time:   8.104; rss: 1282MB -> 1746MB ( +464MB)   looking_for_derive_registrar
[2024-01-09T01:45:24.979Z] time:   9.303; rss: 1282MB -> 1743MB ( +461MB)   misc_checking_1
[2024-01-09T01:45:27.493Z] time:   2.087; rss: 1743MB -> 1823MB (  +80MB)   type_collecting
[2024-01-09T01:45:27.493Z] time:   0.490; rss: 1823MB -> 1819MB (   -4MB)   coherence_checking
[2024-01-09T01:45:42.328Z] time:  13.207; rss: 1819MB -> 1953MB ( +135MB)   wf_checking
[2024-01-09T01:46:00.366Z] time:   4.718; rss:  813MB -> 1577MB ( +764MB)   codegen_to_LLVM_IR
[2024-01-09T01:46:00.366Z] time: 109.242; rss: 1286MB -> 1577MB ( +291MB)   LLVM_passes
[2024-01-09T01:46:00.366Z] time: 117.170; rss:  637MB -> 1577MB ( +940MB)   codegen_crate
[2024-01-09T01:46:00.366Z] time:   0.187; rss: 1577MB ->  939MB ( -638MB)   free_global_ctxt
[2024-01-09T01:46:00.366Z] time:   0.133; rss:  939MB ->  944MB (   +5MB)   link_rlib
[2024-01-09T01:46:00.366Z] time:   0.141; rss:  939MB ->  944MB (   +5MB)   link_binary
[2024-01-09T01:46:00.366Z] time:   0.142; rss:  939MB ->  937MB (   -2MB)   link_crate
[2024-01-09T01:46:00.366Z] time:   0.143; rss:  939MB ->  937MB (   -2MB)   link
[2024-01-09T01:46:00.366Z] time: 132.034; rss:   27MB ->  143MB ( +116MB)   total
[2024-01-09T01:46:00.366Z]    Compiling aws-sdk-iam v1.3.0
[2024-01-09T01:46:00.366Z] time:   0.001; rss:   39MB ->   43MB (   +3MB)   parse_crate
[2024-01-09T01:46:01.293Z] time:  36.337; rss: 1743MB -> 2523MB ( +780MB)   type_check_crate
[2024-01-09T01:46:02.657Z] time:   2.174; rss:   43MB ->  289MB ( +247MB)   expand_crate
[2024-01-09T01:46:02.658Z] time:   2.175; rss:   43MB ->  289MB ( +247MB)   macro_expand_crate
[2024-01-09T01:46:02.658Z] time:   0.059; rss:  289MB ->  289MB (   +0MB)   AST_validation
[2024-01-09T01:46:02.658Z] time:   0.006; rss:  289MB ->  290MB (   +0MB)   finalize_imports
[2024-01-09T01:46:02.658Z] time:   0.015; rss:  290MB ->  290MB (   +0MB)   compute_effective_visibilities
[2024-01-09T01:46:02.658Z] time:   0.021; rss:  290MB ->  291MB (   +1MB)   finalize_macro_resolutions
[2024-01-09T01:46:03.219Z] time:   0.649; rss:  291MB ->  361MB (  +71MB)   late_resolve_crate
[2024-01-09T01:46:03.219Z] time:   0.041; rss:  361MB ->  362MB (   +0MB)   resolve_check_unused
[2024-01-09T01:46:03.475Z] time:   0.081; rss:  362MB ->  362MB (   +0MB)   resolve_postprocess
[2024-01-09T01:46:03.475Z] time:   0.816; rss:  289MB ->  362MB (  +72MB)   resolve_crate
[2024-01-09T01:46:03.475Z] time:   0.044; rss:  362MB ->  362MB (   +0MB)   write_dep_info
[2024-01-09T01:46:03.475Z] time:   0.046; rss:  362MB ->  362MB (   +0MB)   complete_gated_feature_checking
[2024-01-09T01:46:04.839Z] time:   0.107; rss:  493MB ->  494MB (   +0MB)   drop_ast
[2024-01-09T01:46:05.095Z] time:   1.554; rss:  362MB ->  460MB (  +98MB)   looking_for_derive_registrar
[2024-01-09T01:46:05.351Z] time:   1.807; rss:  362MB ->  463MB ( +101MB)   misc_checking_1
[2024-01-09T01:46:05.607Z] time:   0.386; rss:  463MB ->  502MB (  +39MB)   type_collecting
[2024-01-09T01:46:05.862Z] time:   0.113; rss:  502MB ->  515MB (  +13MB)   coherence_checking
[2024-01-09T01:46:09.129Z] time:   2.793; rss:  515MB ->  599MB (  +84MB)   wf_checking
[2024-01-09T01:46:13.294Z] time:   7.677; rss:  463MB ->  686MB ( +223MB)   type_check_crate
[2024-01-09T01:46:19.830Z] time:   6.148; rss:  686MB ->  886MB ( +200MB)   MIR_borrow_checking
[2024-01-09T01:46:21.736Z] time:   2.683; rss:  886MB ->  956MB (  +71MB)   MIR_effect_checking
[2024-01-09T01:46:23.102Z] time:   0.354; rss:  962MB ->  967MB (   +5MB)   module_lints
[2024-01-09T01:46:23.102Z] time:   0.355; rss:  962MB ->  967MB (   +5MB)   lint_checking
[2024-01-09T01:46:23.358Z] time:   0.362; rss:  967MB ->  968MB (   +1MB)   privacy_checking_modules
[2024-01-09T01:46:23.358Z] time:   1.166; rss:  957MB ->  968MB (  +12MB)   misc_checking_3
[2024-01-09T01:46:27.526Z] time:   4.251; rss:  968MB -> 1090MB ( +122MB)   generate_crate_metadata
[2024-01-09T01:46:27.526Z] time:   0.017; rss: 1090MB -> 1090MB (   +0MB)   monomorphization_collector_root_collections
[2024-01-09T01:46:29.418Z] time:   1.944; rss: 1090MB -> 1146MB (  +55MB)   monomorphization_collector_graph_walk
[2024-01-09T01:46:31.307Z] time:   1.705; rss: 1146MB -> 1182MB (  +36MB)   partition_and_assert_distinct_symbols
[2024-01-09T01:46:31.563Z] time:  30.141; rss: 2523MB -> 3476MB ( +953MB)   MIR_borrow_checking
[2024-01-09T01:46:43.763Z] time:  12.219; rss: 3476MB -> 3761MB ( +285MB)   MIR_effect_checking
[2024-01-09T01:46:50.315Z] time:   1.548; rss: 3776MB -> 3782MB (   +6MB)   module_lints
[2024-01-09T01:46:50.315Z] time:   1.549; rss: 3776MB -> 3782MB (   +6MB)   lint_checking
[2024-01-09T01:46:50.877Z] time:   1.663; rss: 3782MB -> 3797MB (  +15MB)   privacy_checking_modules
[2024-01-09T01:46:50.877Z] time:   5.817; rss: 3761MB -> 3797MB (  +36MB)   misc_checking_3
[2024-01-09T01:47:12.764Z] time:  21.787; rss: 3797MB -> 4225MB ( +428MB)   generate_crate_metadata
[2024-01-09T01:47:12.764Z] time:   0.107; rss: 4225MB -> 4225MB (   +0MB)   monomorphization_collector_root_collections
[2024-01-09T01:47:22.707Z] time:   8.825; rss: 4225MB -> 4436MB ( +211MB)   monomorphization_collector_graph_walk
[2024-01-09T01:47:54.736Z] time:  28.788; rss: 4436MB -> 4534MB (  +98MB)   partition_and_assert_distinct_symbols
[2024-01-09T01:50:59.545Z] Cannot contact i-0c01329bc530749c2: java.lang.InterruptedException // Boom - Jenkins agent got nuked by linux watchdog (as well as journald, this build and a few others)
[2024-01-09T01:58:55.094Z] Could not connect to i-0c01329bc530749c2 to send interrupt signal to process

Full log is here: build.log

Crate Highlights

I'd like to note that I'm aware that RSS based metrics for memory usage aren't the most accurate, but they are decent indicators. To that end, there are some seriously big numbers here.

AWS IAM

Most notable is the IAM crate, with type checking and borrow checking. The first round is OK, but the second round is what concerns me the most.

time:  36.337; rss: 1743MB -> 2523MB ( +780MB)  type_check_crate
time:   6.148; rss:  686MB ->  886MB ( +200MB)  MIR_borrow_checking
time:   7.677; rss:  463MB ->  686MB ( +223MB)  type_check_crate
time:  30.141; rss: 2523MB -> 3476MB ( +953MB)  MIR_borrow_checking

This crate doesn't even finish compiling, and by the time the stage after partition_and_assert_distinct_symbols (whatever it is, since it hasn't been logged yet) is underway, memory usage goes from 4436MB to 7900MB+.


AWS EC2

Similar behaviour shows up in the EC2 crate as well, but there it seems to be centred more around translation to IR and codegen. The EC2 crate seems to have a smaller footprint than IAM in terms of the contexts being passed around and checked, though.

[2024-01-09T01:45:11.935Z] time:   9.141; rss:   49MB -> 1016MB ( +967MB)   expand_crate
[2024-01-09T01:45:11.935Z] time:   9.142; rss:   49MB -> 1016MB ( +967MB)   macro_expand_crate
[2024-01-09T01:46:00.366Z] time:   4.718; rss:  813MB -> 1577MB ( +764MB)   codegen_to_LLVM_IR
[2024-01-09T01:46:00.366Z] time: 117.170; rss:  637MB -> 1577MB ( +940MB)   codegen_crate

Crate Structure Considerations

Immediately, this rings a bell to me as poorly structured code that relies on enormous amounts of context being passed around, or on transitive overuse of contexts between function calls. Another possibility is async lifetimes dominating the context transfers on mutable objects.

Looking at the IAM crate in particular, the first thing I see is that the crate seems to match that pattern: from what I can tell, it relies on async state passed around through invocations of various kinds. Within those methods, transitive state can be constructed or copied for use in other invocations of the IAM API. Adding to that, everything runs within an async context, and most operations are built from the client context itself:

#[::tokio::main]
async fn main() -> Result<(), iam::Error> {
    let config = aws_config::load_from_env().await;
    let client = aws_sdk_iam::Client::new(&config);
    // ... make some calls with the client
    Ok(())
}

I feel like this presents a nightmare situation for the borrow checker under production-level usage of this crate. I could be wrong, but the compilation data seems to point in that direction.

I haven't looked too far into the EC2 crate's characteristics, though.

rcoh commented 7 months ago

The crates are all fundamentally identical in terms of structure, so these are all interesting statistics. We're planning on spending time this year to see if there are any quick wins to be had.

We have some inherent challenges—EC2 has literally hundreds of APIs. We need to generate all of them. But some things can probably be improved.

gahooa commented 7 months ago

Immediately, this rings a bell to me as poorly structured code ...

Well said. I have run out of memory on a 16GB laptop just trying to compile something that used one EC2 API call. It's unacceptable.

AWS Developers, thank you for the efforts to make this usable. However, your team really needs to address this even if it means releasing v2.0.0 of the crates with an entirely different approach.

rcoh commented 7 months ago

I totally agree—especially about memory. The compile time itself I expect to be improved by the upcoming rustc parallel frontend, but memory is very difficult to work around and can make compiling the Rust SDK in CI prohibitively expensive. We're prioritizing investigating this and will keep folks posted.

We're also very cognizant of the fact that without decisive action, this issue is only going to get worse as more and more APIs are added to these existing crates.

I think, however, a lot of this comes from a very fundamental limitation: EC2 (edit: and other large crates like S3, SQS, IAM etc.) currently have hundreds of operations (EC2 has 615 as of today). They each need their own serializers and deserializers, inner types, etc.—unavoidably an absurdly large amount of code. There are definitely things we can do to shrink how much code we generate; I expect that these will yield improvements in the low double digit percentages. This is helpful, but I think ultimately an EC2 that needs 12GB of memory and compiles in 1m30s is not that much better than 16GB compiling in 2m assuming the rest of your build only needs 1GB and 10s.

We're concurrently investigating ways that we can allow customers to compile only parts of the EC2 crate. These are unfortunately also limited because it was recently discovered that features do not scale well on crates.io, which would prevent us from doing things like feature-per-operation.

Another last-ditch item that we're floating is a way to generate ad-hoc SDKs with only a subset of operations included. This is obviously non-ideal for a number of reasons, but we agree that it's incredibly frustrating to run out of memory on your build because you want to make one HTTP request to start an instance.

In any case, bear with us, this is among our top priorities to improve for this quarter.

nkconnor commented 7 months ago

It seems there is a big emphasis on EC2 here, but I am curious whether the other SDKs are being included as well. I am experimenting with aws-sdk-s3 and aws-sdk-sqs in an established project, and they take considerably longer to build than the other 500 dependencies we have.

rcoh commented 7 months ago

that's a good point—I updated the note. EC2 is the biggest, but there are a lot that are close behind.

chinedufn commented 7 months ago

These are unfortunately also limited because it was recently discovered that features do not scale well on crates.io, which would prevent us from doing things like feature-per-operation.

What do you mean by "does not scale well on crates.io"? Is there a source for this that you could link to? I couldn't find anything after about 2 mins or so of searching.

web-sys has over 1500 different features and I haven't noticed anything as a user, though I never use more than a couple dozen feature flags. Or do you mean that having hundreds of feature flags in a single crate causes problems for crates.io's internal processes?

jdisanti commented 7 months ago

There was a Rust blog post on the feature scalability issue. Although, 23,000 is quite a bit larger than 600 😄

This is the important bit that prevents us from taking this approach though:

Now comes the important part: on 2023-10-16 the crates.io team deployed a change limiting the number of features a crate can have to 300 for any new crates/versions being published.

xxchan commented 7 months ago

We are aware of a couple of crates that also have legitimate reasons for having more than 300 features, and we have granted them appropriate exceptions to this rule, but we would like to ask everyone to be mindful of these limitations of our current systems.

The limitation can be relaxed for a project. aws-sdk-ec2 should also count as having "legitimate reasons" (I'm not sure; maybe @Turbo87 can comment on this?), and 600 is also much smaller than web-sys's 1500.

We also invite everyone to participate in finding solutions to the above problems.


Therefore, I think it's still worth considering this approach, because IMHO it's quite straightforward and would immediately improve the situation a lot.

Maybe not as a first resort, but it could be tried if other approaches don't work or require too much effort.

gahooa commented 7 months ago

Another last ditch item that we're floating is a way to generate ad-hoc SDKs with only a subset of operations included. This is obviously non-ideal for a number of reasons, but we agree that's it's incredibly frustrating to run out of memory on your build because you want to make one HTTP request to start an instance.

My understanding of the aws-* crates currently is they are auto-generated based on an API definition. Would it be crazy to create 1 additional aws crate, aws-sdk-builder which you use in this fashion?

Cargo.toml

[build-dependencies]
aws-sdk-builder = "*"

build.rs

fn main() {
    aws_sdk_builder::generate_rust_file(
        "src/my_aws_api.rs", 
        vec![
            "ec2/run_instances", 
            "ec2/list_instances", 
            "s3/get_object",
        ]
    );
}

Into src/my_aws_api.rs would be placed an exact set of functions and generated structs required to use the 3 APIs mentioned above. No more, no less.

-- I'd like to suggest this as an optional addition to what we already have, not a replacement. It would 100% solve the issue for a number of folks like me, and allow you to give people an "out" while you work on a more formal solution.

-- Note: the output file path should be user-specified.

rukai commented 7 months ago

I thought it worth mentioning that I've ported our usage of aws-sdk-ec2 to shell out to the aws CLI as a workaround until the compile time issues are fixed: https://github.com/shotover/aws-throwaway/pull/41

jeffparsons commented 4 months ago

I think, however, a lot of this comes from a very fundamental limitation: EC2 (edit: and other large crates like S3, SQS, IAM etc.) currently have hundreds of operations (EC2 has 615 as of today). They each need their own serializers and deserializers, inner types, etc.—unavoidably an absurdly large amount of code. There are definitely things we can do to shrink how much code we generate; I expect that these will yield improvements in the low double digit percentages. This is helpful, but I think ultimately an EC2 that needs 12GB of memory and compiles in 1m30s is not that much better than 16GB compiling in 2m assuming the rest of your build only needs 1GB and 10s.

I was having a read of the generated code in aws-sdk-ec2 to get a feeling for whether and how much doing more things dynamically might help. E.g. what if the various request/response types exposed a reflection API (read/write key/values by name paired with embedding the smithy models themselves into the crate), and then there was only one implementation of serialization/deserialization, pagination, etc.? I suppose the reflection code itself could end up costing as much to compile... but maybe not. Is this something that's been considered?
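A rough sketch of what that reflection idea could look like (the trait, types, and field names below are invented for illustration; the SDK has no such API today): each request type exposes its fields as name/value pairs, and a single shared routine handles serialization for every type.

```rust
use std::collections::BTreeMap;

// Hypothetical reflection trait: a type describes its own fields.
trait Reflect {
    fn fields(&self) -> BTreeMap<&'static str, String>;
}

// Invented stand-in for a generated request struct.
struct RunInstancesRequest {
    image_id: String,
    count: u32,
}

impl Reflect for RunInstancesRequest {
    fn fields(&self) -> BTreeMap<&'static str, String> {
        BTreeMap::from([
            ("ImageId", self.image_id.clone()),
            ("MaxCount", self.count.to_string()),
        ])
    }
}

// One serializer shared by every request type, instead of one
// generated serializer per operation.
fn to_query(req: &dyn Reflect) -> String {
    req.fields()
        .iter()
        .map(|(k, v)| format!("{k}={v}"))
        .collect::<Vec<_>>()
        .join("&")
}

fn main() {
    let req = RunInstancesRequest { image_id: "ami-123".into(), count: 1 };
    assert_eq!(to_query(&req), "ImageId=ami-123&MaxCount=1");
    println!("ok");
}
```

The per-type code shrinks to a small `fields()` impl; whether the savings survive in practice depends on how much of the cost is in the types themselves versus the serializer bodies.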

We're concurrently investigating ways that we can allow customers to compile only parts of the EC2 crate. These are unfortunately also limited because it was recently discovered that features do not scale well on crates.io, which would prevent us from doing things like feature-per-operation.

What about features for groups of operations? Is there any kind of internal grouping of operations in the underlying model?

Or otherwise by "commonness of use"? From quickly skimming the operation list, it looked to me like most of the operations are ones I would never need, because they cover use cases I think of as obscure. I wonder how big the subset of "stuff that most people use a lot" is, i.e. would it be useful to have features for:

:man_shrugging:

I'd be keen to know if there are particular avenues that are currently favored / considered most promising by the AWS team.

Thanks!

:bow:

Velfi commented 3 months ago

The Parallel Rustc Working Group is working on releasing a new parallel compiler front-end this year.

I tested it out on EC2 v1.42.0 and it's a big improvement over the current front-end. I ran the following compiles on my work laptop: a 2021 16" M1 MacBook Pro 32GB.

It looks like they still have many issues to solve before it can be the default, though.

xxchan commented 3 months ago

The parallel frontend won't help if the CPU is already fully occupied, e.g. busy compiling other crates alongside aws-sdk-ec2 (inter-process parallelism). So I don't think it makes life much better for users of aws-sdk-ec2 :)

jeffparsons commented 3 months ago

Parallel frontend will also not help with memory usage — I'd expect it to make it worse.

For a really meaningful improvement, I think a solution is still needed that allows doing less work, not just trying to do the same amount of work faster. E.g. splitting it up into separate crates, feature flags, etc.

If Rust had better support for dynamic linking (e.g. via the proposed crABI and #[export] RFCs) then I'd be pushing for a solution that allows using pre-built binaries...

gahooa commented 3 months ago

**TL;DR: a custom wrapper around smithy-rs reduced our aws-sdk-* compile times to seconds.**

Update: for our team, here is how we solved aws-sdk compile times.

We wrote a Rust CLI designed to build a custom version of aws-sdk-*.

The CLI program does this:

  1. Clones smithy-rs into a temporary dir
  2. Strips down the JSON definition files to just the services and APIs we use
  3. Calls the smithy-rs build process to generate the Rust code
  4. Copies the generated Rust code into a public GitHub repo in our account
  5. Tags it with the aws-sdk version

Finally, we updated our project's Cargo.toml to refer to our GitHub version of aws-sdk-*.

This works great because smithy-rs is driven by JSON definitions, so no additional changes were needed.

Once we were using our own GitHub versions of the aws-sdk-* crates, fresh compile times for our project went from minutes down to seconds, and recompile times went from 20 seconds down to 8.

Using the mold linker brought this down to 1.5 seconds.
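For reference, opting into mold is a per-project config change; a sketch of what that can look like on x86_64 Linux (assumes clang and mold are installed, and paths/targets may differ on your system):

```toml
# .cargo/config.toml -- link with mold via clang
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```

Since linking happens on every rebuild of the final binary, a faster linker mostly helps the incremental edit-compile-run loop rather than the fresh build.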

For us it was worth it, because nearly tripled compile times were affecting our development flow, especially in web dev where you need to iterate quickly.

Keeping up to date with upstream is as simple as running a CLI command and updating the version tags in our project's Cargo.toml.

We considered committing the generated crates to our project directory, but there were two main reasons we elected to use a separate repo instead:

  1. cargo clippy DOES NOT like the generated aws-sdk code
  2. It was still 10 MB of source code with only four SDK APIs enabled

Looking forward to not needing to do this, but it does work well for now.

vultix commented 1 month ago

@gahooa Would your team be willing to open source your implementation?

gahooa commented 1 month ago

@vultix Yes, we would. We are working on updating it to account for recent changes in the aws crates (service definitions moved to another repo), which had prevented us from upgrading.