arlyon / async-stripe

Async (and blocking!) Rust bindings for the Stripe API
https://payments.rs
Apache License 2.0
464 stars 130 forks source link

[wip] Revamp codegen #314

Closed mzeitlin11 closed 7 months ago

mzeitlin11 commented 1 year ago

Built(ish) on #311. This initially started as an attempt to better types involved in the codegen, but turned into more of a codegen refactor to handle (almost) all of the OpenAPI spec.

Also, all the ext files are no longer needed!

Left to do in this PR

Open Questions

Handling Breaking Changes

Features / crate-splitting

My hope is that we could define features along the lines of feature x includes objects account, customer, feature y includes component balance, .... Dependency tracking would then enable understanding which features necessarily enable other features to fix #137. With a split into crates, this would make clear what cross-crate deps must need to be.

Hopefully, circular dependencies could be handled by just mandating that some specific items live in some crate stripe-core.

Improve handling of unions

With respect to the compile time pains mentioned above, another solution is to continue exploring usage of something like miniserde to improve compile times. Right now the unions are the main deterrent to trying this since they are not supported. Currently, we used serde(untagged) for these, but I think for most of these unions of object cases we could use the object field as a discriminant which hopefully allows serde to write more efficient (and smaller) implementations with better error messages. On the miniserde front, this would allow a reasonable automated generation of a deserialize impl through a structure like

match object {
   "account" -> deserialize account type,
   "customer" -> deserialize customer type,
}

When writing other unions, we often end up with a structure like

pub enum TrialEnd {
    Now(TrialEndNow),
    Timestamp(Timestamp)
}

pub enum TrialEndNow {
    Now
}

This happens because we only generate simple enums without fields or enums with fields. It would be great to simplify this representation into

pub enum TrialEnd {
    Now,
    Timestamp(Timestamp)
}

which would generate less code (hopefully help with compile time on its own) and make implementing Deserialize for unions easier

The other common union pattern that appears in the OpenAPI spec is a union of something like Account and DeletedAccount (which is returned by deleting account and some fetching methods). This was previously handled by adding a deleted field to top-level components with a Deleted counterpart and hardcoding the return value of a DELETE request as a Deleted<> struct parameterized by an id type. This doesn't seem to match the spec however - an Account should never have a deleted field and the return value of a deleting an account can also be the Account itself, the Deleted<> loses a bunch of information and forces hardcoding the request return value. However the automatically generated representation

pub enum AccountMaybeDeleted {
    Account(Account)
    Deleted(DeletedAccount)
}

is a breaking change. Deserialization could be done easily by using presence of the deleted: true field, but having to match on variants

Naming request functions

Naming requests is now harder since handling everything means fewer assumptions can be made. For now, this PR just names based on the request "operationId" field since that ensures unique names, but it gives pretty verbose / unexplanatory names. Not sure of best solutions here, best ideas I have are to perhaps just have some simple heuristic-based naming for common patterns (create / delete, etc.) then maybe just manually provide name mappings for other requests? (a pain, but there really aren't that many requests).

Handling duplicate generated content (but different name)

The OpenApi schema can only avoid duplicating content by using reference links. Especially in the case of request parameters, this is not done.

For example, the enum specifying webhook enabled events (which is huge) is specified in 2 separate parameters. The content is identical, but since they are not linked in the spec, they are created for each parameter with a different name following the spec. This was a problem as well before, but some cases were explicitly handled by adding a manual ext file and a mapping of field name / object to that specific type name. This works, but doesn't generalize and has the added issue that manually written types can fall out of date easily (for example the ApiVersion enum).

Since this PR introduces strict definitions of the content of types we generate, we can detect duplicates like these. The question becomes

(more questions/edits coming soon :))

arlyon commented 1 year ago

Wow! It's taken me a few days to work through this (and likely will a few more...) but I am really looking forward to getting this in.

databasedav commented 1 year ago

because it's not defined correctly in stripe's own spec, the code generation doesn't correctly define stuff like https://stripe.com/docs/api/prices/create#create_price-currency_options, which should be Option<HashMap<crate::currency::Currency, crate::generated::CurrencyOption>>

edit: also cashapp is missing from the payment method enums, which i also suspect is a problem on stripe's end

edit: sorry about spamming edits, just want to document whatever issues i find; stripe::generated::price::get_prices will throw

error serializing or deserializing a request: data[0].product invalid type: string, expected a sequence at line 18 column 17

because Price.products is expected to be a Vec<crate::generated::Product> when the old type seems correct, i'm not sure if this is a broader issue with Expandable, also passing expand: Some(vec!["data.product".to_string()]) just changes the invalid type to map, since it expects a vec

edit: it does seem to be something generally wrong with the Expandable fields, the same vec problem is present with CheckoutSession.{customer, payment_intent, etc.}

databasedav commented 1 year ago

@mzeitlin11 hi i'm interested in filling out the Expandable generation here, can u give me some high level guidance on what needs to be done?

mzeitlin11 commented 1 year ago

@mzeitlin11 hi i'm interested in filling out the Expandable generation here, can u give me some high level guidance on what needs to be done?

Hi @databasedav, good timing, was just hoping to start getting back to this soon! I need to refamiliarize myself with what's going, rebase, and clean stuff up. Will probably be hard to work off this branch since lots of changes still required that will determine how Expandable (and some related missing pieces) best handled. Will let you know when this ends up in a state where tasks like implementing Expandable can reasonably be split out.

mzeitlin11 commented 1 year ago

Status Update:

Main outstanding TODO's:

Crates / Features

Pagination

Miniserde

Deduplication

arlyon commented 1 year ago

Hey! Thanks for continuing on this. I am back from holidays today :surfer: and plan on addressing all your questions with some ideas.

arlyon commented 1 year ago

Ok! I will have comments about the openapi crate as well soon...

Crates

I am going to find some time to update my code-splitting PR and see if I can apply the same analysis to this PR and see if there are any clean boundaries between modules.

Pagination

hacks around inferring which id fields should be used as cursors

I think this is OK, we have enough hacks in the previous impl so I don't mind leaving this and coming back to it later if necessary since the scope of the 'damage' is probably quite limited.

It seems more natural to implement Paginable for List where T: Object

That makes sense to me. Happy to brainstorm implementations if you want.

Miniserde

Great! Also happy to help prototype on this. My intuition says it will be fairly straight forward to just spit out a manual trait impl for enums.

Dedupe

There are a few interesting parts here, but I think this should be tackled later. Some points to consider:

mzeitlin11 commented 1 year ago

Crates

I am going to find some time to update my code-splitting PR and see if I can apply the same analysis to this PR and see if there are any clean boundaries between modules.

Sounds great! Will push up an initial implementation of this soonish (hopefully), but that will be more about setting up the structure of having the codegen understand different crates and crate deps than finding smart module boundaries. Initial efforts gave some annoying cycles like in your PR, due to cycles from cases like

pub enum BalanceTransactionSource {
...
    IssuingAuthorization(crate::issuing::authorization::Authorization),
    IssuingDispute(crate::issuing::dispute::Dispute),
    IssuingTransaction(crate::issuing::transaction::Transaction),
...and many more
}

combined with

pub struct Authorization {
...
    /// List of balance transactions associated with this authorization.
    pub balance_transactions: Vec<crate::balance_transaction::BalanceTransaction>,
...
}

Without ending up with just a giant core type crate to break the cycle, the other solution I can think of would be to define the shared base module types in some core crate, but keep the requests in separate crates. e.g. in this case BalanceTransactionSource and issuing::authorization::Authorization would live in core, but the issuing related requests would live in their own crate (since most of the code bloat comes from the requests).

The annoyance with that (other than more codegen complexity) is that right now requests are namespaced as

impl issuing::authorization::Authorization {

}

which would have to change if issuing::authorization::Authorization lives in a different crate

EDIT: this kind of approach would also have the benefit of solving the webhook deserialization problem by keeping all the event types in the same crate so webhooks could be enabled without requiring everything else

arlyon commented 1 year ago

Nice. There is of course also the possibility of not splitting at all, since the code size with miniserde is so so much better, but let's give it a shot and see if we can find something that works.

mzeitlin11 commented 1 year ago

Added some code messing around with type deduplication within files. Definitely needs cleanup / docs (as does everything else), but wanted to include a POC since it influences compile time. Need to finish off miniserde stuff to then see what compile time looks like and whether better crate splitting is necessary. (probably requiring an idea like mentioned in the bottom of https://github.com/arlyon/async-stripe/pull/314#issuecomment-1599305824)

Could also introduce granular features to help compile time pretty easily (since we have ideas of dependencies and features could just gate the related mod requests). e.g. "apple_pay_domain" would be a feature to enable just those specific requests. Only concern is that too many granular features makes library less usable since need to look through many features to find what needs to be enabled (and probably need to autogenerate some good feature table for the docs).

Related to https://github.com/arlyon/async-stripe/pull/314#issuecomment-1599305824, cc @arlyon if you have any thoughts about the stripe::Client - I know you've had some ideas about changing things with an MSRV that allows GAT's. Depending on the structure of that, might affect how we want to generate requests. But regardless of that, curious of your thoughts on current pattern of

Account::create(&client, param_struct) vs. more of a builder pattern like CreateAccount::new(required_params).param1(value).param2(value).send(&client), more like how https://github.com/awslabs/aws-sdk-rust does things. Advantages of the former seem to be that it's more consistent with stripe libs in other languages, but the main annoyance is that it forces the Account requests to be defined in the same crate as the Account type signature, making crate splitting harder. Builder pattern also helps with something like https://github.com/arlyon/async-stripe/issues/380

adamchalmers commented 1 year ago

Hi there! My company uses async-stripe in our backend. We're only using a small fraction of the Stripe API. So, I'd be happy to test out migrating breaking changes for a realistic use-case (I suspect there's a fair number of other users who also only use a small part of the API).

IMO, breaking changes are OK as long as there's a good migration guide. I think Clap 4.0 had a great migration guide as did Dropshot 9.0 here -- explaining how to migrate is good, breaking changes have to happen eventually. And especially if there's a good reason to make them -- I think many users would be thrilled by the compile time improvements, even if it requires some manual migrations.

If this PR is in an OK state, I'd be happy to start a WIP branch on my work's backend and report what my migration experience was like.

arlyon commented 1 year ago

@adamchalmers thanks for volunteering to help out. I think the general structure of the codegen is mostly there. We (well, @mzeitlin11) may move some things around internally to try to break the code up better and reduce the size of the compilation units and, as you saw, there is still the open question about which syntax to use for generating the requests.

If you are OK with these changes down the line then go ahead. Some feedback from an integration branch would be great.

Speaking of which @mzeitlin11 do you think it would be handy to schedule a synchronous call soon to hash out some of the remaining items? You can book something here https://usemotion.com/meet/alexander-lyon/meeting

adamchalmers commented 1 year ago

Started work on it. One note so far is that this PR requires time-core <= 0.1.0 due to MSRV.

https://github.com/mzeitlin11/async-stripe/blob/types/async-stripe/Cargo.toml#L110-L111

But the time crate as of version 0.3.16 has required time-core = 0.1.1, so to integrate this branch, I've had to downgrade my project to time 0.3.16 from the latest version (0.3.22) and ensure it stays there.

I'd suggest allowing users to use time-core 0.1.x in the final PR.

mzeitlin11 commented 1 year ago

https://usemotion.com/meet/alexander-lyon/meeting

Booked time for next week!

Crate splitting is now implemented with this initial structure: out_crate

Compile Time Notes

stripe_types is a special case which contains base types like Currency, id generation, and all the core types which must be included to avoid circular dependencies. This approach helps a ton with compilation pipelining, because stripe_types has no dependency on the client, so can start compiling as soon as serde / serde_json are ready. Having no requests in stripe_types adds the complexity that, for example, a type like Account lives in stripe_types, but the requests related to Account live in stripe_connect. This should be able to made opaque to the user, however, by just re-exporting stripe_types::Account in stripe_connect. In theory, stripe_types should never have to be a user dependency at all.

The majority of the code (so largest compile time costs) then lives in the other generated crates since the request-related structs are the majority of what we generate. Since requests don't depend on each other, feature-gating different API resources should be straightforward, without having to worry about incompatible feature sets. Once those request-related features are added, the compile time should shrink toward how long stripe_types takes to compile since most users use only a small bit of the giant API.

TODOS

Open Questions

Miniserde Notes

mzeitlin11 commented 1 year ago

Pagination added back (closes #283, closes #246).

Slightly different organization, now implemented on both List* parameters and an existing List<T>. So instead of

let params = ListCustomer::new();
let stream = Customer::list(&client, &params).await.unwrap().paginate(params).stream(&client);

can do

let stream = ListCustomer::new().paginate().stream(&client);

And can also paginate any List<T> received, e.g.

let account = RetrieveAccount::new().send(&client).await.unwrap().
// We have one page of `external_accounts`, get the rest:
let external_account_stream = account.external_accounts.into_paginator().stream(&client);
mzeitlin11 commented 1 year ago

Added back webhook handling in new crate stripe_webhook. Adding another crate annoying in some ways, but issue with keeping in async-stripe is that dependencies look like stripe_webhook -> generated/* -> async-stripe, so adding webhook handling in async-stripe would create circular dependencies since the client cannot depend on the generated types crates.

Also migrated the examples into standalone example binaries in examples/. Those also cannot live in async-stripe anymore since that crate essentially only contains client functionality now, but knows nothing about specific requests.

This is readyish for review / testing (With caveat that these changes require much more testing / docs before being ready for production. @adamchalmers any feedback on issues / annoyances using for real-world use case would be super helpful).

I'd still plan to make more smaller changes around naming + deduplicating some generated types (one example right now is recognizing shared types between component definitions and requests, e.g. AccountType and CreateAccountType should be deduplicated. And reexporting types from stripe_types should still be done so users don't always have to rely on IDE / docs to figure out whether a type like Account lives in stripe_types or the relevant crate (stripe_connect).

Also if there are any opinions on MSRV - once that's settled can remove the time-core pin (and hopefully other pins) and start getting CI running on these changes.