braintree / braintree_dotnet

Braintree .NET library
https://developer.paypal.com/braintree/docs/start/overview
MIT License
136 stars 73 forks source link

Improve handling of Unprocessable Entity (422) HTTP errors #121

Open wessleym opened 2 years ago

wessleym commented 2 years ago

The code previously tried to ignore HTTP 422 errors, but the response would not be parsed correctly later on.

Summary

The below narrative is related to the .NET Core code. Line numbers would differ with .NET Framework. I have improved both .NET Core and .NET Framework code in this pull request. When making a call to BraintreeGateway.Subscription.Search(SubscriptionSearchRequest query) with a complex query, I'm getting an System.ArgumentNullException with this message: "Value cannot be null. (Parameter 's')" Here is the stack trace:

at System.Int32.Parse(String s)
at Braintree.ResourceCollection`1..ctor(NodeWrapper response, PagingDelegate nextPage)
at Braintree.SubscriptionGateway.Search(SubscriptionSearchRequest query)

Using the braintree_dotnet source code, I debugged the problem further:

  1. SubscriptionGateway.Search calls BraintreeService.Post on line 105.
  2. BraintreeService.Post calls BraintreeService.GetXmlResponse on line 48.
  3. BraintreeService.GetXmlResponse calls HttpService.GetHttpResponse on line 142. (GetXmlResponse also has an unneeded and detrimental try-catch since it is catching and rethrowing an exception, but I did not change that now since I didn't want to include unnecessary changes with my pull request.)
  4. HttpService.GetHttpResponse calls HttpClient.SendAsync on line 95.
  5. Still in HttpService.GetHttpResponse, response returns with a StatusCode of 422. I contend this should throw an exception so the code can fail fast. But instead the code overlooks 422 status codes on line 96.
  6. After returning from HttpService.GetHttpResponse, BraintreeService.GetXmlResponse has a response of the below value on line 142: <unprocessable-entity><reason>timeout</reason></unprocessable-entity>
  7. BraintreeService.GetXmlResponse calls BraintreeService.StringToXmlNode with the above value, turning the above string in an XmlNode.
  8. This XmlNode is returned to BraintreeService.Post and then to SubscriptionGateway.Search, which passes it into new ResourceCollection<Subscription> on line 107.
  9. ResourceCollection<T>'s constructor calls int.Parse(response.GetString("page-size")) on line 25.
  10. NodeWrapper.GetString returns returns null (since "page-size" doesn't exist in the XML). (It might be better to throw exceptions for non-existent XML values. Again, fail fast.)
  11. int.Parse tries to parse a null value and fails, as in the exception listed at the top of this issue.

Resolution

My suggestion is that the code should not try to overlook a 422 status code in HttpService.GetHttpResponse and HttpService.GetHttpResponseAsync. Instead, I would recommend these methods be written as in my pull request.

Server considerations

Finally, while I think the code changes would be helpful, I'm not certain the server is replying properly. It's replying with a 422 status code (unprocessable entity), but the response body says it was an unprocessable entity because of a timeout. I would think the server would just send a 408 (request timeout) status code, but maybe there is a good reason internally why the 428 status code is more correct.

Checklist

hollabaq86 commented 2 years ago

👋 @wessleym Thanks for this PR and your thorough notes, and apologies for the delay in a response. I agree on your findings re: why this request returned a 422 and not a different, more clear timeout status code.

I know it's been awhile since you opened this PR, but could I trouble you to contact Braintree Support with steps to replicate your subscription search, or if you've encountered this same issue recently, provide them timestamps and your merchant ID so they can find this search in our logs to investigate further?

I do want to find where in our API this is happening so we can forward this to the engineering team responsible to hopefully update the response.

It's totally ok if you've moved on from this PR and don't want to bother reaching out. We'll take the feedback in your PR (and your wonderful notes) and hopefully incorporate this in a future version of the SDK. Thanks again!

wessleym commented 2 years ago

@hollabaq86, I've definitely not moved on, so thank you for trying to help me. I hope that this change can be merged some day, and I have just written Braintree support.

hollabaq86 commented 2 years ago

👋 @wessleym wanted to follow up - I'm in touch with search engineering teams and we found the part of our code where we're incorrectly setting timeouts on the gateway side to a 422. Don't have an ETA on when a resolution from our servers will be deployed, but we've identified what needs to change so that our servers are better handling this situation.

It's our practice to send 422 responses only for validation errors, so we need to do some more investigating on our side to make sure any other Unprocessable Entity responses are getting properly set instead of getting blindly sent back in these searches.

I'm not going to close this PR just yet because I want to weigh with the team whether we should have your work as a fallback. Ultimately the solution to your issue is with our API, in my opinion, not the SDK.

wessleym commented 2 years ago

@hollabaq86, that's interesting that the search engineering teams have found the problem. In my conversations with support, they just keep asking for code to reproduce the problem (even though I've sent it to them three times). So your last message is encouraging. Thank you. I agree that the problem is in the API, but clearly the SDK isn't handling the 422 response correctly either. And as someone who uses the Braintree SDK for multiple clients, I want the SDK to be as robust as possible.

hollabaq86 commented 2 years ago

for internal notekeeping, ticket 1618

wessleym commented 2 years ago

I got a response from Braintree support:

I have gotten all of this information over to our developers and they have created an internal ticket to track this issue. As soon as I have an update from them I will reach back out. For now, I would recommend running that search with fewer subscription ID's in order to make sure you are not receiving a timeout that is given a different code.

The request does work with fewer subscription IDs of course. I just fails with a lot of subscription IDs.