craftcms / commerce

Fully integrated ecommerce for Craft CMS.
https://craftcms.com/commerce
Other
219 stars 170 forks source link

Item adjustments all being duplicated against the order when loading stripe payment intents #3438

Closed matt-adigital closed 5 months ago

matt-adigital commented 8 months ago

Description

Line item adjustments are all being duplicated, this only happens in the final step when the payment part is loaded, this was never a problem until upgrading from v3 to v4, payment is taken at the correct amount for the original value but the order is then recorded as a partial payment with these extra adjustments remaining in the order details when completed. If you load any other page on the front end then the duplicates are removed, so we added some js to do this for us on the payment page, this then meant we hit a php error and the order was not completed. Payment taken successfully but completion of order failed, screenshot attached. We've had to revert back to v3 due to this issue as we can't have all orders showing as partial payments.

Steps to reproduce

  1. Load payment page on site using stripe gateway selected
  2. Check CMS active order details, line adjustments are all duplicated
  3. Compete order for original amount and it is successful but records as a partial payment

\\\\-- OR --

  1. Load payment page on site using stripe gateway selected
  2. Check CMS active order details, line adjustments are all duplicated
  3. Load any other frontend page
  4. Check CMS active order details, line adjustments are no longer duplicated
  5. Complete payment and encounter php error message attached below

Additional info

E8140A9F-DBA7-47C7-86C1-8586C7BF0341_1_201_a

linear[bot] commented 6 months ago

PT-1361 Item adjustments all being duplicated against the order when loading stripe payment intents

lukeholder commented 7 months ago

@matt-adigital Could you please let us know if you are using the paymentFormHtml() for the payment form?

I can't seem to reproduce this issue.

Also are you using any special mutex driver like redis?

matt-adigital commented 7 months ago

Hi @lukeholder thanks for responding.

We upgraded our stripe api version within the stripe account and yes we were using the following code for the payment gateway:

{% namespace cart.gateway.handle|commercePaymentFormNamespace %}
    {{ cart.gateway.getPaymentFormHtml(params)|raw }}
{% endnamespace %}

This site is hosted on servd so it is using redis but we've not defined redis as a mutex driver within the app.php config file.

angrybrad commented 7 months ago

@matt-adigital

It turns out that default Redis doesn’t make a reliable globally distributed locking mechanism like we thought for ages without jumping through some hoops. https://redis.io/docs/manual/patterns/distributed-locks/

So much so that in Craft 4.6.0, we swapped the default mutex in Craft back to a new and improved database mutex.

It can manifest itself with race conditions and duplicated data like what you’re seeing.

Servd is currently using Redis, but we’ve been talking to Matt (the owner), and he’s got a plan for a fix he’s working on. You’ll need to reach out to him for timelines and specifics, but it looks like you’re already on 4.6.0, so at least you’ve got that part covered already.

matt-adigital commented 7 months ago

Hi @angrybrad thank you for that explanation, all makes sense! We run monthly updates on this site and try to keep everything up to date where possible. We will hold off on the stripe upgrade for now then and start a dialogue with Matt at Servd for an indication of timeline etc. Cheers!

damonadigital commented 7 months ago

Hi @angrybrad ,

Servd applied the fix yesterday regarding the database mutex. We've tested the stripe upgrade on our staging site (Servd) and unfortunately are still experiencing the duplicate adjustments.

I spoke to Joe at Servd this morning and he confirmed it appears to be using the database mutex. He suggested the issue may not be down to the mutex.

lukeholder commented 7 months ago

@damonadigital an you reliably reproduce it with steps you can share?

damonadigital commented 7 months ago

@lukeholder yes

  1. Load payment page on site using stripe gateway selected
  2. Check CMS active order details, line adjustments are all duplicated
  3. Compete order for original amount and it is successful but records as a partial payment

-- OR --

  1. Load payment page on site using stripe gateway selected
  2. Check CMS active order details, line adjustments are all duplicated
  3. Load any other frontend page
  4. Check CMS active order details, line adjustments are no longer duplicated

I am no longer seeing the php error when completing a payment after following the second set of steps.

pixelmachine commented 6 months ago

@lukeholder I'm also experiencing this issue quite severely.

It was happening in Craft + Commerce + Stripe 3 just occasionally so @angrybrad suggested removing Redis and upgrading to C4 to get the improved database mutex. Now it's happening to roughly one in 4 orders.

I can reproduce it like @damonadigital can but not every time.

Here's an example I just made, with the same cart, front and back end, both pages refreshed before taking the screenshot. Once the cart is updated, the additional adjustments will be removed. In this case I just have one duplication but usually there are two.

checkout
nfourtythree commented 6 months ago

HI @pixelmachine & @damonadigital

I have just tried to reproduce this (using the steps stated) with the latest Craft, Commerce and Stripe using the example templates and no matter how much I refresh the payment page I am never seeing duplicated adjustments.

I go back and forth to the payment page, change what is in my cart etc and still only see the expected number of adjustments.

CleanShot 2024-03-19 at 16 19 00@2x

Would either of you be willing to create a fresh install with one product in and set up shipping/taxes in the same way you have it in your project. Then if you can replicate with the example templates, send us the database we might be able to figure out what is happening.

We are obviously keen to get to the bottom of this issue so we can get your projects running smoothly without any problems.

Thank you for all the information, time and feedback you have already put into this.

pixelmachine commented 6 months ago

Hey @nfourtythree seems like the issue is somehow related to the hosting environment, did you try it on Servd?

My live setup is using Linode instances provisioned by Ploi, with a load balancer. Bit of a hassle to setup so testing on Servd seems simpler.

The issue does not occur in my local dev and test environments (DDEV and Fortrabbit).

If you can't replicate it in a similar environment then I'll see about getting you an install. Also I've already discussed this a bit more with Brad via support if you want to check that.

nfourtythree commented 6 months ago

Hi All

I have managed to create a test case where I can reliably reproduce this issue. As expected, it is a case of a race condition, we will look to get a fix for this in place as soon as possible.

This is just a note to keep everyone updated with the situation.

Thanks!

nfourtythree commented 6 months ago

Hi @pixelmachine, @damonadigital & @matt-adigital

We have pushed a branch with fixes in for this issue and there is now a PR for it.

Before we merge the branch it would be great if you could give it a test on your local/staging environments to see if this fixes things for your project.

To try this fix, change your craftcms/commerce requirement in your project's composer.json to:

"require": {
  "craftcms/commerce": "dev-bugfix/race-conditions-duplicate-adjustments#1b2df66d0cc988b25d08211e2db1bad33a0a46ce as 4.5.2",
  "...": "..."
}

Then run composer update.

And then let us know how you get on.

Thanks!

pixelmachine commented 6 months ago

Thanks @nfourtythree testing now.

damonadigital commented 6 months ago

Hi @nfourtythree ,

Thanks for the fix. Unfortunately it's not worked for us, we are still getting duplicate adjustments.

nfourtythree commented 6 months ago

Hi @damonadigital

Thank you for your response, sorry to hear this isn't working.

This feels like an environmental mutex related issue. In the branch I linked to we are now locking the saving of the order on requests to the payment controller pay action. This means (if the mutex lock is working correctly) that if a race condition of requests to that action that the saving (and therefore recalculation) of the order can only happen one at time.

Are you able to reproduce the issue with the following:

Thanks

pixelmachine commented 6 months ago

Hey @nfourtythree I can also confirm that it hasn't fixed the issue for me.

I can only answer your first question at the moment: No I can't replicate the problem in my local or staging environment and never have been able to.

I'll try to test the others.

pixelmachine commented 6 months ago

Hi @nfourtythree so I've done a lot of testing over the last couple of days and I think I've found the culprit at least for my site – Cloudflare.

I'm using Blitz for the non-shop parts of my site and the instructions for Blitz suggest using a page rule in Cloudflare set to 'cache everything'. With a clone of my site on the exact same system I was only able to reproduce the issue if the page rule was enabled.

Other relevant things

I also setup a version of the Spoke & Chain site, again on the same system, but wasn't able to reproduce the problem. On my original site the checkout is a single page and each step is updated using a Sprig component, which in turn may refresh the Stripe payment form.

I'm not sure why this should make any difference and if I get the chance I'll modify the Spoke & Chain site to see if using Sprig to update the cart changes anything. Looking at the requests I can't see why it should but that's really the only difference I can see.

One thing that is quite interesting is that I can see the error happening in real time in the Control Panel, and it occurs almost every single time but then often self-corrects:

  1. I select a shipping method on the checkout (or any cart-update)
  2. Sprig submits it
  3. At the same time I refresh the active cart in the CP and see multiple adjustments
  4. Sprig completes and refreshes the payment form
  5. I refresh the active cart and either still see multiple adjustments or it's fixed and they're gone
  6. The front end always shows the correct information regardless of what's in the control panel

Hope that's all clear!

damonadigital commented 6 months ago

It's worth mentioning that our site also uses Cloudflare and Blitz. Although we have only applied the "Cache everything" page rule to 'domain.co.uk/assets/*'

pixelmachine commented 6 months ago

@damonadigital yeah so I think it's probably still related to Cloudflare but I spoke too soon with the page rules. I could no longer replicate it yesterday but it's still happened at least once today. I may set Cloudflare to full bypass and see if that helps.

nfourtythree commented 6 months ago

Hi All

Thank you for your extra information.

In an attempt to keep things progression we have pushed up another fix which could solve the issue. Although it could solve it I don't believe it is the root cause.

If you have previously changed the composer requirement to point to the branch then you should be able to do a composer update.

With all the extra information provided it seems likely that the the issue here is two requests being made at the same time due to some kind of combination of Sprig, Cloudflare, Blitz etc.

But with this, it is still a little confusing as to why the mutex lock is not preventing the behaviour you are seeing from happening.

I understand everyone is super busy but it would be great if you do get a chance to try out a test controller I have put together.

<?php

namespace modules\controllers;

use craft\web\Controller;
use yii\web\Response;

class MutexController extends Controller
{
    protected array|bool|int $allowAnonymous = true;

    public function actionTest(): Response
    {
        $response = new Response();
        $content = 'Start: ' . floor(microtime(true)) . PHP_EOL;
        if (!\Craft::$app->getMutex()->acquire('test', 10)) {
            $content .= 'Failed: ' . floor(microtime(true)) . PHP_EOL;
        } else {
            $content .= 'Success: ' . floor(microtime(true)) . PHP_EOL;
            sleep(2);
        }

        $response->content = $content;
        \Craft::$app->getMutex()->release('test');

        return $response;
    }
}

With this you can then run the following command line command:

curl -s https://example.domain/actions/my-module/mutex/test & curl -s https://example.domain/actions/my-module/mutex/test; wait

You will need to replace the example.domain and my-module (depending on what you bootstrapped your custom module)

This test will output some micro times to show when the request started and succeeded. If everything is working correctly with the mutex we should see that the start times are the same (within tolerance) and the success times are 2 seconds apart. Here is an example output from my tests:

Start: 1711551857
Success: 1711551857
Start: 1711551857
Success: 1711551859

If there is no difference in success times this would suggest the mutex is not working correctly.

Please let me know how you get on it.

damonadigital commented 6 months ago

Hi @nfourtythree ,

Just a quick update to let you know we've tried the latest fix and still seeing the issue. I will try the test controller when I have a chance and let you know.

Thanks

pixelmachine commented 6 months ago

Interestingly we are no longer seeing the issue with the fix in place, or at least I haven't been able to reproduce it and no live orders have encountered it.

damonadigital commented 6 months ago

I've just done the mutex controller test and these are the results:

Start: 1711637547 Success: 1711637547 Start: 1711637547 Success: 1711637549

nfourtythree commented 6 months ago

Hi @damonadigital

Thank you for getting back to us with that. If the mutex is working and you have pulled the latest changes I am really stumped as to how this issue is still apparent.

Sorry to ask some simple questions:

Thanks

damonadigital commented 6 months ago

Hi @nfourtythree ,

Thanks

pixelmachine commented 6 months ago

Hey @nfourtythree

Staging:

Start: 1712100092
Success: 1712100092
Start: 1712100095
Success: 1712100095

Live:

Start: 1712100349
Success: 1712100349
Start: 1712100349
Success: 1712100351
nfourtythree commented 6 months ago

Hi

Just wanted to check in with how things are going?

@pixelmachine everything still working as expected?

@damonadigital still seeing issues?

Thanks

damonadigital commented 6 months ago

Hi @nfourtythree ,

Yes unfortunately still seeing issues. I did another test after confirming those lines were in the vendor folder.

@pixelmachine Did you end up setting Cloudflare to full bypass in order to fix the issue?

Thanks

pixelmachine commented 6 months ago

@nfourtythree as far as I know we haven't had a single recurrence since your not-a-fix fix last week. I have my composer.json requiring "craftcms/commerce": "dev-bugfix/race-conditions-duplicate-adjustments as 4.5.2".

@damonadigital no we didn't, Nathaniel's fix last week essentially solved the problem as soon as it was applied.

We have Cloudflare proxy enabled and a pages rules set like:

*.domain.co.uk/* Cache Level: Cache Everything
*.domain.co.uk/shop/* Cache Level: Bypass

For Blitz we have the same sort of thing:

'includedUriPatterns' => [
    [
      'siteId' => 1,
      'uriPattern' => '.*',
    ],
  ],
  'excludedUriPatterns' => [
    [
      'siteId' => 1,
      'uriPattern' => '^shop',
    ],
  ],

I haven't changed any Sprig components in the checkout.

pixelmachine commented 5 months ago

@nfourtythree will the changes you made work their way into the main Commerce branch or should I expect to stick with the bugfix one for now?

nfourtythree commented 5 months ago

Hi @pixelmachine

These changes were merged into Commerce and released as part of version 4.5.3. Apologies if we hadn't made that clear, but you are now safe to update to that version and things continue to work as they have been.

@damonadigital did any of the changes that @pixelmachine mentioned help with your project?

Thanks!

damonadigital commented 5 months ago

Hi @nfourtythree ,

We still need to try applying similar settings to Cloudflare as pixelmachine has done. Once we've tried this I will update you.

Thanks

damonadigital commented 5 months ago

Hi @nfourtythree ,

The settings @pixelmachine specified in Cloudflare have worked and we are no longer encountering this issue.

Thanks both for your help and assistance.

nfourtythree commented 5 months ago

Great news and thank you for updating us!