eileenmcnaughton / nz.co.fuzion.omnipaymultiprocessor

Omnipay Multi Processor Payment Processor For CiviCRM
Other
13 stars 44 forks source link

Anyone Else Getting Double Receipts & Payment Details from Cybersource Payments? #198

Open themak1985 opened 3 years ago

themak1985 commented 3 years ago

So it seems that in Cybersource, there is an authorization and if successful, a charge. CiviCRM is recording both of these as payments and I think this has led CiviCRM to send two receipts, one for the authorization and the other for the charge.

Is this happening to anyone else with Cybersource?

Stoob commented 2 years ago

Yes! @themak1985 we are getting double receipts for Contribution pages yet not for Event Registration pages @eileenmcnaughton

eileenmcnaughton commented 2 years ago

Just noting that if you are using an older version of mysql then the locking mechanism might not be working - I think it works in 5.7.5+

Stoob commented 2 years ago

We have been running 5.7 for a while now any other ideas?

5.7.31-percona-sure1-log to be exact alongside PHP 7.3.27

twomice commented 2 years ago

@eileenmcnaughton We're seeing two entries in civicrm_system_log for each transaction, whereas normally I would only expect to see one; these 2 are almost identical.

Record 1 is from an ip address registered to visa.com (i.e., it's from cybersource), and it contains all the fields you'd expect (auth_amount, decision, etc.) Record 2 is from the user's IP address, i.e., it's coming from the user's browser; it contains all of the same data as Record 1, plus some values defined in site-specific cookies (Drupal session id, etc.)

Is it normal that these records should both exist, or should we only be expecting one of them, per completed transaction? (If this is not normal, I'm inclined to think this may be related to the duplicate receipts problem.)

eileenmcnaughton commented 2 years ago

@twomice that DOES sound normal - generally the first one to come in should trigger a receipt. The second should detect it is completed & not do so. We did have a timing problem where they were both hitting at the same time. However, the first process grabs a mysql lock to prevent the second process doing the same thing at exactly the same time

twomice commented 2 years ago

Debugging in the live environment is showing these interesting details:

  1. As @eileenmcnaughton says is normal, CRM_Core_Payment_OmnipayMultiProcessor::processPaymentNotification() is called twice (once by redirect and once by webhook/ipn).
  2. On line 821, this method attempts to get a lock by calling $this->getLock(), which itself uses Civi::lockManager()->acquire() to get a lock related to this contribution by transactionId.
  3. Both times that $this->getLock() is called, it returns TRUE at line 882, return $lock->isAcquired();

I believe the expectation is that things would proceed like so:

  1. First invocation establishes the lock and updates the contribution with contribution.completetransaction api (in line 824).
  2. Second invocation finds that the lock is still established and so does not take such action.

However, it's clear that the second invocation does not find a lock and so it just takes the same action as the first invocation.

It seems to me that either something is breaking the lock after the first invocation, or there's some bug in civicrm's lockManager (perhaps revealed by some edge case related to this environment).

I'll need to keep looking, but if someone has tips on where to look -- or even better, wisdom that says I'm really barking up the wrong tree -- I'd love to hear it.

eileenmcnaughton commented 2 years ago

@twomice fyi the underlying lock mechanism is implemented in mysql - prior to mysql 5.7.5 it was a bit dicey but it should be solid in modern mysql

https://dev.mysql.com/doc/refman/5.7/en/locking-functions.html#:~:text=Multiple%20simultaneous%20locks%20can%20be,its%20locks%20for%20the%20name.

twomice commented 2 years ago

@eileenmcnaughton This server is running "5.7.31-percona-sure1-log" -- any thoughts on whether that could be relevant?

twomice commented 2 years ago

Looks like "Percona Server for MySQL supports multiple user level locks per connection as of version 5.6.19-67.0. " (https://www.percona.com/doc/percona-server/5.6/scalability/multiple_user_level_locks.html), so perhaps we're covered there.

twomice commented 2 years ago

We're certain that the mysql version supports locks. My real question now is "why is CRM_Core_Payment_OmnipayMultiProcessor::getLock()" always returning TRUE, when it seems like it should be returning FALSE on one of the two invocations.

This leads to a question about how Civi::lockManager is supposed to be used in the first place, since it seems isAcquired() will always return TRUE. Not to clutter this ticket, and in the interest of more public learning, I've posted an SE question about that here: How is Civi::lockManager supposed to be used?

@eileenmcnaughton can you shed any light on that question? I think it would help us move this ticket forward, if you have the time.

eileenmcnaughton commented 2 years ago

I put some comments on SE - but basically you need to try using it from a different connection to see it work. I do wonder about that default timeout of 3 seconds being too little maybe though

mattwire commented 2 years ago

Just to note that during the 6.5 release series of Stripe we had multiple "client specific" issues with IPN requests being processed at the same time - Stripe can send lot's of IPN requests at exactly the same time and unless you are sure you can lock one process you get strange errors. One of the big issues was identifying unique incoming requests - for example if a new payment came in and there were two IPNs triggered for this at the same time you can't lock both because you don't know if they're the same payment until you've started processing them. We ended up building a generic IPN queuing system for stripe 6.6 / mjwshared 1.0 which allows for filtering, prioritising, delaying and batching IPN processing. That solved all the concurrency issues we had. For more detail see https://docs.civicrm.org/mjwshared/en/latest/webhookqueue/

twomice commented 2 years ago

Thanks @mattwire . In this case we are logging the request (in order to debug this issue) and we can verify that the same request (actually, two requests with the same ipn payload) is being processed twice.

@eileenmcnaughton At this point I have a suspicion that the mysql locking is not performing as expected due to both requests using the same mysql connection. My next step will be to log and compare the mysql connection_id on each request. Client has this on hold at the moment in favor of other tasks, but I'll report findings here when we get back to this.

twomice commented 2 years ago

We've implemented a fix that seems to work for us (pending final confirmation from the client). I don't have a real PR ready, partly because I'm uncertain why this works. But I present my findings here:

The (short-term) fix:

diff --git a/CRM/Core/Payment/OmnipayMultiProcessor.php b/CRM/Core/Payment/OmnipayMultiProcessor.php
index a6a93712..3aac4282 100644
--- a/CRM/Core/Payment/OmnipayMultiProcessor.php
+++ b/CRM/Core/Payment/OmnipayMultiProcessor.php
@@ -816,9 +816,11 @@ class CRM_Core_Payment_OmnipayMultiProcessor extends CRM_Core_Payment_PaymentExt
     if ($response->isSuccessful()) {
       try {
         //cope with CRM14950 not being implemented
-        $this->loadContribution();

-        if ($this->getLock() && $this->contribution['contribution_status_id:name'] !== 'Completed') {
+        $lock = Civi::lockManager()->acquire('data.contribute.contribution.' . $this->transaction_id, 5);
+
+        $this->loadContribution();
+        if ($this->contribution['contribution_status_id:name'] !== 'Completed') {
           $this->gatewayConfirmContribution($response);
           $trxnReference = $response->getTransactionReference();
           civicrm_api3('contribution', 'completetransaction', [

The explanation: This fix does just a few things:

  1. Moves lock acquisition into the processPaymentNotification() method directly, instead of using the getLock() method (see more below about why we went this way.)
  2. Set a lock timeout of 5 seconds, rather than the default of 3 -- again, not sure why this is required, but even explicitly setting it to 3 is not enough for us.
  3. Loads the contribution after acquiring the lock, rather than before. This seems relevant because loadContribution() will load (and cache) the contribution values, and I suspect that if this is done before the lock is acquired, then we may still be getting the older values (specifically 'contribution_status_id:name'). Of course, now that CRM-14950 is fixed, we should be able to expect contribution.completetransaction to skip changes to the contribution (which, I think, would be responsible for sending a receipt) if the contribution is already completed, so perhaps this does not matter.

The mystery: My testing shows a strange behavior in Civi::lockManager()->acquire(), in that when wrapped in a function it does not properly detect existing locks. I frankly do not understand why that could be, but it's observable as follows:

Use this script for testing to observe Civi::lockManager()->acquire() correctly acquiring (and waiting for) a lock:

$ cat /tmp/locktest_works.php 
<?php

$startTime = time();

// Acquire lock directly.
$lock = Civi::lockManager()->acquire('data.contribute.contribution.1');

$lockAcquireTime = time();
echo "Time elapsed waiting for lock:  ". ($lockAcquireTime - $startTime) . " seconds\n";

$timeout = 5;
echo "Sleeping $timeout...\n";
sleep($timeout);
echo "Script ended at:                " . date('c') ."\n";

Run this first in a terminal (Terminal 1), and while it's still running (it sleeps for 5 seconds) also run it in a second terminal (Terminal 2). Observe the output:

Terminal 1:

$ cv scr /tmp/locktest_works.php 
Time elapsed waiting for lock:  0 seconds
Sleeping 5...
Script ended at:                2021-11-15T17:08:47-06:00

Terminal 2:

$ cv scr /tmp/locktest_works.php 
Time elapsed waiting for lock:  3 seconds
Sleeping 5...
Script ended at:                2021-11-15T17:08:50-06:00

These terminals demonstrate expected lock behavior: Terminal 1 reports "Time elapsed waiting for lock: 0 seconds", and then it holds that lock until the end of script execution. Terminal 2 reports "Time elapsed waiting for lock: 3 seconds", because it's waiting for Terminal 1's lock to be released.

Next, use a slightly modified script for testing to observe Civi::lockManager()->acquire() failing to waiting for a lock when wrapped in a function:

$ cat /tmp/locktest_fail.php 
<?php

function getLock() {
  $lock = Civi::lockManager()->acquire('data.contribute.contribution.1');
} 

$startTime = time();

// Use a function to acquire lock.
$lockIsAcquired = getLock();

$lockAcquireTime = time();
echo "Time elapsed waiting for lock:  ". ($lockAcquireTime - $startTime) . " seconds\n";

$timeout = 5;
echo "Sleeping $timeout...\n";
sleep($timeout);
echo "Script ended at:                " . date('c') ."\n";

Note this is the same as locktest_works.php, except that it wraps Civi::lockManager()->acquire() in a function.

As before, run this first in a terminal (Terminal 1), and while it's still running (it sleeps for 5 seconds) also run it in a second terminal (Terminal 2). Observe the output:

Terminal 1:

$ cv scr /tmp/locktest_fail.php                                                                                                                                                    
Time elapsed waiting for lock:  0 seconds
Sleeping 5...
Script ended at:                2021-11-15T17:12:32-06:00

Terminal 2:

$ cv scr /tmp/locktest_fail.php 
Time elapsed waiting for lock:  0 seconds
Sleeping 5...
Script ended at:                2021-11-15T17:12:33-06:00

These terminals demonstrate unexpected lock behavior: Terminal 1 reports "Time elapsed waiting for lock: 0 seconds", and then it holds that lock until the end of script execution. Terminal 2 also reports "Time elapsed waiting for lock: 0 seconds", because it's not waiting for Terminal 1's lock to be released.

This makes no sense to me, so I won't attempt to explain it, and I'll take my lumps when someone points out my ignorance of how locks are supposed to work.

@eileenmcnaughton Do you have any thoughts about what's going on here? Suggestions for working this into a PR that would help the project? Thanks.

mattwire commented 2 years ago

@twomice Does this fix the locking issue? https://github.com/civicrm/civicrm-core/pull/22013

twomice commented 2 years ago

@mattwire We tried it, but that patch made no difference for us.

Stoob commented 2 years ago

@twomice thank you! The client reports success on production with this fix, but like you I am not sure how the fix works and would appreciate reassurance this way is the right way. @eileenmcnaughton can you review and chime in please?

eileenmcnaughton commented 2 years ago

So "My testing shows a strange behavior in Civi::lockManager()->acquire(), in that when wrapped in a function it does not properly detect existing locks."