Open themak1985 opened 3 years ago
Yes! @themak1985 we are getting double receipts for Contribution pages yet not for Event Registration pages @eileenmcnaughton
Just noting that if you are using an older version of mysql then the locking mechanism might not be working - I think it works in 5.7.5+
We have been running 5.7 for a while now any other ideas?
5.7.31-percona-sure1-log to be exact alongside PHP 7.3.27
@eileenmcnaughton We're seeing two entries in civicrm_system_log for each transaction, whereas normally I would only expect to see one; these 2 are almost identical.
Record 1 is from an ip address registered to visa.com (i.e., it's from cybersource), and it contains all the fields you'd expect (auth_amount, decision, etc.) Record 2 is from the user's IP address, i.e., it's coming from the user's browser; it contains all of the same data as Record 1, plus some values defined in site-specific cookies (Drupal session id, etc.)
Is it normal that these records should both exist, or should we only be expecting one of them, per completed transaction? (If this is not normal, I'm inclined to think this may be related to the duplicate receipts problem.)
@twomice that DOES sound normal - generally the first one to come in should trigger a receipt. The second should detect it is completed & not do so. We did have a timing problem where they were both hitting at the same time. However, the first process grabs a mysql lock to prevent the second process doing the same thing at exactly the same time
Debugging in the live environment is showing these interesting details:
$this->getLock()
, which itself uses Civi::lockManager()->acquire()
to get a lock related to this contribution by transactionId.$this->getLock()
is called, it returns TRUE at line 882, return $lock->isAcquired();
I believe the expectation is that things would proceed like so:
contribution.completetransaction
api (in line 824).However, it's clear that the second invocation does not find a lock and so it just takes the same action as the first invocation.
It seems to me that either something is breaking the lock after the first invocation, or there's some bug in civicrm's lockManager (perhaps revealed by some edge case related to this environment).
I'll need to keep looking, but if someone has tips on where to look -- or even better, wisdom that says I'm really barking up the wrong tree -- I'd love to hear it.
@twomice fyi the underlying lock mechanism is implemented in mysql - prior to mysql 5.7.5 it was a bit dicey but it should be solid in modern mysql
@eileenmcnaughton This server is running "5.7.31-percona-sure1-log" -- any thoughts on whether that could be relevant?
Looks like "Percona Server for MySQL supports multiple user level locks per connection as of version 5.6.19-67.0. " (https://www.percona.com/doc/percona-server/5.6/scalability/multiple_user_level_locks.html), so perhaps we're covered there.
We're certain that the mysql version supports locks. My real question now is "why is CRM_Core_Payment_OmnipayMultiProcessor::getLock()" always returning TRUE, when it seems like it should be returning FALSE on one of the two invocations.
This leads to a question about how Civi::lockManager
is supposed to be used in the first place, since it seems isAcquired()
will always return TRUE. Not to clutter this ticket, and in the interest of more public learning, I've posted an SE question about that here: How is Civi::lockManager supposed to be used?
@eileenmcnaughton can you shed any light on that question? I think it would help us move this ticket forward, if you have the time.
I put some comments on SE - but basically you need to try using it from a different connection to see it work. I do wonder about that default timeout of 3 seconds being too little maybe though
Just to note that during the 6.5 release series of Stripe we had multiple "client specific" issues with IPN requests being processed at the same time - Stripe can send lot's of IPN requests at exactly the same time and unless you are sure you can lock one process you get strange errors. One of the big issues was identifying unique incoming requests - for example if a new payment came in and there were two IPNs triggered for this at the same time you can't lock both because you don't know if they're the same payment until you've started processing them. We ended up building a generic IPN queuing system for stripe 6.6 / mjwshared 1.0 which allows for filtering, prioritising, delaying and batching IPN processing. That solved all the concurrency issues we had. For more detail see https://docs.civicrm.org/mjwshared/en/latest/webhookqueue/
Thanks @mattwire . In this case we are logging the request (in order to debug this issue) and we can verify that the same request (actually, two requests with the same ipn payload) is being processed twice.
@eileenmcnaughton At this point I have a suspicion that the mysql locking is not performing as expected due to both requests using the same mysql connection. My next step will be to log and compare the mysql connection_id on each request. Client has this on hold at the moment in favor of other tasks, but I'll report findings here when we get back to this.
We've implemented a fix that seems to work for us (pending final confirmation from the client). I don't have a real PR ready, partly because I'm uncertain why this works. But I present my findings here:
The (short-term) fix:
diff --git a/CRM/Core/Payment/OmnipayMultiProcessor.php b/CRM/Core/Payment/OmnipayMultiProcessor.php
index a6a93712..3aac4282 100644
--- a/CRM/Core/Payment/OmnipayMultiProcessor.php
+++ b/CRM/Core/Payment/OmnipayMultiProcessor.php
@@ -816,9 +816,11 @@ class CRM_Core_Payment_OmnipayMultiProcessor extends CRM_Core_Payment_PaymentExt
if ($response->isSuccessful()) {
try {
//cope with CRM14950 not being implemented
- $this->loadContribution();
- if ($this->getLock() && $this->contribution['contribution_status_id:name'] !== 'Completed') {
+ $lock = Civi::lockManager()->acquire('data.contribute.contribution.' . $this->transaction_id, 5);
+
+ $this->loadContribution();
+ if ($this->contribution['contribution_status_id:name'] !== 'Completed') {
$this->gatewayConfirmContribution($response);
$trxnReference = $response->getTransactionReference();
civicrm_api3('contribution', 'completetransaction', [
The explanation: This fix does just a few things:
processPaymentNotification()
method directly, instead of using the getLock()
method (see more below about why we went this way.)3
is not enough for us.loadContribution()
will load (and cache) the contribution values, and I suspect that if this is done before the lock is acquired, then we may still be getting the older values (specifically 'contribution_status_id:name'). Of course, now that CRM-14950 is fixed, we should be able to expect contribution.completetransaction to skip changes to the contribution (which, I think, would be responsible for sending a receipt) if the contribution is already completed, so perhaps this does not matter.The mystery:
My testing shows a strange behavior in Civi::lockManager()->acquire()
, in that when wrapped in a function it does not properly detect existing locks. I frankly do not understand why that could be, but it's observable as follows:
Use this script for testing to observe Civi::lockManager()->acquire()
correctly acquiring (and waiting for) a lock:
$ cat /tmp/locktest_works.php
<?php
$startTime = time();
// Acquire lock directly.
$lock = Civi::lockManager()->acquire('data.contribute.contribution.1');
$lockAcquireTime = time();
echo "Time elapsed waiting for lock: ". ($lockAcquireTime - $startTime) . " seconds\n";
$timeout = 5;
echo "Sleeping $timeout...\n";
sleep($timeout);
echo "Script ended at: " . date('c') ."\n";
Run this first in a terminal (Terminal 1), and while it's still running (it sleeps for 5 seconds) also run it in a second terminal (Terminal 2). Observe the output:
Terminal 1:
$ cv scr /tmp/locktest_works.php
Time elapsed waiting for lock: 0 seconds
Sleeping 5...
Script ended at: 2021-11-15T17:08:47-06:00
Terminal 2:
$ cv scr /tmp/locktest_works.php
Time elapsed waiting for lock: 3 seconds
Sleeping 5...
Script ended at: 2021-11-15T17:08:50-06:00
These terminals demonstrate expected lock behavior: Terminal 1 reports "Time elapsed waiting for lock: 0 seconds", and then it holds that lock until the end of script execution. Terminal 2 reports "Time elapsed waiting for lock: 3 seconds", because it's waiting for Terminal 1's lock to be released.
Next, use a slightly modified script for testing to observe Civi::lockManager()->acquire()
failing to waiting for a lock when wrapped in a function:
$ cat /tmp/locktest_fail.php
<?php
function getLock() {
$lock = Civi::lockManager()->acquire('data.contribute.contribution.1');
}
$startTime = time();
// Use a function to acquire lock.
$lockIsAcquired = getLock();
$lockAcquireTime = time();
echo "Time elapsed waiting for lock: ". ($lockAcquireTime - $startTime) . " seconds\n";
$timeout = 5;
echo "Sleeping $timeout...\n";
sleep($timeout);
echo "Script ended at: " . date('c') ."\n";
Note this is the same as locktest_works.php, except that it wraps Civi::lockManager()->acquire()
in a function.
As before, run this first in a terminal (Terminal 1), and while it's still running (it sleeps for 5 seconds) also run it in a second terminal (Terminal 2). Observe the output:
Terminal 1:
$ cv scr /tmp/locktest_fail.php
Time elapsed waiting for lock: 0 seconds
Sleeping 5...
Script ended at: 2021-11-15T17:12:32-06:00
Terminal 2:
$ cv scr /tmp/locktest_fail.php
Time elapsed waiting for lock: 0 seconds
Sleeping 5...
Script ended at: 2021-11-15T17:12:33-06:00
These terminals demonstrate unexpected lock behavior: Terminal 1 reports "Time elapsed waiting for lock: 0 seconds", and then it holds that lock until the end of script execution. Terminal 2 also reports "Time elapsed waiting for lock: 0 seconds", because it's not waiting for Terminal 1's lock to be released.
This makes no sense to me, so I won't attempt to explain it, and I'll take my lumps when someone points out my ignorance of how locks are supposed to work.
@eileenmcnaughton Do you have any thoughts about what's going on here? Suggestions for working this into a PR that would help the project? Thanks.
@twomice Does this fix the locking issue? https://github.com/civicrm/civicrm-core/pull/22013
@mattwire We tried it, but that patch made no difference for us.
@twomice thank you! The client reports success on production with this fix, but like you I am not sure how the fix works and would appreciate reassurance this way is the right way. @eileenmcnaughton can you review and chime in please?
So "My testing shows a strange behavior in Civi::lockManager()->acquire(), in that when wrapped in a function it does not properly detect existing locks."
__destruct
on the function releases the lock (This is a pattern Tim quite likes so I've now learnt to find & handle it). Making the $lock a class property (which I've done) should fix the existing function
So it seems that in Cybersource, there is an authorization and if successful, a charge. CiviCRM is recording both of these as payments and I think this has led CiviCRM to send two receipts, one for the authorization and the other for the charge.
Is this happening to anyone else with Cybersource?