dnwiebe opened 1 year ago
Right now, while the checks to be performed by the PaymentAdjuster are not yet in place, an error stamp is not necessarily a rare event, because it can occur any time the user's wallet runs out of funds. I believe this is the kind of error we can expect most of the time, perhaps in the vast majority of situations, and it is therefore a genuine concern now. However, the PaymentAdjuster will work toward mitigating this stalemate: it will never let things go so far that payments with unrealistic parameters are requested from the blockchain service, so no error caused by insufficient funds for transaction or service fees should arise.
I think it's quite possible that users now have some accounts blocked from having the balances we owe them paid, which inevitably leads to persistent bans from the affected Nodes. I think it's a serious problem that affects us a lot. @kauri-hero
Solving this in either of two ways (both should be implemented eventually) would leave us facing only genuine corner-case errors that happen remarkably infrequently. The two ways are: first, preventing these errors with the PaymentAdjuster; second, allowing the database stamps to disappear eventually under well-defined conditions, which puts these accounts back among those inspected by the next payable scan.
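To illustrate the first approach, here is a minimal sketch of the kind of pre-check the PaymentAdjuster is meant to perform before anything is sent to the blockchain service. It assumes that there are two separate balances: the token that pays the service fees, and the native currency that pays the transaction fees. It also assumes a flat gas limit per transaction. The names `WalletBalances` and `affordable_prefix` are hypothetical, and the real PaymentAdjuster will probably select payables more cleverly than "keep them in priority order until one no longer fits".

```rust
/// Hypothetical snapshot of the consuming wallet's two balances, both in wei.
#[derive(Debug, Clone, Copy)]
pub struct WalletBalances {
    /// Balance of the token used to pay service fees (the payable amounts themselves).
    pub service_fee_balance: u128,
    /// Balance of the native currency used to pay transaction (gas) fees.
    pub transaction_fee_balance: u128,
}

/// Returns the longest prefix of `payables` (assumed already sorted by priority)
/// that both balances can cover, so no transaction is submitted that is bound
/// to fail for lack of funds. Assumes every transaction costs
/// `gas_limit_per_tx * gas_price_wei` in transaction fees.
pub fn affordable_prefix(
    payables: &[u128],
    balances: WalletBalances,
    gas_limit_per_tx: u128,
    gas_price_wei: u128,
) -> &[u128] {
    let fee_per_tx = gas_limit_per_tx.saturating_mul(gas_price_wei);
    let mut service_spent: u128 = 0;
    let mut fees_spent: u128 = 0;
    let mut count = 0;
    for &amount in payables {
        let next_service = service_spent.saturating_add(amount);
        let next_fees = fees_spent.saturating_add(fee_per_tx);
        // Stop at the first payable that would overdraw either balance.
        if next_service > balances.service_fee_balance
            || next_fees > balances.transaction_fee_balance
        {
            break;
        }
        service_spent = next_service;
        fees_spent = next_fees;
        count += 1;
    }
    &payables[..count]
}
```

Payables cut off by this check remain unpaid and unattached, so the next payable scan considers them again. No error stamp is ever written for them.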
There are several problems mentioned below. Do enough research to be able to write cards to deal with each of them. If you don't have time to do enough research for a particular card, make a more-specific Spike card for that problem.
Problems:
Currently, if there is a problem confirming a pending payment on the blockchain, we update that payment's record in the PENDING_PAYABLE table by setting the PROCESS_ERROR column to the string 'ERROR'. The details of the error will be in the logs written at the time. However, finding the relevant logs could be annoying, because the only way to locate them is to search through all the log files for references to the transaction hash in question. It would be better for the PROCESS_ERROR column to be named something like ERROR_AT_TIMESTAMP, and for it to hold either NULL (if there was no error) or the time the error occurred. That would narrow the search down to a single logfile (or perhaps two).
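The "single logfile (or perhaps two)" argument can be made concrete with a small sketch. It assumes, purely for illustration, that log files rotate at fixed intervals counted from the Unix epoch and are identified by their interval index. `PendingPayableRow`, `logfiles_to_search` and the `slack` tolerance are hypothetical names, not existing Node code. A second file is included only when the error time falls within `slack` of a rotation boundary, because the log line may have landed on the other side of that boundary.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical stand-in for a PENDING_PAYABLE row using the proposed column.
pub struct PendingPayableRow {
    pub transaction_hash: String,
    /// None if there was no error; otherwise the time the error was recorded.
    pub error_at_timestamp: Option<SystemTime>,
}

/// Returns the indices of the log files to search, assuming one file per
/// `rotation` interval counted from the Unix epoch. Normally one index;
/// two if the error happened within `slack` of a rotation boundary.
pub fn logfiles_to_search(row: &PendingPayableRow, rotation: Duration, slack: Duration) -> Vec<u64> {
    let Some(at) = row.error_at_timestamp else {
        return vec![]; // No error recorded: nothing to look up.
    };
    let secs = at.duration_since(UNIX_EPOCH).map(|d| d.as_secs()).unwrap_or(0);
    let period = rotation.as_secs().max(1);
    let index = secs / period;
    let offset = secs % period;
    let mut files = Vec::new();
    // Just after a boundary: the entry may still be in the previous file.
    if offset < slack.as_secs() && index > 0 {
        files.push(index - 1);
    }
    files.push(index);
    // Just before a boundary: the entry may already be in the next file.
    if period - offset <= slack.as_secs() {
        files.push(index + 1);
    }
    files
}
```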
If we set the gas price too low, we might get a languishing transaction that may be paid someday, but not today. There should be a mechanism for resubmitting transactions, which means we'll have to remember nonces or retrieve them from the blockchain. Part of the solution: check whether there is a tool to retrieve the nonce from the transaction hash. It would be well worth storing the nonce when the payments are first formed for the payable transactions.
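On Ethereum-compatible chains, a stuck transaction is normally replaced by sending a new one from the same account with the same nonce and a higher gas price. The standard JSON-RPC method `eth_getTransactionByHash` returns the original transaction's `nonce`, so the nonce can be recovered from the hash if it was not stored. Nodes only accept a replacement if the price goes up by some minimum margin; for example, geth's transaction pool defaults to 10% (`--txpool.pricebump`). The sketch below assumes legacy `gasPrice` pricing and uses the hypothetical names `SentPayable` and `replacement`. EIP-1559 transactions would need both fee fields bumped.

```rust
/// Hypothetical record of a sent payable transaction, with the nonce stored
/// at the moment the payment was formed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SentPayable {
    pub nonce: u64,
    pub gas_price_wei: u128,
    pub amount_wei: u128,
}

/// Builds a replacement for a languishing transaction: same nonce and amount,
/// gas price raised by at least `bump_percent` percent (rounded up) and by at
/// least 1 wei, so the replacement is strictly more expensive.
pub fn replacement(original: &SentPayable, bump_percent: u128) -> SentPayable {
    let bumped = original
        .gas_price_wei
        .saturating_mul(100 + bump_percent)
        .saturating_add(99) // ceiling division by 100
        / 100;
    SentPayable {
        nonce: original.nonce,
        gas_price_wei: bumped.max(original.gas_price_wei.saturating_add(1)),
        amount_wei: original.amount_wei,
    }
}
```

Reusing the nonce means at most one of the original and the replacement can ever be mined, so resubmitting cannot produce a double payment.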
If a payment fails, whether we have 'ERROR' or a timestamp in the PROCESS_ERROR column, we're dumping the problem in the lap of the user and expecting him to use SQL and direct blockchain access to fix it. Isn't it the case that the vast majority of real-world errors are going to be simple ones that we can handle automatically? Which ones are those? Part of the solution: spend time with the testing group to summarize the most common real-world errors, such as insufficient funds and a gas price that is too low.
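Once the common errors are known, the first step toward handling them automatically is to sort error messages into a few kinds. The sketch below matches on substrings of error messages that geth returns: "insufficient funds for gas * price + value", "transaction underpriced", "replacement transaction underpriced" and "nonce too low". It is an assumption that these are the exact strings the Node receives from its blockchain service, since other clients may word them differently. `PaymentErrorKind` and `classify` are hypothetical names.

```rust
/// Hypothetical coarse classification of payment errors.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PaymentErrorKind {
    /// Wallet could not cover value plus fees; could wait for funds or let the PaymentAdjuster shrink the batch.
    InsufficientFunds,
    /// Gas price rejected as too low; could resubmit with the same nonce and a higher price.
    GasPriceTooLow,
    /// Nonce already used; could check whether an earlier transaction with that nonce was mined.
    NonceTooLow,
    /// Anything else: leave for the user.
    Unknown,
}

/// Classifies an error message by case-insensitive substring matching.
pub fn classify(message: &str) -> PaymentErrorKind {
    let m = message.to_lowercase();
    if m.contains("insufficient funds") {
        PaymentErrorKind::InsufficientFunds
    } else if m.contains("underpriced") {
        // Covers both "transaction underpriced" and "replacement transaction underpriced".
        PaymentErrorKind::GasPriceTooLow
    } else if m.contains("nonce too low") {
        PaymentErrorKind::NonceTooLow
    } else {
        PaymentErrorKind::Unknown
    }
}
```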
If a pending payment is marked in the PROCESS_ERROR column, it is never retried. This is because as long as the pending-payable record exists, the payable table refers to it, and that reference prevents the scan from including it. This is probably a good idea for some errors, but a bad idea for others. It implies that we need to keep more information about errors than just the 'ERROR' stamp or a timestamp, so that we can retry some payments and abandon others to the attention of the user. This could be folded into the other points; see how that works out in the design.
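Here is one possible shape for the "more information" mentioned above, assuming the record stores an error kind, when the error happened, and how many attempts have been made. From these, a scan could decide whether to detach the record (so the payable becomes eligible again), keep waiting, or hand the problem to the user. This is also one way to give the "well-defined conditions" under which a stamp disappears. `ProcessError`, `Disposition`, `disposition`, the cooldown and the attempt limit are all hypothetical.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical split between errors worth retrying and errors that are not.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    Retryable,
    Fatal,
}

/// Hypothetical richer replacement for the bare 'ERROR' stamp.
pub struct ProcessError {
    pub kind: ErrorKind,
    pub at: SystemTime,
    pub attempts: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Disposition {
    /// Detach the pending-payable record so the next payable scan picks the payable up again.
    Retry,
    /// Cooldown not yet elapsed; leave the record attached for now.
    Wait,
    /// Leave the record attached and flag it for the user.
    HandOverToUser,
}

pub fn disposition(err: &ProcessError, now: SystemTime, cooldown: Duration, max_attempts: u32) -> Disposition {
    match err.kind {
        ErrorKind::Fatal => Disposition::HandOverToUser,
        ErrorKind::Retryable if err.attempts >= max_attempts => Disposition::HandOverToUser,
        ErrorKind::Retryable => {
            // A clock that went backwards counts as "no time elapsed".
            let elapsed = now.duration_since(err.at).unwrap_or(Duration::ZERO);
            if elapsed >= cooldown {
                Disposition::Retry
            } else {
                Disposition::Wait
            }
        }
    }
}
```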
When a payable is first submitted to the blockchain, before it has been confirmed, it's attached to a record in the pending-payable table. When the payment is confirmed, the pending-payable record is removed and the payable record is adjusted with the last-paid timestamp. When a payable scan is done, payable records that are attached to pending-payable records (that is, payments in progress) are ignored. If a pending payment encounters an error and is marked in its PROCESS_ERROR column, it will probably never complete, which means its pending-payment record will remain attached to its payable record, which means the payable scan will not submit any further payment for that Node until the pending-payable attachment is cleared. I see four cases here. In order to distinguish between some of these cases, we'd need to see an error message, which seems to be difficult to come by. It should be possible somehow, though.
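The blocking behavior described above can be reduced to a simple filter. The sketch models a payable row with a hypothetical optional reference to its pending-payable record (`pending_payable_rowid`) and replaces the real payment thresholds with a single minimum balance. `PayableRow` and `qualified_for_payment` are illustrative names only.

```rust
/// Hypothetical, simplified payable row. `pending_payable_rowid` is Some while
/// the payable is attached to a pending-payable record.
#[derive(Debug, Clone)]
pub struct PayableRow {
    pub wallet: String,
    pub balance_wei: u128,
    pub pending_payable_rowid: Option<u64>,
}

/// Current behavior as described: any payable attached to a pending-payable
/// record is skipped, whatever the state of that record. A single threshold
/// stands in for the real payment-qualification rules.
pub fn qualified_for_payment(rows: &[PayableRow], threshold_wei: u128) -> Vec<&PayableRow> {
    rows.iter()
        .filter(|r| r.pending_payable_rowid.is_none())
        .filter(|r| r.balance_wei >= threshold_wei)
        .collect()
}
```

This filter never looks at PROCESS_ERROR. A payable whose pending record was stamped with an error therefore stays excluded from every scan until something clears the attachment.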
With the right design, the pending-payable scan should be able to address the above.
This card used to be solely about the PROCESS_ERROR column of PENDING_PAYABLE; if one of the cards that springs from this Spike is about fixing that, you might want to use this text in it: