filecoin-station / spark-evaluate

Evaluate service

The Grafana chart & alert for Balances (milliFIL) should take into account pending transactions #316

Open bajtos opened 3 months ago

bajtos commented 3 months ago

spark-evaluate is not able to commit new transactions:

2024-08-09T10:20:01Z app[e2867541be3e68] cdg [info]    message: 'failed to check balance: not enough funds including pending messages (required: 192.747690270205419759 FIL, balance: 192.688139350355372554 FIL): validation failure'

Yet, our dashboard monitoring the balances shows all is fine, and no alert was triggered.

[Screenshot 2024-08-09 at 12 24 24]

juliangruber commented 2 months ago

I don't understand this. Which transaction would require 192 FIL to complete?

bajtos commented 2 months ago

> message: 'failed to check balance: not enough funds including pending messages (required: 192.747690270205419759 FIL, balance: 192.688139350355372554 FIL)

> I don't understand this. Which transaction would require 192 FIL to complete?

My understanding of the problem:

Let's say there are 900 pending transactions and each transaction requires 0.214 FIL on average.
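
The arithmetic behind this scenario can be sketched as follows. This is a simplified model, not the node's actual implementation: it assumes every pending transaction reserves a fixed amount and that a new transaction is accepted only if the balance covers it together with everything already pending. The function name `check_balance` is hypothetical; the 900 and 0.214 FIL figures are the illustrative numbers from above, and the balance is the one reported in the log.

```python
from decimal import Decimal


def check_balance(balance, pending_costs, new_cost):
    """Return (ok, required): the new message is accepted only if the
    balance covers it together with everything already pending."""
    required = sum(pending_costs, Decimal(0)) + new_cost
    return required <= balance, required


# Illustrative numbers: 900 pending transactions at 0.214 FIL each,
# plus one more transaction of the same size.
pending = [Decimal("0.214")] * 900
balance = Decimal("192.688139350355372554")  # balance reported in the log

ok, required = check_balance(balance, pending, Decimal("0.214"))
print(f"reserved by pending: {sum(pending, Decimal(0))} FIL")
print(f"required: {required} FIL, balance: {balance} FIL, ok: {ok}")
```

Under these assumptions the pending transactions alone reserve about 192.6 FIL, so a wallet holding roughly 192.7 FIL cannot accept another transaction, even though no single transaction costs anywhere near 192 FIL.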

juliangruber commented 2 months ago

I understand the pending txs situation now 👍

juliangruber commented 2 months ago

Do you think this is urgent, especially given the recent cancellation of stuck txs? Otherwise, I propose doing this a bit later.

bajtos commented 2 months ago

The problem is that we don't have visibility into the actual amount of remaining funds in service wallets. The Grafana charts give us a false sense that all is good.

I'd say the priority depends on how easy/difficult this is to implement. If it's just adding another telegraf section to make an API call to get the amount reserved by the pending transactions, then I'd say let's do it soon.

I'll leave the decision to you; you know more about this part of our stack and how bad it is that we don't see the amount reserved for pending transactions.
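
A rough sketch of how the reserved amount could be derived, under several assumptions: the list of pending messages comes from a Lotus-style JSON-RPC call such as `Filecoin.MpoolPending`, each entry has a `Message` object with `From`, `GasLimit`, `GasFeeCap`, and `Value` fields, and each message reserves `GasLimit * GasFeeCap + Value` attoFIL. The helper names and wallet addresses are hypothetical, and neither the network call nor the telegraf wiring is shown.

```python
from collections import defaultdict


def reserved_by_sender(pending):
    """Sum the funds each sender has tied up in pending messages (attoFIL)."""
    reserved = defaultdict(int)
    for signed in pending:
        msg = signed["Message"]
        # Assumed upper bound on the message cost: gas budget plus value sent.
        cost = int(msg["GasLimit"]) * int(msg["GasFeeCap"]) + int(msg["Value"])
        reserved[msg["From"]] += cost
    return dict(reserved)


def available_milli_fil(balance_atto, reserved_atto):
    """Balance minus pending reservations, converted to milliFIL."""
    return (balance_atto - reserved_atto) / 10**15


# Hypothetical mempool snapshot shaped like the assumed RPC response.
pending = [
    {"Message": {"From": "f1exampleservicewallet", "GasLimit": 1_000_000,
                 "GasFeeCap": "200000000000", "Value": "0"}},
    {"Message": {"From": "f1exampleservicewallet", "GasLimit": 1_000_000,
                 "GasFeeCap": "200000000000", "Value": "0"}},
    {"Message": {"From": "f1exampleotherwallet", "GasLimit": 500_000,
                 "GasFeeCap": "200000000000", "Value": "1000000000000000000"}},
]

reserved = reserved_by_sender(pending)
print(available_milli_fil(2 * 10**18, reserved["f1exampleservicewallet"]))
```

Charting and alerting on the available amount, rather than the raw balance, would make the dashboard reflect what the wallet can still spend.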

juliangruber commented 2 months ago

Asked about this in https://filecoinproject.slack.com/archives/CRK2LKYHW/p1725016742675359

bajtos commented 1 month ago

We encountered this problem again this morning.

Sep 20 08:20:19 9b22 vector: spark-evaluate 
CANNOT SUBMIT SCORES FOR ROUND 15534n (CALL 1/10): 
Error: could not coalesce error (error={ 
  "code": 1, 
  "message": "failed to check balance: not enough funds including pending messages (required: 85.823851039775149633 FIL, balance: 85.810686571479888285 FIL): validation failure" 
}
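
To see how far short the wallet was, the two amounts can be pulled out of the error message and subtracted. This is a minimal sketch that assumes the message keeps the `required: … FIL, balance: … FIL` wording shown in the log; the function name `shortfall` is hypothetical.

```python
import re
from decimal import Decimal

# Matches the two amounts in the "not enough funds" error message.
PATTERN = re.compile(
    r"required: (?P<required>[\d.]+) FIL, balance: (?P<balance>[\d.]+) FIL"
)


def shortfall(message):
    """Return required minus balance in FIL, or None if the text does not match."""
    match = PATTERN.search(message)
    if match is None:
        return None
    return Decimal(match["required"]) - Decimal(match["balance"])


message = (
    "failed to check balance: not enough funds including pending messages "
    "(required: 85.823851039775149633 FIL, balance: 85.810686571479888285 FIL): "
    "validation failure"
)
print(shortfall(message))
```

For this log entry the gap is about 0.013 FIL, which suggests the wallet was only marginally underfunded once pending messages were counted.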