OdyseeTeam / chainquery

Chainquery parses and syncs the LBRY blockchain data into structured SQL
https://lbry.tech
MIT License

Mysql reboots recovery #73

Open nikooo777 opened 5 years ago

nikooo777 commented 5 years ago

While testing, I accidentally restarted MySQL and, understandably, chainquery crashed.

The stack trace is as follows:

ERRO[2018-12-21T00:23:17Z] MempoolSync:model: failed to execute a one query for block: bind failed to execute query: dial unix /var/run/mysqld/mysqld.sock: connect: no such file or directory 
[mysql] 2018/12/21 00:23:17 packets.go:36: read unix @->/var/run/mysqld/mysqld.sock: read: connection reset by peer
[mysql] 2018/12/21 00:23:17 packets.go:36: read unix @->/var/run/mysqld/mysqld.sock: read: connection reset by peer
[mysql] 2018/12/21 00:23:17 packets.go:36: read unix @->/var/run/mysqld/mysqld.sock: read: connection reset by peer
panic: model: failed to check if output exists: invalid connection

goroutine 19246177 [running]:
ERRO[2018-12-21T00:23:17Z] Datastore(GETADDRESS): model: failed to execute a one query for address: bind failed to execute query: invalid connection 
github.com/lbryio/chainquery/model.outputQuery.ExistsP(0xc003380c60, 0x2)
        /home/niko/go/src/github.com/lbryio/chainquery/model/output.go:203 +0x9e
github.com/lbryio/chainquery/datastore.PutOutput(0xc019fff560, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/niko/go/src/github.com/lbryio/chainquery/datastore/datastore.go:38 +0x2ac
github.com/lbryio/chainquery/daemon/processing.processVout(0xc01d91c060, 0xc0008776c0, 0xc01d70af60, 0x2a2d8, 0xc01f8d90b0, 0xffffffffffffffff)
        /home/niko/go/src/github.com/lbryio/chainquery/daemon/processing/outpoint.go:213 +0x337
github.com/lbryio/chainquery/daemon/processing.voutProcessor(0x1, 0xc0044aa000, 0xc0044aa060)
        /home/niko/go/src/github.com/lbryio/chainquery/daemon/processing/outpoint.go:64 +0x69
github.com/lbryio/chainquery/daemon/processing.initVoutWorkers.func1(0xc01f8d9560, 0xc0044aa000, 0xc0044aa060, 0x1)
        /home/niko/go/src/github.com/lbryio/chainquery/daemon/processing/outpoint.go:57 +0x67
created by github.com/lbryio/chainquery/daemon/processing.initVoutWorkers
        /home/niko/go/src/github.com/lbryio/chainquery/daemon/processing/outpoint.go:55 +0x83
panic: model: failed to check if address exists: invalid connection

goroutine 19246181 [running]:
github.com/lbryio/chainquery/model.addressQuery.ExistsP(0xc0044ad1e0, 0x1)
        /home/niko/go/src/github.com/lbryio/chainquery/model/address.go:172 +0x9e
github.com/lbryio/chainquery/datastore.GetAddress(0xc007321e90, 0x22, 0x0)
        /home/niko/go/src/github.com/lbryio/chainquery/datastore/datastore.go:109 +0x1cc
github.com/lbryio/chainquery/daemon/processing.processVout(0xc01d91c420, 0xc0008776c0, 0xc01d70af60, 0x2a2d8, 0x0, 0x0)
        /home/niko/go/src/github.com/lbryio/chainquery/daemon/processing/outpoint.go:191 +0x741
github.com/lbryio/chainquery/daemon/processing.voutProcessor(0x5, 0xc0044aa000, 0xc0044aa060)
        /home/niko/go/src/github.com/lbryio/chainquery/daemon/processing/outpoint.go:64 +0x69
github.com/lbryio/chainquery/daemon/processing.initVoutWorkers.func1(0xc01f8d9560, 0xc0044aa000, 0xc0044aa060, 0x5)
        /home/niko/go/src/github.com/lbryio/chainquery/daemon/processing/outpoint.go:57 +0x67
created by github.com/lbryio/chainquery/daemon/processing.initVoutWorkers
        /home/niko/go/src/github.com/lbryio/chainquery/daemon/processing/outpoint.go:55 +0x83

Would it be possible to handle this case better, e.g. by re-attempting the connection every 10 seconds or so?

This would ensure that even in the case of a database restart, chainquery maintains continuity.
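A minimal sketch of that retry idea, assuming chainquery holds a standard `*sql.DB` handle; `waitForDB`, the DSN, and the intervals are illustrative, not chainquery's actual API:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

// waitForDB is a hypothetical helper: instead of letting a lost socket
// bubble up as a panic, it pings the database every 10 seconds until the
// connection comes back or the retry budget is exhausted.
func waitForDB(db *sql.DB, maxWait time.Duration) error {
	deadline := time.Now().Add(maxWait)
	for {
		err := db.Ping()
		if err == nil {
			return nil // connection is healthy again
		}
		if time.Now().After(deadline) {
			return err // give up after maxWait
		}
		log.Printf("database unavailable, retrying in 10s: %v", err)
		time.Sleep(10 * time.Second)
	}
}

func main() {
	// Illustrative DSN; chainquery builds its own from the config file.
	db, err := sql.Open("mysql", "user:pass@unix(/var/run/mysqld/mysqld.sock)/chainquery")
	if err != nil {
		log.Fatal(err)
	}
	if err := waitForDB(db, 10*time.Minute); err != nil {
		log.Fatalf("database never came back: %v", err)
	}
	log.Println("database connection restored, resuming sync")
}
```

Note that `database/sql` maintains its own pool and will open fresh connections once MySQL is back; the crash above comes from the code panicking on the first `invalid connection` error instead of retrying.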

tiger5226 commented 5 years ago

This is a good issue. We need to solve it for internal-apis as well, since both use the same API server.

tiger5226 commented 5 years ago

This is a potential problem. If we lose the database connection, our dataset could be left in a corrupt state, so we need something similar to a transaction for chainquery. Since we have multiple goroutines concurrently processing the inputs, outputs, and transactions within a block (soon blocks too), any loss of the db connection puts the application in a bad state. The atomic unit is currently the block: we save things at different points in time across goroutines, and if any of the routines reports an error, the block is rolled back. This significantly improves our error handling. However, when the db connection is lost, the block cannot currently be rolled back, so we need a way to determine whether a block finished processing successfully, beyond its mere existence.
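Roughly the pattern described above, as a hedged sketch; `Block`, `processVins`, `processVouts`, and `rollbackBlock` are placeholders for illustration, not chainquery's real types or functions:

```go
package main

import (
	"errors"
	"fmt"
)

// Placeholder type and stubs so the sketch compiles on its own.
type Block struct{ Height uint64 }

func processVins(b *Block) error   { return nil }
func processVouts(b *Block) error  { return errors.New("invalid connection") }
func rollbackBlock(b *Block) error { return nil }

// processBlock illustrates the block-level "transaction": concurrent workers
// report into an error channel, and a single failure rolls the block back.
func processBlock(b *Block) error {
	errs := make(chan error, 2)
	go func() { errs <- processVins(b) }()
	go func() { errs <- processVouts(b) }()

	var firstErr error
	for i := 0; i < 2; i++ {
		if err := <-errs; err != nil && firstErr == nil {
			firstErr = err
		}
	}
	if firstErr == nil {
		return nil
	}
	// This rollback is exactly the step that cannot run when the db
	// connection itself is gone, hence the need for an explicit
	// "finished processing" marker on the block row.
	if rbErr := rollbackBlock(b); rbErr != nil {
		return fmt.Errorf("rollback of block %d failed after %v: %w", b.Height, firstErr, rbErr)
	}
	return firstErr
}

func main() {
	if err := processBlock(&Block{Height: 172760}); err != nil {
		fmt.Println("block rolled back:", err)
	}
}
```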

An improvement that would make this possible is a column on block called is_processed or something similar. When all goroutines finish successfully, the last db statement for the block flips this column to true. That way, if we lose the db connection, then on successful reconnect the next block to process is simply the highest height where is_processed is true, plus 1.
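A minimal sketch of that resume logic, assuming a `block` table with `height` and the proposed `is_processed` column (names follow the suggestion above, not the current schema):

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Illustrative DSN; chainquery builds its own from the config file.
	db, err := sql.Open("mysql", "user:pass@unix(/var/run/mysqld/mysqld.sock)/chainquery")
	if err != nil {
		log.Fatal(err)
	}

	// Last statement for a block: mark it fully processed only once every
	// goroutine has finished successfully.
	if _, err := db.Exec(
		"UPDATE block SET is_processed = TRUE WHERE height = ?", 172760,
	); err != nil {
		log.Fatal(err)
	}

	// After a reconnect: resume from the highest fully processed block + 1.
	var next uint64
	if err := db.QueryRow(
		"SELECT COALESCE(MAX(height), 0) + 1 FROM block WHERE is_processed = TRUE",
	).Scan(&next); err != nil {
		log.Fatal(err)
	}
	log.Printf("resuming processing at height %d", next)
}
```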

tiger5226 commented 5 years ago

This will be required for parallel block processing. I will add it to the 2.0 list.