gitbitex / gitbitex-new

an open source cryptocurrency exchange

High Availability and Data Reload on Program Restart #38

Open hayletdomybest opened 1 month ago

hayletdomybest commented 1 month ago

Hi,

I have two questions regarding the system:

  1. Data Reload on Program Restart: When the program is closed and then reopened, where does it reload the orderbook, account, and other relevant data from the last session? Is there a specific mechanism in place to persist and restore this data on startup?

  2. High Availability Support: Does the system support high availability? If so, could you provide details on how this is implemented and what strategies are recommended to ensure redundancy and minimal downtime?

Thanks in advance for your help!

greensheng commented 1 month ago

1. EngineSnapshotThread periodically creates snapshots of the matching engine and saves them in MongoDB. When the matching engine starts, it reads the snapshot from MongoDB and restores its state from it (see the sketch after this reply).

2. The matching engine supports deploying multiple instances simultaneously, but only one instance is active at a time while the others wait on standby. Once the active instance exits, another instance immediately takes over.
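
For readers unfamiliar with the codebase, here is a minimal sketch of the snapshot/restore idea described above. It is not the project's actual EngineSnapshotThread: the class, field, and collection names (`SnapshotSketch`, `EngineState`, `engine_snapshots`, the 10-second interval) are illustrative, and the real engine state is richer than a single serialized string.

```java
import com.google.gson.Gson;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.ReplaceOptions;
import org.bson.Document;

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified snapshot loop: only illustrates the save/restore idea.
public class SnapshotSketch {

    // Placeholder for the engine's in-memory state (order books, accounts, last offset).
    static class EngineState {
        long lastCommandOffset;          // where to resume consuming commands
        String serializedOrderBooks;     // stand-in for the real order book structures
    }

    private final MongoCollection<Document> snapshots;
    private final Gson gson = new Gson();

    SnapshotSketch(String mongoUri) {
        this.snapshots = MongoClients.create(mongoUri)
                .getDatabase("exchange")
                .getCollection("engine_snapshots");
    }

    // Periodically persist the current engine state, overwriting the previous snapshot.
    void startSnapshotLoop(EngineState state) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            Document doc = new Document("_id", "latest")
                    .append("state", gson.toJson(state));
            snapshots.replaceOne(Filters.eq("_id", "latest"), doc,
                    new ReplaceOptions().upsert(true));
        }, 10, 10, TimeUnit.SECONDS);
    }

    // On startup, restore the last snapshot (if any) and resume from its offset.
    EngineState restore() {
        Document doc = snapshots.find(Filters.eq("_id", "latest")).first();
        if (doc == null) {
            return new EngineState();    // cold start: empty state, consume from the beginning
        }
        return gson.fromJson(doc.getString("state"), EngineState.class);
    }
}
```

As for the active/standby behaviour: since the thread mentions Kafka, one way to get "only one instance consumes, the rest wait" is a consumer group over a single-partition command topic, where the broker assigns the partition to exactly one consumer and rebalances it to another instance when the active one exits. The snippet below sketches that pattern under those assumptions; it is not necessarily the project's exact wiring, and the topic and group names are made up.

```java
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

// Hypothetical illustration: with a single-partition command topic and one consumer
// group, only the instance holding the partition receives records (active); the
// others poll but stay idle until a rebalance hands the partition over.
public class StandbySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "matching-engine");            // all instances share this group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("matching-engine-command"));  // single-partition topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                // Standby instances reach this point too, but receive no records
                // until the active instance exits and the partition is reassigned.
                records.forEach(r -> System.out.println("command: " + r.value()));
            }
        }
    }
}
```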

hayletdomybest commented 1 month ago

I have reviewed the materials and have the following questions:

  1. I noticed that each time the matching consumer is executed, it starts by sending a CommandStartMessage with an incremented Sequence number. During the process, entities that appear to be updated also use this Sequence number as a base for incremental updates. Finally, a CommandEndMessage is sent. What is the main purpose of this Sequence, and why is it implemented this way?

  2. When multiple matching engines are started, only one leader node consumes messages due to Kafka's characteristics. Each node initializes by dumping data from MongoDB into memory. I'm wondering: since other slave nodes don't participate in the consumption process, will their data become out of sync? If the leader node fails, will the slave nodes have missing data? Or is there a mechanism in place to ensure synchronization across all nodes?

greensheng commented 1 month ago

  1. CommandStartMessage and CommandEndMessage respectively mark the start and end of a transaction, ensuring data consistency when the messages between them are processed downstream.
  2. As long as a slave node obtains a complete snapshot, it can start working normally. Replay is supported from a snapshot at any position; this may produce some duplicate data, which is deduplicated downstream based on the sequence of each message (a sketch of this deduplication follows below).
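
To make the two points above concrete, here is a hypothetical downstream consumer that (1) buffers the messages framed by CommandStartMessage / CommandEndMessage and applies them only when the whole frame has arrived, and (2) skips anything whose sequence was already applied, which is how a replay from an older snapshot gets deduplicated. The class and record names are illustrative, not gitbitex's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical downstream consumer: transaction framing plus sequence-based dedup.
public class DownstreamSketch {

    interface EngineMessage { long sequence(); }
    record CommandStartMessage(long sequence) implements EngineMessage {}
    record CommandEndMessage(long sequence) implements EngineMessage {}
    record OrderUpdateMessage(long sequence, String payload) implements EngineMessage {}

    private long lastAppliedSequence;            // persisted alongside downstream state
    private final List<OrderUpdateMessage> buffer = new ArrayList<>();
    private boolean inTransaction;

    void onMessage(EngineMessage msg) {
        if (msg.sequence() <= lastAppliedSequence) {
            return;                              // duplicate from replay: drop it
        }
        if (msg instanceof CommandStartMessage) {
            buffer.clear();
            inTransaction = true;
        } else if (msg instanceof OrderUpdateMessage update && inTransaction) {
            buffer.add(update);                  // hold updates until the frame is complete
        } else if (msg instanceof CommandEndMessage) {
            buffer.forEach(this::apply);         // apply the whole transaction at once
            lastAppliedSequence = msg.sequence();
            inTransaction = false;
        }
    }

    private void apply(OrderUpdateMessage update) {
        // Persist the update to the downstream store (details omitted in this sketch).
        System.out.println("apply seq=" + update.sequence() + " " + update.payload());
    }
}
```

Because `lastAppliedSequence` is only advanced when a CommandEndMessage is applied, a replayed frame that was already processed arrives with sequences at or below it and is dropped in full, while a frame that was interrupted mid-way is simply re-buffered and applied once its end message arrives.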