Derecho-Project / derecho

The main code repository for the Derecho project.
BSD 3-Clause "New" or "Revised" License

Bug fixes and new features for restart operations #156

Closed etremel closed 4 years ago

etremel commented 4 years ago

As I mentioned in our meetings, I've been working on some improvements to the code that runs during startup and restart in Group and ViewManager. Derecho can now be configured with a "restart leader" distinct from its normal leader, and if the enable_backup_restart_leaders option is True, it can also use a list of multiple restart leaders in priority order. If backup restart leaders are enabled and the restart leader fails before the restart view is committed, non-leader nodes will automatically attempt to resume the restart process with the next leader in the list. This introduces some potential race conditions, however, so the feature is disabled by default. With it disabled, non-leaders simply exit if they detect that the restart leader has failed.
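
To make the failover rule above concrete, here is a minimal sketch of the decision a non-leader might make when it detects that the restart leader has failed. It is not the code from this PR: `RestartConfig`, `restart_leaders`, and `next_restart_leader` are hypothetical names invented for illustration, and only `enable_backup_restart_leaders` comes from the description.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the restart-related configuration. Only the
// option name enable_backup_restart_leaders is taken from the description;
// the struct and the restart_leaders field are assumptions.
struct RestartConfig {
    std::vector<std::string> restart_leaders;    // priority order, highest first
    bool enable_backup_restart_leaders = false;  // disabled by default
};

// Hypothetical helper: called by a non-leader when the restart leader at
// position failed_index fails before the restart view is committed.
// Returns the next leader to contact, or std::nullopt if the node should exit.
std::optional<std::string> next_restart_leader(const RestartConfig& config,
                                               std::size_t failed_index) {
    if(!config.enable_backup_restart_leaders) {
        return std::nullopt;  // default behavior: do not fail over, just exit
    }
    const std::size_t next_index = failed_index + 1;
    if(next_index >= config.restart_leaders.size()) {
        return std::nullopt;  // every backup leader has already been tried
    }
    return config.restart_leaders[next_index];
}
```

The sketch deliberately ignores the race conditions mentioned above. It only shows the priority-order selection and the exit-by-default behavior.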

In addition to these new features, I made several code improvements to the existing functionality of ViewManager: I merged its two constructors into one and moved most of the constructor code into a separate "init" method that Group can call once it has finished constructing all the other "manager" objects.
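
The constructor-plus-"init" arrangement described above is a form of two-phase initialization. The sketch below shows the general shape of that pattern, not the real classes: `ViewManagerSketch`, `GroupSketch`, `OtherManager`, and every member name are hypothetical stand-ins.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the other "manager" objects that Group builds.
struct OtherManager {
    bool ready = true;
};

class ViewManagerSketch {
public:
    // Phase 1: the single constructor does only work that needs no other manager.
    ViewManagerSketch() = default;

    // Phase 2: called by the owner once the other managers exist.
    void init(const OtherManager& other) {
        if(!other.ready) {
            throw std::logic_error("init() called before other managers were ready");
        }
        other_manager = &other;
        initialized = true;
    }

    bool is_initialized() const { return initialized; }

private:
    const OtherManager* other_manager = nullptr;  // set in init(), not owned
    bool initialized = false;
};

class GroupSketch {
public:
    GroupSketch() : view_manager(), other_manager() {
        // All members are constructed by this point, so init() is safe to call.
        view_manager.init(other_manager);
    }

    // Copying would leave view_manager pointing at another object's member.
    GroupSketch(const GroupSketch&) = delete;
    GroupSketch& operator=(const GroupSketch&) = delete;

    const ViewManagerSketch& get_view_manager() const { return view_manager; }

private:
    ViewManagerSketch view_manager;
    OtherManager other_manager;
};
```

One consequence of this design is that the object exists in a not-yet-initialized state between the two phases, so the owner is responsible for calling `init` before any other use.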

Can someone verify that these changes don't adversely impact our performance numbers before I merge them into master?