CandyShop / gerrit

Automatically exported from code.google.com/p/gerrit
Apache License 2.0

All hook processing stops - but Gerrit keeps working fine otherwise - implement better handling of dead hook subsystem #2020

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
We recently uncovered a problem whereby the backend hook processing subsystem 
of Gerrit appears to simply stop (die, end, whatever you want to call it).

Apparently the Gerrit architecture (we are using 2.4.4 as of this writing) is 
designed so that all other Gerrit behavior continues even after backend hook 
processing has stopped.

This seems to me like a risky implementation.
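The report does not say why hook processing stopped. One way this failure mode can arise is a hook that never returns, assuming hooks are dispatched to a single background queue after the triggering operation has already been accepted. In that case every later hook piles up behind the stuck one while the rest of the server keeps answering requests. The following is a simplified Java model of that assumption. `StalledHookQueueDemo`, `acceptChange` and `hookQueue` are hypothetical names, not Gerrit code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class StalledHookQueueDemo {
    // Hypothetical stand-in for a background queue that runs hooks one at a time.
    static final ExecutorService hookQueue = Executors.newSingleThreadExecutor();
    static final AtomicInteger hooksRun = new AtomicInteger();

    // Models accepting a change: the hook is only queued, so the caller
    // reports success whether or not the hook ever runs.
    static boolean acceptChange(Runnable hook) {
        hookQueue.submit(hook);
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch never = new CountDownLatch(1);
        // The first hook blocks until interrupted (e.g. waiting on an unresponsive CM system).
        acceptChange(() -> {
            try {
                never.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Later changes are still "accepted", but their hooks wait behind the stuck one.
        for (int i = 0; i < 3; i++) {
            acceptChange(hooksRun::incrementAndGet);
        }
        Thread.sleep(200);
        System.out.println("hooks run: " + hooksRun.get());
        // Interrupt the stuck hook and discard the queued ones so the JVM can exit.
        hookQueue.shutdownNow();
    }
}
```

Running `main` prints `hooks run: 0`: three changes were accepted, yet none of their hooks ran. This matches the symptom described in this report, where Gerrit keeps working while policy enforced in hooks is silently skipped.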

Backend hook processing - particularly in the patchset-created and 
comment-added and ref-updated hooks - is used by us (and I would guess by many) 
to implement interfaces to other systems that help control Gerrit workflow.  
The most notable example is integration with a change management system that 
keeps accounting straight between what is in Git and Gerrit and what is in the 
CM system.

When hook processing fails like this, it has led to data inconsistencies 
between Gerrit/Git and our other systems.  This condition effectively allows 
users to bypass company policy (which is implemented in hooks) and execute 
changes they should not be allowed to execute.

The suggested feature is threefold:

1) If possible, recover from a dead hook subsystem.
2) If that is not possible, provide a notification mechanism that alerts users in 
the "Administrators" group whenever a critical error like this occurs.
3) Provide an option that administrators can set to block commits or 
reference updates PRE-OPERATION when the server is in a critical state and 
cannot process hooks.  (Pre-operation is necessary to ensure remotes stay in 
sync with the server's copy of changes.)  Alternatively, don't make this an 
option; make it the default behavior.
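Here is a minimal sketch of how the three proposals could fit together, in Java since Gerrit is a Java application. Everything in it is hypothetical: `HookHealthGuard`, `runHook`, `checkAndRecover`, `allowRefUpdate` and the `notifyAdmins` callback are not Gerrit APIs. The sketch makes the following assumptions:

* Hooks run on a single-threaded queue.
* A no-op probe that does not finish within a timeout means the queue is dead.
* Recovery means replacing the queue and carrying over hooks that were still waiting.
* The receive path could call a pre-operation check before accepting a ref update.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

/** Hypothetical guard: (1) recover a stalled hook queue, (2) notify admins, (3) block updates. */
public class HookHealthGuard {
    private final long probeTimeoutMillis;
    private final boolean blockWhenUnhealthy;
    private final Consumer<String> notifyAdmins; // stand-in for mailing the "Administrators" group
    private ExecutorService hookQueue = Executors.newSingleThreadExecutor();
    private volatile boolean healthy = true;

    public HookHealthGuard(long probeTimeoutMillis, boolean blockWhenUnhealthy,
                           Consumer<String> notifyAdmins) {
        this.probeTimeoutMillis = probeTimeoutMillis;
        this.blockWhenUnhealthy = blockWhenUnhealthy;
        this.notifyAdmins = notifyAdmins;
    }

    public synchronized void runHook(Runnable hook) {
        hookQueue.execute(hook);
    }

    /** True if a no-op task completes on the hook queue within the timeout. */
    private boolean probe() {
        Future<?> f = hookQueue.submit(() -> { });
        try {
            f.get(probeTimeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            f.cancel(false); // a cancelled probe is a no-op if it is ever run later
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Meant to be called periodically, e.g. from a scheduled task. */
    public synchronized void checkAndRecover() {
        if (probe()) {
            healthy = true;
            return;
        }
        // (1) Recover: interrupt the stuck worker and move still-queued hooks to a fresh queue.
        List<Runnable> pending = hookQueue.shutdownNow();
        hookQueue = Executors.newSingleThreadExecutor();
        pending.forEach(hookQueue::execute);
        healthy = probe();
        // (2) Notify in both cases: the hook that was running may not have completed.
        notifyAdmins.accept(healthy
                ? "Hook queue was stalled and has been restarted; a running hook may have been interrupted."
                : "Hook queue is dead and could not be restarted.");
    }

    /** (3) Pre-operation check, intended to run before a ref update is accepted. */
    public boolean allowRefUpdate() {
        return healthy || !blockWhenUnhealthy;
    }

    public synchronized void shutdown() {
        hookQueue.shutdownNow();
    }

    public static void main(String[] args) {
        List<String> mail = new ArrayList<>();
        HookHealthGuard guard = new HookHealthGuard(500, true, mail::add);
        CountDownLatch never = new CountDownLatch(1);
        guard.runHook(() -> { // a hook that hangs until interrupted
            try {
                never.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        guard.checkAndRecover();
        System.out.println(guard.allowRefUpdate());
        System.out.println(mail.size());
        guard.shutdown();
    }
}
```

Running `main` prints `true` and then `1`: the stuck hook is interrupted, the queue is replaced, and one administrator notification is recorded.

A timeout probe is a crude health signal. A long but healthy backlog can also trip it, and a hook that ignores interruption keeps its thread. A real implementation would need more care on both points.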

With these changes in place, administrators can be notified when a critical 
fault occurs, the system will keep itself from being corrupted by data that is 
out of sync with remotes, and the environment in which Gerrit is deployed and 
integrated via hooks can be kept clean and uncorrupted when faults of this 
nature occur.

Original issue reported on code.google.com by casta...@motorola.com on 18 Jul 2013 at 8:23

GoogleCodeExporter commented 9 years ago
Yes, we have also run into this problem, and it has caused trouble for our work.
Gerrit: 2.8.4

Is there any solution? 

Original comment by nethors...@gmail.com on 12 Jul 2014 at 3:13