Open facuspagnuolo opened 5 years ago
we will need TO TRUST in something (...) I suggest to have an MVP version that trusts the root of authority (voting) as well, besides the Kernel's and the ACL
I completely agree. The chain of trust cannot continue indefinitely. Building "a final decentralized decision-making process with a minimal version of the voting app" doesn't seem like an easy task and we will never have a 100% guarantee that it's bug-free, so I would say let's go with this MVP and at least we will have the rest protected.
A couple more comments:
I think that for an initial rollout of the Kill Switch we can assume that we can trust both the Kernel and the ACL (they wouldn't be protected by the Kill Switch).
For recoverability in case of a genuine bug, I like the idea of designating another address that would be able to perform an upgrade for a critical app if it has been frozen. Organizations would be able to decide whether they want to trust an address to perform this upgrade in case of emergency (which they could change in the future to a minimal voting app if it is implemented) or just whitelist the root authority application so it can never be frozen. I agree that implementing a minimal voting app would be quite an undertaking and not a high priority.
This emergency address that can be used to perform upgrades should also be able to change the default issues registry or the registry for a particular appId, as we should be able to recover if whoever controls the registry adds issues to freeze DAOs.
I am still a bit conflicted about how much complexity this emergency address would add and whether we should just trust the root authority for an initial version.
I am still a bit conflicted about how much complexity this emergency address would add and whether we should just trust the root authority for an initial version.
I like the idea of designating an emergency address as well, and have indeed started writing some code for it. But I also think we can start testing the idea of a kill-switch without it. Currently there is no way to freeze anything, neither the voting app nor any other app. Therefore, between nothing and at least being able to freeze a subset of all the apps, I prefer the subset.
Summing up, I suggest moving forward with an MVP assuming that the Kernel, ACL, KillSwitch and Voting apps will be whitelisted.
- Switching off the IssuesRegistry
With the current implementation this actually wouldn't be possible (as none of its methods are killSwitchProtected), but I agree we should be set up to handle this case (or if the user sets a wrong contract as the issue registry).
In this case I think the natural thing to do would be to "propagate" the killswitch up the stack, such that any KillSwitches relying on the failing IssuesRegistry would be turned on for all non-whitelisted apps.
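To make the "propagation" idea concrete, here is a minimal Python model of it. This is only a sketch, not Solidity and not the aragonOS API: the `healthy` flag, the `should_deny_call` method and the app names are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IssuesRegistry:
    """Hypothetical stand-in for an issues registry (not the real contract)."""
    healthy: bool = True  # assumption: some way to know the registry itself is failing
    flagged: set = field(default_factory=set)  # base implementations with a reported issue

    def has_issue(self, implementation: str) -> bool:
        return implementation in self.flagged


@dataclass
class KillSwitch:
    """Hypothetical kill switch that relies on a single registry."""
    registry: IssuesRegistry
    whitelist: set = field(default_factory=set)  # implementations that can never be frozen

    def should_deny_call(self, implementation: str) -> bool:
        # Whitelisted apps are trusted and never frozen.
        if implementation in self.whitelist:
            return False
        # Failing registry: propagate the switch to every non-whitelisted app.
        if not self.registry.healthy:
            return True
        # Normal case: freeze only implementations with a reported issue.
        return self.registry.has_issue(implementation)
```

With a whitelist of `{"kernel", "acl", "voting"}`, marking the registry as unhealthy freezes an app such as `"finance"` while `"voting"` keeps working.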
- Switching off the Voting app
The problem with the "root authority" application is that an organization may be set up to only allow other apps to interact with it (by forwarding). In 0.7 organizations, this is now the default (you can only create a new vote by passing through the Token Manager, which checks that you hold organization tokens).
This presents a few problems:
EVMScriptRunner.runScript() that the app has not been killswitched, and
The second part is quite a trap; if only the Voting app can change permissions, then in the event the Token Manager is killswitched, the organization becomes bricked (nobody can create votes to update apps, and nobody can create votes to update the permissions to let other addresses create votes).
Unfortunately I don't see a generic way to solve this, as the forwarding chain could be infinitely long. We could choose not to implement 1, but this feels absolutely wrong; killswitched apps should not be allowed to run EVM scripts.
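The bricking scenario can be illustrated with a small Python sketch that treats forwarding as a graph walk. It is a toy model, not aragonOS code: `can_reach`, the `forwards_to` mapping and the app names are hypothetical.

```python
def can_reach(start, target, forwards_to, frozen):
    """Return True if an action submitted at `start` can be forwarded to
    `target` without passing through a frozen (killswitched) app."""
    seen = set()
    stack = [start]
    while stack:
        app = stack.pop()
        # A frozen app cannot run EVM scripts, so the chain stops there.
        if app in seen or app in frozen:
            continue
        if app == target:
            return True
        seen.add(app)
        stack.extend(forwards_to.get(app, []))
    return False


# Default 0.7-style org: votes can only be created through the Token Manager.
forwards_to = {"token_manager": ["voting"]}

print(can_reach("token_manager", "voting", forwards_to, frozen=set()))
print(can_reach("token_manager", "voting", forwards_to, frozen={"token_manager"}))
```

The first call finds a path to the Voting app; the second does not, even though the Voting app itself is not frozen, which is the liveness problem described above.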
The only way out that I see is to have an "emergency address" scheme. On specific conditions that trigger the emergency mode (e.g. a designated app being killswitched, a network-wide switch being flipped, or directly invoked by an entity having the permission):
Kernel.setApp() (or KillSwitch.setAllowedInstance()), or
createVote()) and any synchronous forwarding capabilities (e.g. Token Manager)
With the current implementation this actually wouldn't be possible (as none of its methods are killSwitchProtected), but I agree we should be set up to handle this case (or if the user sets a wrong contract as the issue registry).
Now that we moved it within the canPerform function, it does :)
The only way out that I see is to have an "emergency address" scheme. On specific conditions that trigger the emergency mode (e.g. a designated app being killswitched, a network-wide switch being flipped, or directly invoked by an entity having the permission): [...]
I think we all agree on that: we all like the idea of having an emergency address. However, we still need to discuss whether it must be decentralized or not; IMO it must. But as I said above, "if we still want to have a decentralized process to make that decision, it will be a smart contract which we will have to trust", which means it cannot be kill-switched. Therefore my proposal of moving forward with an MVP implementation assuming that the Kernel, ACL, KillSwitch and the "root authority" apps will be whitelisted.
Therefore my proposal of moving forward with an MVP implementation assuming that the Kernel, ACL, KillSwitch and the "root authority" apps will be whitelisted.
The problem with just whitelisting the "root authority" apps is that it's hard to detect on-chain what should be considered part of the "root authority" due to forwarders. In a default 0.7 org, the Token Manager would have to be considered part of the "root authority" since it is the only one that can create votes. If we only whitelist the Voting app, then we have a liveness problem: nobody can kick-start the voting process, so nothing can change.
But this liveness problem can be solved if we are able to grant the emergency address (which could be ANY_ENTITY) a role (e.g. CREATE_VOTE_ROLE) only in emergency situations. The emergency address would only be allowed to do some things unilaterally, like create a vote to upgrade an app.
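A rough Python model of such an emergency-only grant could look like the following. It is a sketch under assumptions: `EmergencyAcl`, its fields and the `emergency` flag are invented for illustration and do not correspond to the aragonOS ACL interface.

```python
ANY_ENTITY = "ANY_ENTITY"  # sentinel meaning "any caller" in this toy model


class EmergencyAcl:
    """Hypothetical permission check with roles that unlock only in emergencies."""

    def __init__(self, emergency_address, emergency_roles):
        self.permissions = set()  # regular (entity, role) grants
        self.emergency = False    # assumption: flipped when emergency mode is triggered
        self.emergency_address = emergency_address
        self.emergency_roles = set(emergency_roles)

    def grant(self, entity, role):
        self.permissions.add((entity, role))

    def has_permission(self, entity, role):
        # Regular grants always apply.
        if (entity, role) in self.permissions:
            return True
        # Emergency roles apply only while emergency mode is on.
        if self.emergency and role in self.emergency_roles:
            return self.emergency_address in (ANY_ENTITY, entity)
        return False


acl = EmergencyAcl(ANY_ENTITY, ["CREATE_VOTE_ROLE"])
acl.grant("token_manager", "CREATE_VOTE_ROLE")
print(acl.has_permission("alice", "CREATE_VOTE_ROLE"))  # no emergency yet
acl.emergency = True
print(acl.has_permission("alice", "CREATE_VOTE_ROLE"))  # emergency mode on
```

Restricting what the emergency caller may put in a vote (for example, only upgrades of compromised apps) would still need a separate check that this sketch does not model.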
A potential model:
ANY_ADDRESS. We should discourage setting it to one of the org's apps, as it would create the same liveness problem.
IssuesRegistry that affected a forwarding app used as part of an org's "root authority" (we could do an off-chain check for this), emit emergency signal
An alternative that would be less "global" (it may be a problem if anybody can create votes in any organization until that organization acknowledges the emergency), albeit at the cost of more user interaction, is to set up some rules per organization to detect the emergency mode when an app is killswitched (e.g. in 0.7 orgs, this would just include the Token Manager).
I see, and I think it's actually a particular case of what I described initially in the issue's description. We will still need to whitelist the Voting app, but also grant CREATE_VOTE_ROLE permissions to the emergency address, while allowing it only to create votes to upgrade compromised apps.
Note that the part of granting CREATE_VOTE_ROLE permissions is out of the scope of aragonOS, which makes me wonder whether we should just provide an "emergency upgrade process" and encourage users to grant the create votes role to it...
When we talk about kill-switch recoverability we mean allowing DAOs to recover from really bad situations, and probably one of the best examples to think of is switching off the voting base contract while having a voting app instance as the root authority of a DAO.
That said, we've been exploring the following edge cases to work towards a robust and decentralized solution to this problem:
1. Switching off the IssuesRegistry
This should be allowed and we actually can handle this scenario from the kill switch itself. We will need to decide what to do if a DAO is trusting an issues registry that has been kill-switched off.
2. Switching off the Voting app
Even though we could provide a backdoor method to allow creating a vote to perform an upgrade, if the voting contract is broken, we cannot guarantee that that particular vote will be flawless. If the voting app is broken, we will need to bypass it completely. Thus, we need to allow DAOs to perform an upgrade without having to go through the voting app. That said, if we still want to have a decentralized process to make that decision, it will be a smart contract which we will have to trust. Then, it should be the thinnest, simplest, and most minimal implementation we can provide (we can think of it as the analogue of the proxy in the upgradeability solution).
3. Switching off the ACL
As we mentioned before for the voting app, if the whole ACL contract is broken, we cannot even trust the permissions query functions to check if a caller is allowed to perform an action like an upgrade. Then, we will need to either trust the ACL to use its authorization flow upon a given upgrade call, or bypass it altogether, building again a new custom and simple logic to do it.
4. Switching off the Kernel
If nothing could be trusted, there's nothing we can do. We need at least to trust the upgrade process, otherwise there's no way out to solve a critical bug on a contract. But since the Voting app or the ACL could be broken, we should have a separate upgrade method that could be called only in an emergency(*). Note that all the methods used for this call must be part of the subset of methods of the Kernel/ACL we will always trust (e.g. we will always need to trust the Kernel#getApp function, since it is the underlying function for every proxy call).
5. Switching off the KillSwitch
The emergency upgrade alternative should check that the current version of the application has an issue while the proposed new one does not. Then, we will need to trust the kill-switch implementation as well. Therefore, we will need to prevent users from kill-switching the kill-switch itself.
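The check described here can be sketched in a few lines of Python. This is an illustrative model only: `emergency_upgrade`, the `apps` mapping and the `has_issue` callback are hypothetical names, not functions of the Kernel or KillSwitch.

```python
def emergency_upgrade(apps, app_id, new_implementation, has_issue):
    """Swap the base implementation of `app_id`, but only when the current
    one has a reported issue and the proposed one does not.

    `apps` maps app ids to their current base implementation, and
    `has_issue` is a callable standing in for the trusted kill-switch query.
    """
    current = apps[app_id]
    if not has_issue(current):
        # Not an emergency: the regular upgrade path must be used instead.
        raise PermissionError("current implementation has no reported issue")
    if has_issue(new_implementation):
        raise ValueError("proposed implementation has a reported issue")
    apps[app_id] = new_implementation
```

Because both conditions depend on `has_issue`, the result is only as trustworthy as the component answering that query, which is why the kill-switch itself has to be trusted.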
⚠️To discuss
Based on the different cases mentioned above we will need to discuss and decide which way we think is best to move forward. I suggest starting the discussion with the following ideas:
When we say "TO TRUST" an implementation, we mean assuming that it won't have bugs at all. In other words, that we cannot kill-switch it.
KillSwitch implementation, otherwise we won't be able to tell the status of an app.
Kernel, we will also need TO TRUST all the Kernel methods being involved in such process. For example, Kernel#getApp.
ACL is involved in many functions of the Kernel, we could implement a minimal authorization check for the final decentralized decision-making process, or TO TRUST the ACL implementation as well.
BTW, I've already been doing some proof of concepts about all these ideas.
💡Proposal
In order to start testing the whole kill switch idea, I suggest to have an MVP version that trusts the root of authority (voting) as well, besides the Kernel's and the ACL's methods that are required to perform upgrades. The idea is to delay the implementation of a new minimal decentralized process that acts as the final decision-making one, simply because it is a whole new problem in itself that will probably require a separate development lane.
(*) We refer to an emergency situation only when the base implementation of an app: