codex-storage / codex-contracts-eth

Ethereum smart contracts for Codex
Other
6 stars 9 forks source link

Handling emergency scenarios of smart contracts #111

Open emizzle opened 4 months ago

emizzle commented 4 months ago

In case of an emergency, eg an exploit, admins should be able to freeze a contract, allowing only withdrawing of funds from the contract.

Freezing a contract will be the only functionality of an admin.

Once a contract is frozen, only withdrawal functionality would be enabled. This includes:

We should keep in mind that in normal operation, withdrawal txs (eg freeSlot, withdrawFunds) are only allowed when a request is cancelled or finished. Therefore, freezing a contract should disable these checks.

These additions should be done pre-audit.

AuHau commented 4 months ago

Also, the running Storage Requests will be "terminated" to the timestamp of the freeze, where the partial payout will be calculated for the storage providers for the service rendered till that day and the rest returned to the customer.

We should also think a bit more about "what happens next" after the freeze. We will probably deploy a fixed smart contract and a new version. Most probably the clients will need to recreate the Requests. Will the data persist long enough for this to work though?

emizzle commented 4 months ago

Also, the running Storage Requests will be "terminated" to the timestamp of the freeze, where the partial payout will be calculated for the storage providers for the service rendered till that day and the rest returned to the customer.

Are you suggesting that a call to freeze would perform the partial payouts instead of allowing clients to call withdrawFunds and SPs to call freeSlot? Maybe there would be legal implications as it allows an admin to move funds?

I'm assuming that we'll need to add another state, like RequestState.Frozen and SlotState.Frozen so that everything except withdrawFunds and freeSlot are disabled. freeSlot will probably need to check this state to enable partial payouts.

We may also need to add a contract event, eg RequestFrozen so that SPs providing proofs can handle the event and stop providing proofs to prevent losing gas fees on reverts.

Will the data persist long enough for this to work though?

Well i guess that it doesn't matter, if the storage request is being recreated, the SPs will need to fill the slot. If they already have the data, then there is no need to re-download it. However, this presents another issue: slot reservations expanding window would likely not allow the same SPs to fill the slot. However, I think that is a fair enough assumption. If we attempt to mitigate that, then it could create an exploitable path to allowing certain SPs to fill slots.

AuHau commented 4 months ago

Are you suggesting that a call to freeze would perform the partial payouts instead of allowing clients to call withdrawFunds and SPs to call freeSlot? Maybe there would be legal implications as it allows an admin to move funds?

No, but it is an option that honestly did not occure to me till now. It would solve the issue I was discussing with Mark about if somebody will take looong time to get around to withdraw back the funds they might in the meanwhile upgrade Codex to newer versions that do not support the frozen contract anymore and then looses the capability to withdraw the money. But I guess this specific edge case is not that likely and the worst case scenario is that they have to download older version of Codex that still can support the old frozen contract to perform the withdrawal.

I'm assuming that we'll need to add another state, like RequestState.Frozen and SlotState.Frozen so that everything except withdrawFunds and freeSlot are disabled. freeSlot will probably need to check this state to enable partial payouts.

I would honestly go simplier way of having frozenTimestamp which is zero when contract deployed and when frozne this is set and then all the calculations are done according it.

We may also need to add a contract event, eg RequestFrozen so that SPs providing proofs can handle the event and stop providing proofs to prevent losing gas fees on reverts.

For sure, but I would go just with simpler top-level ContractFrozen event.

Well i guess that it doesn't matter, if the storage request is being recreated, the SPs will need to fill the slot.

It matters because we must avoid the case that freezing contract would wipe the network... Remember that the idea is that after Client create Storage Request that gets started it can disappear from the network together with original dataset.

I was originally thinking that the node implementation could work like:

  1. Contract is frozen and ContractFrozen event is emitted
  2. Node detects this event, calls withdrawFunds and freeSlot
  3. Exit node to indicate that it can not continue in its operation

But now I am not so sure about the last step, because in this way the data would disappear from network until the node operators would update their nodes and that would make the transition without data loss harder.

emizzle commented 4 months ago

For sure, but I would go just with simpler top-level ContractFrozen event. Can you elaborate on what you mean by a top-level event? By top-level, do you mean "above" Request/Slot?

It seems like, as a worst case scenario, that "freezing" a contract is effectively synonymous with wiping the network, because once a contract is frozen, there will be no incentive for SPs to continue providing proofs and storing the data. Because of that, is it possible to avoid migrations?

Existing storage requests could be migrated from the old (frozen) contract to the new one upon deployment. Storage requests that are migrated could have their end times extended by that amount of time the contract was not available for use (eg the difference between some future block timestamp and the frozen timestamp). A new contract can be deployed and then activated at some specific block number (set in the constructor config). Once that new block number is hit, the contract is activated and SPs will be required to resume providing proofs for ongoing contracts. The gap between contract deployment and the agreed upon active block timestamp would have to be enough time to allow for the client to be upgraded with the new contract address, as well as allow for communication about the upgrade to all SPs, with enough time for them to perform the upgrade.

AuHau commented 4 months ago

Because of that, is it possible to avoid migrations?

You mean "to have migration" as between smart contracts right?

I also thought about it and discussed it with Mark. He unfortunately had a good point. The freezing contract will be done only because of security problems. We can't be really sure afterward that the frozen contract's state is valid because it could have been modified using some exploit so migrating the old "contract's state" could transfer "hacked state" of the contract to the new contract. That is why the only option should be for users to withdraw their funds. Btw. it is already attack surface as potentially there could be bug that would "assign to you more funds that you have put in"... But IMHO it is necessary functionality that we will have to "just risk it".

Also, you designed a nice migration system, but you forgot the most important and problematic part - moving funds. Without moving funds to the new contract this does not work. And again with migration you could move "hacked state" which then the attackers could use to get "hacked funds" from the new contract...

emizzle commented 4 months ago

Wiping the network seems VERY extreme, so maybe we could have an optional migration, such that once the exploit is identified, we can optionally migrate the data (and funds). If the exploit doesn't involve the state of the funds, then we should be able to migrate it to preserve the network. To me, it would be a massive reputation killer if Codex has an exploit then all the "sacred" data that has been stored on the network is no longer guaranteed to be there. As a user, I would not be comfortable storing my data on Codex after that.

AuHau commented 3 months ago

To me, it would be a massive reputation killer if Codex has an exploit then all the "sacred" data that has been stored on the network is no longer guaranteed to be there. As a user, I would not be comfortable storing my data on Codex after that.

Yup, I agree, and this is exactly what I pointed out earlier. At the same time, the ability to "move funds" that you are describing here is, from my POV on the security/legal scale the most impactful and something I thought we were trying to avoid. If we would allow it, we could already go with the way of having the contracts upgradable and hence possibly have a chance to fix the bugs, and then no migration would be necessary.

That said, I think we might be able to design a mechanism that could still allow the freezing approach with "out-of-contracts" migration. Lets discuss it tomorrow.