game-of-nfts / gon-evidence

This repo records GoN participants' info and evidence.
26 stars 313 forks source link

BUG: ICS721 Recursive Loop #479

Open mccallofthewild opened 1 year ago

mccallofthewild commented 1 year ago

Summary of Bug

The CosmWasm ICS721 implementation lacks sufficient constraints for safe relayer operations. To demonstrate this, I created the ICS721 virus.

Overview

The ICS721 virus is a malicious contract that infects either or both sides of a CosmWasm IBC connection. It creates an asynchronous recursive loop of relayers sending and timing out ICS721 packets, which execute a malicious transfer function, minting more NFTs and sending them as ICS721 packets, which they then need to transfer or timeout, and the loop repeats.

With only one packet required from the attacker, an IBC connection can be brought to its knees overnight. In the case of Game-of-NFTs, relayers minted over 500000 NFTs, ran out of gas funds, decommissioned their operations, and organizers had to set up new IBC channels for participants to complete their tasks.

Note: Prior to publicizing, it was important to first ensure this attack was not applicable to live ICS20 implementations including cw20-ics20 and TokenFactory. After testing this by writing the ics20-devils-snare contract, I am optimistic that submessage gas limits are a sufficient barrier. The version of Token Factory live on Juno should not be susceptible to this exploit, but the latest version includes a BlockBeforeSend hook which would make the transfer port quite vulnerable.

Steps to Reproduce

Sequence Diagram

sequenceDiagram
    participant I as Alice
    participant A as Chain A CW-721
    participant B as Chain B CW-721
    I-->>A: Upload Virus Contract
    I-->>B: Upload Virus Contract
    I->>A: Instantiate Virus A
    I->>B: Instantiate Virus B
    I->>A: Register Virus B Address
    I->>B: Register Virus A Address
    I->>A: IBC Transfer NFT-A to NFT-B
    I->>B: IBC Transfer NFT-B to NFT-A
    I->>A: Register Voucher A Address
    I->>B: Register Voucher B Address
    I->>B: IBC Transfer NFT-A back to NFT-A
    loop Exponential Transfer Virus:
        B->>A: Relayer Executes NFT-A.Transfer
        A-->>A: NFT-A.Transfer Mints new NFT-A
        A-->>A: NFT-A.Transfer IBC Transfers NFT-A1 ,NFT-A2 
        A-->>A: NFT-A.Transfer IBC returns NFT-B -> NFT-B
        A->>B: Relayer Executes NFT-B.Transfer
        B-->>B: NFT-B.Transfer Mints new NFT-B
        B-->>B: NFT-B.Transfer IBC Transfers NFT-B1, NFT-B2 
        B-->>B: NFT-B.Transfer IBC returns NFT-A -> NFT-A 
        loop Packet Timeout Subloop 
            Note over A, B: Packets Timeout in Relayer Queue 
            B->>B: Relayer Executes NFT-B.Transfer 
            B-->>B: NFT-B.Transfer Mints new NFT-B
            B-->>B: NFT-B.Transfer IBC Transfers NFT-B3
        end
        Note over A, B: Using batch transfers can ensure ordering.
        Note over A, B: Optimally vouchers return before native.
    end

Written Steps With Code

  1. Alice uploads the virus contract to both Chain A and Chain B.

    chaind tx wasm store \
    ./cw721_virus.wasm \
    --from $KEY --gas-prices 0.025ucosm \
    --gas "auto" \
    --gas-adjustment 1.5 \
    --yes
  2. Alice instantiates Virus A on Chain A and Virus B on Chain B.

    chaind tx wasm instantiate $VIRUS_CODE_ID '{ 
    "name":"Virus", 
    "symbol":"virus", 
    "minter":"'$ALICE_ADDRESS'" 
    }'
  3. Alice registers Virus B's address with Virus A and Virus A's address with Virus B.

    chaind tx wasm execute $SRC_VIRUS_CONTRACT '{
    "extension": {
      "msg": {
        "update_spike_proteins": {
          "accomplice_src": "'$COUNTERPARTY_VIRUS_CONTRACT'",
          "channel_id": "'$SOURCE_CHANNEL'",
          "bridge_contract": "'$SOURCE_BRIDGE_CONTRACT'"
        }
      }
    }
    }'
  4. Initializing Transfers: 4.1. Alice initiates an IBC transfer of NFT-A (of Virus A) from Chain A to Virus B Contract address on Chain B. 4.2. Alice initiates an IBC transfer of NFT-B (of Virus B) from Chain B to Virus A Contract address on Chain A.

    # this can be done by executing the counterfeit transfer message
    chaind tx wasm execute $SRC_VIRUS_CONTRACT '{
    "transfer_nft": {
    "recipient": "doesntmatterwontbevalidated",
    "token_id": "doesntmatterwontbevalidated"
    }
    }'
  5. Relayer executes #ibc_packet_receive entrypoints on each chain's cw-ics721-bridge, instantiating cw721 voucher contracts for each class id

  6. Alice registers voucher contract addresses with each virus contract

    chaind tx wasm execute $SRC_VIRUS_CONTRACT '{
    "extension": {
    "msg": {
      "update_spike_proteins": {
        "accomplice_dst": "'$LOCAL_COUNTERPARTY_VOUCHER'"
      }
    }
    }
    }'
  7. Alice initiates an IBC transfer, returning NFT-A from Virus B to itself on Chain A.

🔁 loop begins:

  1. Relayer executes Virus-A.Transfer on Chain A.
  2. Virus-A.Transfer mints new NFT-A on Chain A and IBC transfers NFT-A1 and NFT-A2 to Chain B.
  3. Virus-A.Transfer IBC transfers NFT-B back to Virus-B.
  4. A relayer executes Virus-B.Transfer on Chain A.
  5. Virus-B.Transfer mints new NFT-B on Chain B and IBC transfers NFT-B1 and NFT-B2.
  6. NFT-B.Transfer IBC returns NFT-A -> NFT-A.
  7. Repeat.

The above loop continues until the packets timeout in the relayer queue.

🔁 Packet timeouts initiate their own asynchronous loops as follows:

  1. Relayer attempts timeout, executing Virus-B.Transfer on Chain A.
  2. Virus-B.Transfer mints new NFT-B on Chain B and IBC transfers NFT-B1 and NFT-B2.
  3. NFT-B.Transfer IBC returns NFT-A -> NFT-A.
  4. Repeat

Impact

The demonstrated attack can instigate significant harm by causing relayers to self-inflict exponentially increasing packet volume and gas expenditures.

This is not necessarily a denial of service attack. While the deployed version mints 4-10 new NFTs per invocation, the most profitable variant need only transfer two packets at a time, silently and continuously collecting income from Juno's FeeShare module at the expense of relayers.

Conclusion

If ICS721 is to utilize the altruistic relayer economy of ICS20, packets must have semi-fungible assurances in terms of cost and functionality. This can be achieved via submessage gas limits and/or code id whitelists.

Optimally, users should be relaying their own transactions, which addresses most of the described risks.

Links

gjermundgaraba commented 1 year ago

This was entirely unnecessary to do on the active channels of the testnet, it could just as easily have been demonstrated in a way that was less intrusive - locally even. Proof could have been provided without making the life of everyone else suck. Even if you thought it was better to test this live, you could have had it running for a short while and then terminated it. You knew this worked at least 4 days ago and have not stopped it: https://discord.com/channels/669268347736686612/1074987031270268958/1084200758334996561

image

This is not, however, a new issue, nor is it specific to ICS721: https://github.com/0xekez/cw-ibc-example/tree/zeke/ibc-replay

All-in-all, I think you next time you want to show an exploit, do it in a nicer way. This was unnecessarily painful for all the other participants (and the organizers).

chillyvee commented 1 year ago

:)

mccallofthewild commented 1 year ago

Be aware that the prior https://github.com/game-of-nfts/gon-evidence/issues/246 claim was updated after my contract notably disrupted the Juno <> Stars connection.

The original bug report had been dealt with in Zeke's ics721-tester contract (infinite in-contract loops are futile as properly configured relayers impose gas limits and simulate transactions before sending them).

Relevant Links

mccallofthewild commented 1 year ago

@gjermundgaraba

This was unnecessarily painful for all the other participants (and the organizers).

I've been in touch with an organizer, and we agreed to sit back and watch the show. The testnet exists to stress-test the protocol, not to soothe bruised egos or supplement validator incomes. If we don't stretch the interchain standards to their limits now, we'll pay the price later. The Ark Protocol team prevailed and produced a self-relaying solution.

This was entirely unnecessary to do on the active channels of the testnet, it could just as easily have been demonstrated in a way that was less intrusive - locally even.

I accomplished this locally weeks prior, as showcased in the integration tests. Alas, a false negative in the ts-relayer test suite misconstrued it as inert until I deployed it on the live connection.

This is not, however, a new issue, nor is it specific to ICS721: https://github.com/0xekez/cw-ibc-example/tree/zeke/ibc-replay

If you'd perused the IBC-replay description and my issue, you'd see they're worlds apart. Zeke's protocol hinges on an ack to keep the loop going – a crucial difference, demanding full control over the IBC contract. Relayers set up their ports based on trust. An untrusted port can be dismissed at will. A highly trusted port acting in an untrusted way poses a significant threat with significant capital in the balance.

Zeke's competence is undeniable, and I have no doubt that he would have taken the required measures if he'd been aware of the issue beforehand. To insinuate that he knew about the exploit and deliberately left the protocol vulnerable is absurd.

Even if you thought it was better to test this live, you could have had it running for a short while and then terminated it. You knew this worked at least 4 days ago and have not stopped it: https://discord.com/channels/669268347736686612/1074987031270268958/1084200758334996561

In future IBC games, try seeking solutions instead of resigning to the circumstances at the first sign of trouble. Your swift reaction wasn't inconsequential, influencing both my decision and the organizer's to let events unfold.

towerkyoto commented 1 year ago

@mccallofthewild

I've been in touch with an organizer, and we agreed to sit back and watch the show.

We don't know whom you have been in touch with. At least from IRIS side, we have no idea about this.

gjermundgaraba commented 1 year ago

@mccallofthewild

I've been in touch with an organizer, and we agreed to sit back and watch the show.

If that is true, it was unnecessary and didn't actually stress-test anything beyond the initial phase when its effectiveness was proven. One of the effects that can be seen, on the other hand, is that testers started dropping off because the only way to proceed was to have the technical abilities to get to a self-relaying solution. I'm sure setting the bar that high for the GoN was not intended by the organizers.

The Ark Protocol team prevailed and produced a self-relaying solution.

Not sure they did, haven't seen it. Their CLI, while excellent, currently only provides the ability to flush channels.

you'd see they're worlds apart

Actually, they're not. They use different mechanisms and are, indeed, different. But they both exploit the exact same issue which is not new and is still very open: IBC relaying economy. It's the only real issue here and is not new.

That said, I do applaud you for finding this issue. If it can be mitigated or solved on the contract, or even protocol, level for this particular way of exploiting IBC relaying economy: Awesome! I will, however, stick to my initial assessment that the way this was done was unnecessary and could have been done in a way that didn't hurt participation.

In future IBC games, try seeking solutions instead of resigning to the circumstances at the first sign of trouble.

I'm afraid my only resignation was to stop spending time relaying a hopeless situation that was deliberately being sabotaged. I spent 3 days getting a self-relaying solution up and helped a bunch of people get their own self-relaying up and running: https://github.com/game-of-nfts/gon-evidence/issues/500

Your swift reaction wasn't inconsequential, influencing both my decision and the organizer's to let events unfold.

Hmm, I'm not sure mine was the ego that was bruised.

taitruong commented 1 year ago

@mccallofthewild

I've been in touch with an organizer, and we agreed to sit back and watch the show.

We don't know whom you have been in touch with. At least from IRIS side, we have no idea about this.

Me neiher. Also agree with @gjermundgaraba, it was and still is a pain for all participants. I was also thinking of testing this infinite loop - but by setting up my own IBC channel. This way it wouldn't harm anyone - except myself.

mccallofthewild commented 1 year ago

@taitruong @gjermundgaraba See, the cool thing is, I didn't harm anyone. Because this is a testnet.

0xekez commented 1 year ago

@mccallofthewild this is an awesome attack!! thank you for the detailed writeup and all the thought you put into it. I thought about very similar attacks while writing the contract.

do you have any suggestions about how to defend against this in the ICS-721 contract beyond the self-relaying approach i described here?

i thought long and deeply about this class of problem, and pretty much came to the conclusion that this is a bug in IBC and relayer software, and not something I can defend against at the contract level. Oak even flagged this as an issue in their audit report, but once again the conclusion was that this was a IBC relayer problem, and nothing could be done to defend on a permissionless protocol like IBC.

0xekez commented 1 year ago

also, @mccallofthewild, if you would like to contribute the test you wrote to the ICS-721 repo, i would love to merge it!

giansalex commented 1 year ago

do you have any suggestions about how to defend against this in the ICS-721 contract beyond the self-relaying approach i described https://github.com/DA0-DA0/dao-dao-ui/issues/885?

Adding a gas_limit to SubMsg like cw20-ics20 does. The malicious contract would not have enough gas to send other packet

0xekez commented 1 year ago

@giansalex, i've thought about gas limits a fair bit. it's not clear to me that this is a good solution, as:

  1. it's difficult to pick a gas limit that does not allow IBC packets to be sent, but also allows cw721 implementations to do custom logic on receiving tokens. what if a cw721 contract legitimately wanted to execute a message in response to a transfer being sent?
  2. gas limits feel like a relatively imprecise band-aid over the reality that non-self-relayed packets are NGMI. it feels strange to add code to the contract, to fix a bug in relayer economics.
0xekez commented 1 year ago

another detail here is that this bug is entirely removed if relayers just relay packets over connections they know to be using non-malicious cw721 implementations. i'm a little unclear on how deploying a new ICS-721 contract (which has a different port than other ones) would cause a different channel to be clogged. this once again seems like an issue with relayer software to me. 🤷

open to other opinions here, these are just mine.

mccallofthewild commented 1 year ago

@giansalex Adding a gas_limit to SubMsg

yup I suggested this in the issue and started adding it:

If ICS721 is to utilize the altruistic relayer economy of ICS20, packets must have semi-fungible assurances in terms of cost and functionality. This can be achieved via submessage gas limits and/or code id whitelists.

It's a tad bit different from cw20-ics20 because ics721 executes packets in batches.

But it still feels like there is room for attackers to work around this.

mccallofthewild commented 1 year ago

Something else I've considered is if contracts could grant foreign write access to certain namespaces in their k/v store. All cw721 conforming contracts share the same ownership map structure, so we could execute the write directly from the bridge contract and make the protocol immutable / gov-maintained.