EspressoSystems / cape

Configurable Asset Privacy for Ethereum
https://cape.docs.espressosys.com/
GNU General Public License v3.0

Investigate "Wrong root error" that happened on 17/06/2022 #1152

Closed philippecamacho closed 2 years ago

philippecamacho commented 2 years ago

Related conversations:

[6/17 7:20 PM] Jeb Bearer First thing to figure out is whether the transaction being posted by the wallet to the relayer is already invalid, or whether it is getting corrupted in the relayer. If the former...oof. If the latter, it could either be corruption on deserialization from the network, or on converting CapeModelTxn to a sol transaction

[6/17 7:21 PM] Jeb Bearer Haha exactly

[6/17 7:24 PM] Jeb Bearer Here's where the Merkle root gets converted to Solidity:
```
pub struct MerkleRootSol(pub U256);
jf_conversion_for_u256_new_type!(MerkleRootSol, NodeValue);

macro_rules! jf_conversion_for_u256_new_type {
    ($new_type:ident, $jf_type:ident) => {
        // Jellyfish type -> Solidity wrapper: serialize, then read the bytes as a little-endian U256.
        impl From<$jf_type> for $new_type {
            fn from(v: $jf_type) -> Self {
                let mut bytes = vec![];
                v.serialize(&mut bytes).unwrap();
                Self(U256::from_little_endian(&bytes))
            }
        }

        impl From<U256> for $new_type {
            fn from(v: U256) -> Self {
                Self(v)
            }
        }

        // Solidity wrapper -> Jellyfish type: write 32 little-endian bytes, then deserialize.
        impl From<$new_type> for $jf_type {
            fn from(v_sol: $new_type) -> Self {
                let mut bytes = vec![0u8; 32];
                v_sol.0.to_little_endian(&mut bytes);
                let v: $jf_type = CanonicalDeserialize::deserialize(&bytes[..])
                    .expect("Failed to deserialize U256.");
                v
            }
        }
    };
}
```

[6/17 7:25 PM] Jeb Bearer thanks teams

[6/17 7:25 PM] Jeb Bearer Is it correct to be using CanonicalSerialize here rather than AbiEncode?
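On the CanonicalSerialize vs AbiEncode question, the following is a minimal, standard-library-only sketch of the byte-order issue involved. It is not the CAPE code: the `limbs_*` helpers are hypothetical stand-ins for a 256-bit integer type. It assumes the serialized root is 32 little-endian bytes (which is how the macro above reads it, via `from_little_endian`) and relies on the fact that the Solidity ABI lays out a `uint256` as a 32-byte big-endian word. If the conversion is symmetric, the little-endian round trip returns the original bytes and the ABI word is exactly the byte-reversed input.

```rust
/// Interpret 32 little-endian bytes as four 64-bit limbs (least significant limb first).
fn limbs_from_le(bytes: &[u8; 32]) -> [u64; 4] {
    let mut limbs = [0u64; 4];
    for (i, chunk) in bytes.chunks_exact(8).enumerate() {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        limbs[i] = u64::from_le_bytes(buf);
    }
    limbs
}

/// Write the limbs back out as 32 little-endian bytes.
fn limbs_to_le(limbs: &[u64; 4]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, limb) in limbs.iter().enumerate() {
        out[i * 8..(i + 1) * 8].copy_from_slice(&limb.to_le_bytes());
    }
    out
}

/// Write the limbs as a 32-byte big-endian word, the layout the Solidity ABI uses for uint256.
fn limbs_to_abi_word(limbs: &[u64; 4]) -> [u8; 32] {
    let mut out = [0u8; 32];
    // Most significant limb first, each limb in big-endian byte order.
    for (i, limb) in limbs.iter().rev().enumerate() {
        out[i * 8..(i + 1) * 8].copy_from_slice(&limb.to_be_bytes());
    }
    out
}

fn main() {
    // Hypothetical "serialized root": bytes 1..=32 in little-endian order.
    let mut le = [0u8; 32];
    for (i, b) in le.iter_mut().enumerate() {
        *b = i as u8 + 1;
    }

    let limbs = limbs_from_le(&le);

    // The little-endian round trip must be lossless.
    assert_eq!(limbs_to_le(&limbs), le);

    // The ABI word is the same integer with its bytes in the opposite order.
    let mut reversed = le;
    reversed.reverse();
    assert_eq!(limbs_to_abi_word(&limbs), reversed);

    println!("round trip ok");
}
```

A check of this shape, run against the real `MerkleRootSol` conversions with random roots, would rule possibility 4 in the next message in or out.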

[6/17 7:25 PM] Nathan Yospe The possibilities are: 1) txn is corrupt in the wallet; 2) wallet sometimes points to the wrong address when serializing; 3) relayer sometimes grabs the wrong piece of the submitted stream to relay as the txn; 4) Rust and Solidity serialization are not perfectly symmetrical.

[6/17 7:25 PM] Jeb Bearer BTW the wallet->relayer request is using JSON, not bincode

[6/17 7:26 PM] Jeb Bearer But it shouldn't be related to big numbers or anything, we've tested this extensively and every Merkle root is a big number, since they're random

[6/17 7:26 PM] Nathan Yospe The txn root is serialized as tagged base64?

[6/17 7:27 PM] Nathan Yospe Into Json, I mean.

[6/18 3:44 PM] Mathis Antony Suggestion for a #faucet channel: https://discord.com/channels/854451048012709889/854451048508686346/987803342291992616 . I think it's not a bad idea if we want the general and support chat to be a bit more on topic. Not sure we have the bandwidth to police it if necessary, though.

[6/18 4:19 PM] Mathis Antony I think goerli is down. https://stats.goerli.net/

[6/18 4:19 PM] Mathis Antony https://goerli.etherscan.io/

[6/18 4:22 PM] Charles Lu (replying to Mathis Antony: "I think goerli is down. https://stats.goerli.net/") stats.goerli.net is still saying new blocks are being made

[6/18 4:22 PM] Charles Lu is it just Etherscan?

[6/18 4:22 PM] Charles Lu but best block number is the same?

[6/18 4:25 PM] Mathis Antony I noticed all my txns are "pending". Our systems seem to be okay (except faucet issues). I can't find the txns on etherscan or other block explorers though.

[6/18 4:26 PM] Mathis Antony The new "block" on stats.goerli.net always has the same block number.

[6/18 4:31 PM] Nathan Yospe 😱

[6/18 4:54 PM] Mathis Antony I'm syncing the goerli chain locally to maybe get a better idea of what's going on. Not quite sure how long this is going to take. Pinned a message on Discord about the Goerli outage.

[6/18 4:58 PM] Jill Gunter Thanks Mathis

[6/18 5:38 PM] Mathis Antony New blocks on goerli!

[6/18 5:52 PM] Mathis Antony Did transfer, wrap, unwrap and looks like we're back in business. All pending txns from during the outage also seem to eventually have gone through.

[6/18 6:05 PM] John Corbett New marketing tactic: We stand up a reliable Goerli alternative.

[6/18 8:08 PM] Jill Gunter Shoutout to Mathis Antony for being totally on top of this today 🔥🔥🔥

[6/18 8:10 PM] Ben Fisch Way to go Mathis Antony!!

[6/18 8:30 PM] Philippe Camacho Faucet is not working currently. https://app.datadoghq.com/logs?cols=&from_ts=1655594995061&index=&live=true&query=host%3A%2Agoerli%2A+source%3Acloudwatch&stream_sort=time%2Cdesc&to_ts=1655598595061

[6/18 8:32 PM] Mat Richmond https://app.datadoghq.com/logs?cols=status%2C%40logger.name%2C%40error.message&event&from_ts=1655597820365&index=%2A&integration_id=&integration_short_name=&live=true&messageDisplay=inline&query=source%3Acloudwatch&saved_view=870633&stream_sort=desc&to_ts=1655598720365&viz=stream

[Sunday 12:44 AM] Jeb Bearer (replying to Mathis Antony: "Where do you see the nonsense root? The error message doesn't contain it. The hex string in the error message is just hex of 'root not found'.") Oh. Thanks for pointing this out Mathis Antony. We were barking up the completely wrong tree again. Given that there is no known nonsense root, and that the wallets seem to have eventually been able to start up again each time (which suggests they internally had a correct past root, or else they never would have caught up to the current root), it is again plausible that the wallets were just > 40 transactions behind.
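For anyone hitting the same confusion, here is a small sketch of how to check whether a hex string in an error message is just encoded text rather than a Merkle root. The helper names `hex_to_text` and `text_to_hex` are hypothetical, and the sketch only handles a bare hex string; it does not parse a full ABI-encoded revert payload.

```rust
/// Decode a hex string (optionally prefixed with "0x") into text.
/// Returns None if the input is not valid hex or the bytes are not valid UTF-8.
fn hex_to_text(hex: &str) -> Option<String> {
    let hex = hex.strip_prefix("0x").unwrap_or(hex);
    if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let bytes: Option<Vec<u8>> = (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect();
    String::from_utf8(bytes?).ok()
}

/// Encode text as lowercase hex, for comparison against an error message.
fn text_to_hex(text: &str) -> String {
    text.bytes().map(|b| format!("{:02x}", b)).collect()
}

fn main() {
    let hex = text_to_hex("root not found");
    println!("{}", hex); // prints 726f6f74206e6f7420666f756e64
    assert_eq!(hex_to_text(&hex).as_deref(), Some("root not found"));
    // A random 32-byte root would almost never decode to readable text.
    assert_eq!(hex_to_text("ff00"), None);
}
```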

[Sunday 12:45 AM] Jeb Bearer I think a pretty good hypothesis is that when 1000+ transactions all got added to Goerli at once after the disruption, the contract got advanced by a bunch of blocks very quickly, and it took a long time for that information to propagate to the EQS and then the wallets. There may have even been some latency between the network starting to add blocks again, and the query services starting to spit out events again.

[Sunday 12:46 AM] Jeb Bearer If that is the case, it indicates that a good number of the transactions in the first block after the disruption were CAPE transactions, which makes me wonder again if a spike in volume from the CAPE launch contributed to Goerli going down.

[Sunday 12:46 AM] Jeb Bearer Re: the faucet, I think we just need to replace it with a more asynchronous version. The current design is pretty much impossible to scale.

[Sunday 12:47 AM] Jeb Bearer I think the minimalest version of the async faucet design we've talked about is the work of just a few days

[Sunday 1:11 AM] Jeb Bearer It's worth pointing out that switching the number of past Merkle roots from 1000 to 40 may well have been the difference between our wallets withstanding this disruption vs surfacing errors. We chose the smaller value so that a wallet can consider a "stuck" transaction failed after only 40 blocks, without having to add complicated logic for monitoring the mempool for the pending transaction. This is something to think about for Espresso. We may want to add APIs in Phaselock that allow the wallets to query which nodes have a given transaction hash in their mempool, so that the wallet can directly check and consider a transaction failed if it has been thrown out of all the mempools. The simpler idea would be to add a valid_until field (separate from the one for credentials) that exists solely to allow the wallet to consider a pending transaction failed after some small number of blocks, without coupling that number to the number of past Merkle roots.
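A rough sketch of the valid_until idea, purely for illustration. Nothing here is an existing CAPE, Espresso, or Phaselock API: `PendingTxn`, `TxnStatus`, `submitted_at`, and `status_at` are hypothetical names, and block heights are plain `u64` values. The point is only that the timeout becomes a per-transaction value chosen by the wallet, independent of how many past Merkle roots the ledger keeps.

```rust
#[derive(Debug, PartialEq)]
enum TxnStatus {
    Pending,
    Failed,
}

/// Hypothetical wallet-side record of a submitted transaction.
struct PendingTxn {
    /// Last block height at which the transaction may still be included.
    valid_until: u64,
}

impl PendingTxn {
    /// Build a record for a transaction submitted at `block_height` that
    /// should be given up on after `timeout_blocks` further blocks.
    fn submitted_at(block_height: u64, timeout_blocks: u64) -> Self {
        PendingTxn {
            valid_until: block_height.saturating_add(timeout_blocks),
        }
    }
}

/// Once the chain has moved past `valid_until`, the transaction can no longer
/// be included, so the wallet may treat it as failed without watching mempools.
fn status_at(txn: &PendingTxn, current_block: u64) -> TxnStatus {
    if current_block > txn.valid_until {
        TxnStatus::Failed
    } else {
        TxnStatus::Pending
    }
}

fn main() {
    let txn = PendingTxn::submitted_at(100, 5);
    assert_eq!(status_at(&txn, 105), TxnStatus::Pending);
    assert_eq!(status_at(&txn, 106), TxnStatus::Failed);
}
```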

[Sunday 6:56 PM] Jill Gunter (replying to Jeb Bearer: "I guess it's been 40 blocks, because I just got some USDC and a bunch of CAPE back from a transaction that had been tossed out of the mempool! Kinda cool to see that working, even if it took a very…") I wonder if the same thing has been happening to me, because my balances have been all over the place.

[Sunday 7:00 PM] Jeb Bearer Possibly. Have they stabilized on the correct balance? Do you have any pending transactions?

philippecamacho commented 2 years ago

@sveitser managed to reproduce the problem today using the curl script for sending assets. @jbearer Very likely the error is generated when the wallet is more than 40 CAPE blocks behind, which is expected.
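To make the conclusion concrete, here is a toy model of a bounded history of past Merkle roots. It is a sketch under stated assumptions, not the contract's actual implementation: `RootHistory` and `validate` are hypothetical names, roots are stand-in `u64` values instead of real field elements, and the window size of 40 comes from the discussion above. A transaction built against a root that has already been evicted from the window is rejected with "root not found", which is what a wallet that has fallen too far behind would see.

```rust
use std::collections::VecDeque;

/// Keeps only the most recent `capacity` roots, oldest first.
struct RootHistory {
    capacity: usize,
    roots: VecDeque<u64>,
}

impl RootHistory {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        RootHistory {
            capacity,
            roots: VecDeque::with_capacity(capacity),
        }
    }

    /// Record the root of a new block, evicting the oldest root if the window is full.
    fn push(&mut self, root: u64) {
        if self.roots.len() == self.capacity {
            self.roots.pop_front();
        }
        self.roots.push_back(root);
    }

    fn contains(&self, root: u64) -> bool {
        self.roots.contains(&root)
    }
}

/// Accept a transaction only if it was built against a root still in the window.
fn validate(history: &RootHistory, txn_root: u64) -> Result<(), &'static str> {
    if history.contains(txn_root) {
        Ok(())
    } else {
        Err("root not found")
    }
}

fn main() {
    let mut history = RootHistory::new(40);
    // 41 blocks arrive; the root of block 0 falls out of the 40-root window.
    for block in 0..=40u64 {
        history.push(block);
    }
    assert_eq!(validate(&history, 40), Ok(())); // up-to-date wallet
    assert_eq!(validate(&history, 1), Ok(())); // oldest root still retained
    assert_eq!(validate(&history, 0), Err("root not found")); // stale wallet
}
```

In this model a burst of blocks after an outage evicts old roots quickly, so a wallet that learns about new blocks with some delay can be rejected even though nothing is corrupted.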