manuelwedler opened 2 months ago
If we go with the second option, we are going to have a dirty DB, that is, contracts will have a creatorTxHash but no creation bytecode. I think we should re-verify them.
We'll need a script that connects to both filesystems, the current one on GCP and the legacy one running on Argo. This script needs two lists:

- the contracts that are missing their transaction hash in the current GCP repo
- the contracts that have a creator-tx-hash.txt in the legacy Argo repo

Then it should take an intersection of these two lists (address + chainId as identifiers). This gives us the full list that needs modification, and also the number of contracts affected. We can write this list of contracts to a new database instance, a local one etc., to mark off the ones that are verified.
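A minimal sketch of such a script, assuming the two lists have been dumped to plain text files with one chainId,address identifier per line (the file names and formats here are assumptions):

```typescript
import { readFileSync, writeFileSync } from "node:fs";

// Hypothetical dumps of the two filesystems, one "chainId,address"
// identifier per line.
const parse = (path: string) =>
  readFileSync(path, "utf8").trim().split("\n");

const currentGcp = new Set(parse("gcp-missing-tx-hash.txt"));
const legacyArgo = parse("argo-creator-tx-hash.txt");

// Intersection: contracts that have a creator-tx-hash.txt in the legacy
// Argo repo but are missing the transaction hash after the migration.
const affected = legacyArgo.filter((id) => currentGcp.has(id));

console.log(`${affected.length} contracts need re-verification`);

// Persist the worklist (here a plain file; a small local DB would also
// work) so contracts can be marked off once they are re-verified.
writeFileSync("needs-reverification.txt", affected.join("\n"));
```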
Finally we send these contracts for re-verification on the migration instance (the instance container image needs to be updated) and have them re-verified. Bear in mind we need a small change in the code to trigger updating the row of that contract if it goes from a null match to a partial match.
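As a rough sketch, that small change could be an update guarded on the previous null match (the verified_contracts.creation_match names follow the sourcify-database schema but should be treated as assumptions here):

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection via PG* environment variables

// Hypothetical upgrade step: only touch the row if the stored creation
// match is still null, so a null match can become a partial match but an
// existing match is never downgraded.
async function upgradeCreationMatch(
  verifiedContractId: string,
  newCreationMatch: boolean
) {
  await pool.query(
    `UPDATE verified_contracts
        SET creation_match = $1
      WHERE id = $2
        AND creation_match IS NULL`,
    [newCreationMatch, verifiedContractId]
  );
}
```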
Also, before all of this, review #1572.
I think we can optimize this process by:

1. Creating a new table (sourcify_sync_tx_hash, with an additional transaction_hash field) on the prod database and pushing into it all the contracts that have a creator-tx-hash.txt from the legacy Argo repo
2. Comparing contract_deployments.transaction_hash with sourcify_sync_tx_hash.transaction_hash for each contract, finding all the contracts that don't have a transaction_hash in contract_deployments but do have one in sourcify_sync_tx_hash
3. Copying the creator-tx-hash.txt files from the Argo repo to the GCP repo
4. Updating contract_deployments.transaction_hash with an update query, filtering using the query above

Yes, we have to do 1. and 2. regardless. Let's do that. Just not sure if using the prod database is a good idea. Maybe create a user that only has read rights to the main tables but writes just to the new table.
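As a rough sketch, steps 1, 2 and 4 above could look like the following (node-postgres; the sourcify_sync_tx_hash schema and the sync_user grants are assumptions, only contract_deployments.transaction_hash comes from the actual schema):

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection via PG* environment variables

async function main() {
  // 1. New table for the tx hashes recovered from the legacy Argo repo.
  //    The restricted user suggested above would get read rights on the
  //    main tables and write rights only here, e.g.:
  //    GRANT SELECT ON contract_deployments TO sync_user;
  //    GRANT SELECT, INSERT, UPDATE ON sourcify_sync_tx_hash TO sync_user;
  await pool.query(`
    CREATE TABLE IF NOT EXISTS sourcify_sync_tx_hash (
      chain_id         bigint NOT NULL,
      address          bytea  NOT NULL,
      transaction_hash bytea  NOT NULL,
      PRIMARY KEY (chain_id, address)
    )
  `);

  // 2. Contracts with a hash in sourcify_sync_tx_hash but none in
  //    contract_deployments.
  const { rows } = await pool.query(`
    SELECT s.chain_id, s.address, s.transaction_hash
      FROM sourcify_sync_tx_hash s
      JOIN contract_deployments d
        ON d.chain_id = s.chain_id AND d.address = s.address
     WHERE d.transaction_hash IS NULL
  `);
  console.log(`${rows.length} contract_deployments rows to update`);

  // 4. The update query, filtering on the same condition as above.
  await pool.query(`
    UPDATE contract_deployments d
       SET transaction_hash = s.transaction_hash
      FROM sourcify_sync_tx_hash s
     WHERE d.chain_id = s.chain_id
       AND d.address = s.address
       AND d.transaction_hash IS NULL
  `);
}

main().catch(console.error);
```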
Regarding the rest, I'm not sure it's a good idea to accept the creatorTxHash even if we can't fetch the Transaction from this hash. That's basically what this PR is doing.
First thing is, we are not checking if this txHash is indeed the txHash that created this contract. Yes, we get it from a somewhat reliable source (Etherscan, Blockscout etc.), but I think we should (not must) check this. How we do this now is by fetching the Transaction object and checking if the "new contract address" algorithm, i.e. deployer address + nonce, yields the address of the contract being verified.
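For reference, that check could look like this with ethers v6, whose getCreateAddress computes the CREATE address from sender and nonce (the RPC URL and hashes are placeholders):

```typescript
import { JsonRpcProvider, getCreateAddress } from "ethers";

const provider = new JsonRpcProvider("https://rpc.example.com"); // placeholder
const creatorTxHash = "0x..."; // the hash reported by Etherscan/Blockscout
const contractAddress = "0x..."; // the contract being verified

const tx = await provider.getTransaction(creatorTxHash);
if (tx === null) throw new Error(`Could not fetch tx ${creatorTxHash}`);

// For a direct CREATE deployment the contract address is
// keccak256(rlp([deployer, nonce]))[12:], i.e. deployer address + nonce.
const expected = getCreateAddress({ from: tx.from, nonce: tx.nonce });
if (expected.toLowerCase() !== contractAddress.toLowerCase()) {
  throw new Error(`${creatorTxHash} did not create ${contractAddress}`);
}
```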
Second thing is, even if we decide it's OK to write the creatorTxHash without double-checking, we still need the values deployer, block_number, and tx_index of the contract. These are only available if we fetch the Transaction via this hash. I don't think it would make sense to write only the creatorTxHash but not these values.
What we can do maybe is fail the whole verification if we can't fetch the Transaction from the expected API like Etherscan, instead of saving the contract as just a "runtime match". The verification would fail and the contract would need to be submitted again. In that case (or actually regardless) we should monitor the errors in fetching Transactions and get notified if e.g. we are hitting Etherscan API limits.
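A sketch of that behavior, with a placeholder metrics hook for the monitoring (none of the names here are sourcify's actual API):

```typescript
import { JsonRpcProvider, TransactionResponse } from "ethers";

// Placeholder for a real monitoring client (Prometheus, alerting, ...).
const metrics = {
  increment: (name: string) => console.error(`metric: ${name}`),
};

// Fail the whole verification instead of silently degrading the result to
// a runtime-only match when the creator transaction can't be fetched.
async function getCreatorTxOrFail(
  provider: JsonRpcProvider,
  creatorTxHash: string
): Promise<TransactionResponse> {
  try {
    const tx = await provider.getTransaction(creatorTxHash);
    if (tx === null) throw new Error("transaction not found");
    return tx;
  } catch (err) {
    // Counting these failures lets us get notified if we are e.g.
    // hitting Etherscan API limits.
    metrics.increment("creator_tx_fetch_failures");
    throw new Error(
      `Could not fetch creator tx ${creatorTxHash}, please resubmit: ${err}`
    );
  }
}
```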
> Regarding the rest, I'm not sure it's a good idea to accept the creatorTxHash even if we can't fetch the Transaction from this hash. That's basically what this PR is doing.
I agree we should not accept creatorTxHash regardless. From my POV, we should implement a new service that goes through the database and tries to re-verify all the contracts that don't have creation information. But I'm wondering if it makes sense to first implement a more reliable way to fetch the creation bytecode, for example, by starting to index all creation bytecode.
We found that some contracts that had a creator-tx-hash.txt in the /files/any/ endpoint of our old Kubernetes deployment didn't have it after the migration to our new setup. The transaction_hash column in contract_deployments is simply NULL for these cases.

I investigated two examples:
- 0x00000000219ab540356cBB839Cbe05303d7705Fa on chain 1: The logs show that a creatorTxHash was found, but fetching the creation bytecode failed, probably due to an unresponsive RPC. We should still store the creatorTxHash in these cases. This is fixed by https://github.com/ethereum/sourcify/pull/1572
- 0x4E7095a3519A33dF3D25774c2F9D7a89eB99745D on chain 11155111: Here the log is a bit misleading. Fetching the creatorTxHash actually failed because no creatorTx is in the jsonPayload. Therefore, lib-sourcify also didn't match with the creation tx. I fixed the logging in the above PR as well. So in this case the cause was probably the Etherscan API being unresponsive.
Database problem
What still remains is that we probably have a lot of contracts after the migration for which we are missing the creation tx hashes. I queried the DB and it shows that there are 3.7M contract_deployments without a tx_hash and 1.4M with a tx_hash. That seems like a lot without a tx hash to me. Unfortunately, we cannot know how it was before, because we have never fully recovered the old production DB.
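The counts come from a query along these lines (node-postgres sketch, same schema caveats as elsewhere in this thread):

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection via PG* environment variables

const { rows } = await pool.query(`
  SELECT
    COUNT(*) FILTER (WHERE transaction_hash IS NULL)     AS without_tx_hash,
    COUNT(*) FILTER (WHERE transaction_hash IS NOT NULL) AS with_tx_hash
  FROM contract_deployments
`);
// e.g. { without_tx_hash: "3700000", with_tx_hash: "1400000" }
console.log(rows[0]);
```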
Two options to fix this problem now:

1. Re-run the verification for these contracts so that the creation information is fetched again.
2. Go through all the creator-tx-hash.txt files and add them manually to the database if not present yet.

A side note: there are probably more problems after the migration. We should check the database more thoroughly. An unresponsive RPC could also have caused a change in the match status after the migration.