Open marcocastignoli opened 3 days ago
Related #1643 #1632
Understand when we can skip the compilation
Equal compilation_settings/sources
In order to understand when we can skip compilation, ideally we could select from compiled_contracts, filtering by compiler_settings, sources and compiler_version. The problem with this approach is that we would have to index jsonb fields, which is not optimal for several reasons (from ChatGPT).
We can optimize this by first filtering by fully_qualified_name+version (after we put an index on them) and then trying to match the compilation_settings/sources (EDIT: or metadata) only against the few results filtered by fully_qualified_name+version.
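The two-stage idea above can be sketched as follows. This is only an illustration under assumptions: the row shape, `canonicalJson` and `findReusableCompilation` are hypothetical names, and the first stage is simulated in memory, whereas in practice it would be the indexed query on fully_qualified_name+version.

```typescript
// Hypothetical row shape; field names mirror the columns discussed above.
interface CompiledContractRow {
  id: string;
  fullyQualifiedName: string;
  version: string;
  compilerSettings: unknown;
  sources: Record<string, string>;
}

type CompilationTarget = Omit<CompiledContractRow, "id">;

// Serialize JSON with sorted object keys so key order does not affect equality.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalJson).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(obj[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function findReusableCompilation(
  rows: CompiledContractRow[],
  target: CompilationTarget,
): CompiledContractRow | undefined {
  // Stage 1: cheap filter (stands in for the indexed lookup).
  const candidates = rows.filter(
    (row) =>
      row.fullyQualifiedName === target.fullyQualifiedName &&
      row.version === target.version,
  );
  // Stage 2: expensive structural comparison, only on the few candidates.
  const wantedSettings = canonicalJson(target.compilerSettings);
  const wantedSources = canonicalJson(target.sources);
  return candidates.find(
    (row) =>
      canonicalJson(row.compilerSettings) === wantedSettings &&
      canonicalJson(row.sources) === wantedSources,
  );
}
```

Comparing a key-sorted serialization avoids false misses when two otherwise identical settings objects list their keys in a different order.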
Equal runtime_code_hash
Another possible reason to skip compilation is that compiled_contracts.runtime_code_hash matches the onchain bytecode of the contract that is being verified. The match must be a metadata_match.
Can we maybe utilize the metadata somehow? Because the "metadata hash" is somehow the fingerprint of the compilation
I'm not sure if matching on runtime_code_hash is straightforward. Don't we normalize the bytecodes? If so, their hashes won't match the onchain bytecodes.
What if we compare the onchain bytecode with the stored onchain bytecodes (which are not normalized), going through verified_contracts? Following #1643
in other words:
select cc.*
from compiled_contracts cc
  left join verified_contracts vc on vc.compilation_id = cc.id
  left join contract_deployments cd on vc.deployment_id = cd.id
  left join contracts c on cd.contract_id = c.id
where c.runtime_code_hash = 'current_verification_onchain_bytecode'
  and vc.runtime_metadata_match = true
Oh yes, so onchain bytecodes, we don't normalize, right?
This would work for contracts that have the exact same bytecode. If the immutables, libraries, etc. change, it does not work. Still a good starting point.
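A small sketch of why an exact-hash lookup only covers byte-identical contracts. Everything here is an assumption made for illustration: the hash function (sha256), the in-memory map standing in for the database, and the toy bytecode strings, which are not real contract code.

```typescript
import { createHash } from "node:crypto";

// Hash the raw bytes of a hex-encoded bytecode string (assumes sha256).
function codeHash(hexBytecode: string): string {
  const bytes = Buffer.from(hexBytecode.replace(/^0x/, ""), "hex");
  return createHash("sha256").update(bytes).digest("hex");
}

// Hypothetical in-memory index: onchain runtime code hash -> compilation id.
const compilationByCodeHash = new Map<string, string>();

function findCompilationByOnchainCode(hexBytecode: string): string | undefined {
  return compilationByCodeHash.get(codeHash(hexBytecode));
}

// Toy bytecodes that differ only in one embedded value (think: an immutable).
const alreadyVerified = "0x6080604052aa";
const sameCodeOtherImmutable = "0x6080604052bb";

compilationByCodeHash.set(codeHash(alreadyVerified), "compilation-1");

// Byte-identical code hits; any difference in the bytes misses.
const exactHit = findCompilationByOnchainCode(alreadyVerified);
const immutableMiss = findCompilationByOnchainCode(sameCodeOtherImmutable);
```

A miss in such a lookup does not mean the compilation cannot be reused; it only means this shortcut cannot tell, and a full compilation is still needed.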
How about the metadata?
I see us going in a "bytecode similarity" search direction as a longer research topic. Blockscout is already doing something similar: https://docs.blockscout.com/about/features/ethereum-bytecode-database-microservice#similar-contracts-search-enhancement
Another easy solution could be to include a metadata sha256 column in the sourcify_matches table. We index it and use it to skip compilation. That's straightforward, and the hash is easy to compute in hindsight.
E.g. services/server/src/server/services/VerificationService.ts
public async verifyDeployed(
  checkedContract: CheckedContract,
  sourcifyChain: SourcifyChain,
  address: string,
  creatorTxHash?: string,
): Promise<Match> {
  // ...
  // Use sha256(CheckedContract.metadataRaw) to find an already existing compilation output
  // in the database that was created from the same metadata
  const compilationOutput = await findCompilationOutputFromMetadataHash(checkedContract);
  if (compilationOutput) {
    // Setting the compilation output on the checkedContract makes CheckedContract.recompile() return early
    checkedContract.setCompilationOutput(compilationOutput);
  }
  /* eslint-disable no-useless-catch */
  try {
    const res = await libSourcifyVerifyDeployed(
      checkedContract,
      sourcifyChain,
      address,
      foundCreatorTxHash,
    );
    // ...
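The lookup half of this proposal could look roughly like the sketch below. It is not the actual Sourcify code: `StoredCompilation`, `metadataSha256` and `MetadataHashIndex` are hypothetical names, and the map is an in-memory stand-in for the indexed metadata sha256 column.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for whatever the stored compilation output looks like.
interface StoredCompilation {
  compilationId: string;
}

// sha256 over the raw metadata.json string exactly as uploaded.
function metadataSha256(metadataRaw: string): string {
  return createHash("sha256").update(metadataRaw, "utf8").digest("hex");
}

// In-memory stand-in for an indexed metadata sha256 column.
class MetadataHashIndex {
  private readonly byHash = new Map<string, StoredCompilation>();

  store(metadataRaw: string, compilation: StoredCompilation): void {
    this.byHash.set(metadataSha256(metadataRaw), compilation);
  }

  find(metadataRaw: string): StoredCompilation | undefined {
    return this.byHash.get(metadataSha256(metadataRaw));
  }
}
```

Hashing the raw string rather than a re-serialized object is the conservative choice in this sketch: any difference in the uploaded bytes, even whitespace, yields a different hash, so the worst case is an unnecessary recompilation and never a wrongly reused output.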
Is it somehow possible that a wrong metadata hash is appended at the onchain bytecode?
If I'm not wrong we always save the recompiled one, not the onchain one, so this should not be a problem
Yes, but I'm talking about the metadata hash of the contract for which you want to save the compilation.
I would not use the onchain hash, I would use the hash of the uploaded metadata.json file.
The uploaded metadata.json captures all the information used for compilation, so if a stored metadata with the same hash already exists, we can skip compilation.
Okay that makes sense. The problem with this approach will then be that we are moving away from requiring a metadata.json with API v2. So we could only skip compilation if we have a metadata.json. Or do you think it could be generated from standard json input?
From https://github.com/ethereum/sourcify/issues/1632 we understood that most contracts share the same code, so it is possible to optimize Sourcify by skipping compilation for already existing compiled contracts.