Arithmetization <=> Prover, Data Transfer for Hash

Soleimani193 commented 8 months ago

Description: Hash computation on Linea is done as the following;

the arithmetization provides the relevant columns to the prover
the arithmetization provides a flag to locate the relevant data in the columns
the prover extracts the data from the arithmetization columns given the flag
the prover builds the inputs to the hash and feeds the blocks to the keccak module

There are three sources of data on the arithmetization side:

TX_RLP
RLP-ADD
SHAKIRA

Note. ROM isn't on the list as any bytecode that gets deployed (even temporarily) finds itself duplicated in the SHAKIRA module.

All these modules provide relevant columns as:

HashNum
Index
Limb
NByte

See the details here.

The flag provided for each module is different and does not have the same structure:

for TX_RLP, flag := LC *LX
- for RLP_ADD, flag := LX
- for SHAKIRA, flag = IS_KECCAK_DATA

Particularly the Flag for TX-RLP is not directly given to the prover and the prover has to build the flag itself.

The issue with this is that the prover has to handle the arithmetization modules differently and also should know the structure of LX , LC and apply the constraints asserting to the relation flag := LC *LX

Solution: the flag should be given directly by the arithmetization team as a single binary column.

letypequividelespoubelles commented 8 months ago

@Soleimani193 We don't provide you with a HASH_NUMBER column at least for RLP_ADDR and RLP_TXN.

No issue for me to merge the flag column, so you'll have, for 3 modules:

TO_HASH_BY_PROVER (binary flag)
INDEX (starting at 0)
LIMB (16 left aligned bytes)
nBYTES (between 1 and 16, nb of bytes to extract)

If ok for you @OlivierBBB I do the modif for rlp* module, you do for SHAKIRA ?

OlivierBBB commented 8 months ago

Hi @Soleimani193. We just discussed this with @letypequividelespoubelles. The conclusion we reached is as follows:

we will include a IS_KECCAK_DATA column in the RLP_TXN and RLP_ADDR modules
there is the potential for conflicts in the HASH_NUM between these data sources; as such we must introduce moderately large integers e.g.
- RLP_TXN_shift = 0
- RLP_ADDR_shift = 256^2
- SHAKIRA_shift = 2 * 256^2
so that what the Prover hash considers the ID is HASH_NUM + (appropriate shift)

Note. Adding the appropriate shift should happen in the "lookup" from the Arithmetization module to the "KECCAK prover module".

Also one thing that is currently missing from the picture are the outputs. The arithmetization requires

the truncated hash for RLP_ADDR (last 20 bytes)
the full hash for SHAKIRA

letypequividelespoubelles commented 8 months ago

Also @Soleimani193 the column for the unique ID of the hash has different name for the different module. If it's easier for you, we can make it the same name for every module (HASH_NUMBER) and avoid you to add a shift in the lookups.

Soleimani193 commented 7 months ago

Also @Soleimani193 the column for the unique ID of the hash has different name for the different module. If it's easier for you, we can make it the same name for every module (HASH_NUMBER) and avoid you adding a shift in the lookups.

@letypequividelespoubelles Yes it is more convenient without a shift. Also, what pattern may I see over the HASH_NUMBER column? can I say the Hash_Num extracted by flag or remains fixed or increases? For example, after applying the flag TO_HASH_BY_PROVER over HASH_NUM can the result be something like (1 , 1, 1, 3 , 7, 7, 7, 7, 7, 20 , 20 )

Soleimani193 commented 7 months ago

Hi @Soleimani193. We just discussed this with @letypequividelespoubelles. The conclusion we reached is as follows:

we will include a IS_KECCAK_DATA column in the RLP_TXN and RLP_ADDR modules

there is the potential for conflicts in the HASH_NUM between these data sources; as such we must introduce moderately large integers e.g.

RLP_TXN_shift = 0

RLP_ADDR_shift = 256^2

SHAKIRA_shift = 2 * 256^2

so that what the Prover hash considers the ID is HASH_NUM + (appropriate shift)

Note. Adding the appropriate shift should happen in the "lookup" from the Arithmetization module to the "KECCAK prover module".

Also one thing that is currently missing from the picture are the outputs. The arithmetization requires

the truncated hash for RLP_ADDR (last 20 bytes)

the full hash for SHAKIRA

@OlivierBBB what we output is the pure output of hash. Thus proving that the pure output of the hash is correctly truncated into 20 last byte is on your side.

This means you should have a column for the output of the hash and this is the column you should give us.

letypequividelespoubelles commented 7 months ago

Here are the following column's name for the three modules for the

selector (bit);
hashnumber;
index (starting at 0);
limb (left aligned);
nb of bytes to extract from the limb

RLP_TXN :
- TO_HASH_BY_PROVER ;
- ABS_TX_NUM (between 1 and MAX_NB_OF_TX_PER_BATCH, currently 200, without gap);
- INDEX_X;
- LIMB;
- nBYTES
RLP_ADDR :
- LC ;
- STAMP (between 1 and 256³, without gap);
- INDEX;
- LIMB;
- nBYTES
SHAKIRA :
- IS_KECCAK_DATA
- RIPSHA_STAMP (between 1 and 256⁴, with gaps)
- INDEX
- LIMB
- nBYTES

letypequividelespoubelles commented 6 months ago

@Soleimani193 For RLP_ADDR the hash is stored in RAW_ADDR_HI and DEP_ADDR_LO for high and low part of the hash. WARNING: KEC_HI and KEC_LO is the hash of the initialisation code for CREATE2 and NOT the hash of the input we give you.

OlivierBBB commented 6 months ago

Note. This has become issue #189.

For the results @Soleimani193 requires the same format everywhere for our RES prediction. This only affects data coming from SHAKIRA and RLPADDR. The way data is represented in both modules is different:

SHAKIRA:

results are in a single column
results aren't duplicated

RLPADDR:

hi / lo parts of results live in separate columns
results are duplicated

It would be nice to provide her with selectors to produce (internal) duplicate free RES_HI / RES_LO columns. This is simple to do and would for instance take on the following form.

SHAKIRA:
- KECCAK_RES_HI  =  IS_KECCAK_RESULT  *  (1 - INDEX)  *  LIMB
- KECCAK_RES_LO  =  IS_KECCAK_RESULT  *  INDEX        *  LIMB

RLPADDR:
- KECCAK_RES_HI  =  RAW_ADDR_HI  *  IS_FIRST_ROW
- KECCAK_RES_LO  =  DEP_ADDR_LO  *  IS_FIRST_ROW

where IS_FIRST_ROW is binary and = 1 iff the RLPADDR stamp just jumped by 1 (and 0 on the first row) so e.g.

If RLPADDR_STAMP  =  0  Then  IS_FIRST_ROW  =  0
If RLPADDR_STAMP  ≠  0  Then  IS_FIRST_ROW  =  RLPADDR_STAMP[i]  -  RLPADDR_STAMP[i  - 1]

Notes.

@Soleimani193 confirmed to me that the fact that the SHAKIRA result columns would contain results that live on different rows is irrelevant to her
"(Internal) duplicate free" means that there is no duplication for the same ID. This is due to this part of the "data aggregation" being done with a projection mapping rather than a lookup.

Soleimani193 commented 6 months ago

Here are the following column's name for the three modules for the

selector (bit);

hashnumber;

index (starting at 0);

limb (left aligned);

nb of bytes to extract from the limb

RLP_TXN :

TO_HASH_BY_PROVER ;

ABS_TX_NUM (between 1 and MAX_NB_OF_TX_PER_BATCH, currently 200, without gap);

INDEX_X;

LIMB;

nBYTES

RLP_ADDR :

LC ;

STAMP (between 1 and 256³, without gap);

INDEX;

LIMB;

nBYTES

SHAKIRA :

IS_KECCAK_DATA

RIPSHA_STAMP (between 1 and 256⁴, with gaps)

INDEX

LIMB

nBYTES

@letypequividelespoubelles can you also add the columns for hashHi, hashLow, selectorHashHi, selectorHashLow ? note that the sector should not have duplication over the same hashNum.

letypequividelespoubelles commented 6 months ago

Consensys / linea-constraints

Arithmetization <=> Prover, Data Transfer for Hash #137