Consensys / linea-constraints

Implementation of the constraint system of Linea, specified in the linea-specification repo.
Other
13 stars 5 forks source link

Arithmetization <=> Prover, Data Transfer for Hash #137

Closed Soleimani193 closed 6 months ago

Soleimani193 commented 8 months ago

Description: Hash computation on Linea is done as the following;

There are three sources of data on the arithmetization side:

Note. ROM isn't on the list as any bytecode that gets deployed (even temporarily) finds itself duplicated in the SHAKIRA module.

All these modules provide relevant columns as:

See the details here.

The flag provided for each module is different and does not have the same structure:

Particularly the Flag for TX-RLP is not directly given to the prover and the prover has to build the flag itself.

The issue with this is that the prover has to handle the arithmetization modules differently and also should know the structure of LX , LC and apply the constraints asserting to the relation flag := LC *LX

Solution: the flag should be given directly by the arithmetization team as a single binary column.

letypequividelespoubelles commented 8 months ago

@Soleimani193 We don't provide you with a HASH_NUMBER column at least for RLP_ADDR and RLP_TXN.

No issue for me to merge the flag column, so you'll have, for 3 modules:

If ok for you @OlivierBBB I do the modif for rlp* module, you do for SHAKIRA ?

OlivierBBB commented 8 months ago

Hi @Soleimani193. We just discussed this with @letypequividelespoubelles. The conclusion we reached is as follows:

Note. Adding the appropriate shift should happen in the "lookup" from the Arithmetization module to the "KECCAK prover module".

Also one thing that is currently missing from the picture are the outputs. The arithmetization requires

letypequividelespoubelles commented 8 months ago

Also @Soleimani193 the column for the unique ID of the hash has different name for the different module. If it's easier for you, we can make it the same name for every module (HASH_NUMBER) and avoid you to add a shift in the lookups.

Soleimani193 commented 7 months ago

Also @Soleimani193 the column for the unique ID of the hash has different name for the different module. If it's easier for you, we can make it the same name for every module (HASH_NUMBER) and avoid you adding a shift in the lookups.

@letypequividelespoubelles Yes it is more convenient without a shift. Also, what pattern may I see over the HASH_NUMBER column? can I say the Hash_Num extracted by flag or remains fixed or increases? For example, after applying the flag TO_HASH_BY_PROVER over HASH_NUM can the result be something like (1 , 1, 1, 3 , 7, 7, 7, 7, 7, 20 , 20 )

Soleimani193 commented 7 months ago

Hi @Soleimani193. We just discussed this with @letypequividelespoubelles. The conclusion we reached is as follows:

  • we will include a IS_KECCAK_DATA column in the RLP_TXN and RLP_ADDR modules
  • there is the potential for conflicts in the HASH_NUM between these data sources; as such we must introduce moderately large integers e.g.

    • RLP_TXN_shift = 0
    • RLP_ADDR_shift = 256^2
    • SHAKIRA_shift = 2 * 256^2
  • so that what the Prover hash considers the ID is HASH_NUM + (appropriate shift)

Note. Adding the appropriate shift should happen in the "lookup" from the Arithmetization module to the "KECCAK prover module".

Also one thing that is currently missing from the picture are the outputs. The arithmetization requires

  • the truncated hash for RLP_ADDR (last 20 bytes)
  • the full hash for SHAKIRA

@OlivierBBB what we output is the pure output of hash. Thus proving that the pure output of the hash is correctly truncated into 20 last byte is on your side.

This means you should have a column for the output of the hash and this is the column you should give us.

letypequividelespoubelles commented 7 months ago

Here are the following column's name for the three modules for the

  1. RLP_TXN :

    • TO_HASH_BY_PROVER ;
    • ABS_TX_NUM (between 1 and MAX_NB_OF_TX_PER_BATCH, currently 200, without gap);
    • INDEX_X;
    • LIMB;
    • nBYTES
  2. RLP_ADDR :

    • LC ;
    • STAMP (between 1 and 256³, without gap);
    • INDEX;
    • LIMB;
    • nBYTES
  3. SHAKIRA :

    • IS_KECCAK_DATA
    • RIPSHA_STAMP (between 1 and 256⁴, with gaps)
    • INDEX
    • LIMB
    • nBYTES
letypequividelespoubelles commented 6 months ago

@Soleimani193 For RLP_ADDR the hash is stored in RAW_ADDR_HI and DEP_ADDR_LO for high and low part of the hash. WARNING: KEC_HI and KEC_LO is the hash of the initialisation code for CREATE2 and NOT the hash of the input we give you.

OlivierBBB commented 6 months ago

Note. This has become issue #189.

For the results @Soleimani193 requires the same format everywhere for our RES prediction. This only affects data coming from SHAKIRA and RLPADDR. The way data is represented in both modules is different:

SHAKIRA:

RLPADDR:

It would be nice to provide her with selectors to produce (internal) duplicate free RES_HI / RES_LO columns. This is simple to do and would for instance take on the following form.

SHAKIRA:
- KECCAK_RES_HI  =  IS_KECCAK_RESULT  *  (1 - INDEX)  *  LIMB
- KECCAK_RES_LO  =  IS_KECCAK_RESULT  *  INDEX        *  LIMB

RLPADDR:
- KECCAK_RES_HI  =  RAW_ADDR_HI  *  IS_FIRST_ROW
- KECCAK_RES_LO  =  DEP_ADDR_LO  *  IS_FIRST_ROW

where IS_FIRST_ROW is binary and = 1 iff the RLPADDR stamp just jumped by 1 (and 0 on the first row) so e.g.

If RLPADDR_STAMP  =  0  Then  IS_FIRST_ROW  =  0
If RLPADDR_STAMP  ≠  0  Then  IS_FIRST_ROW  =  RLPADDR_STAMP[i]  -  RLPADDR_STAMP[i  - 1]

Notes.

Soleimani193 commented 6 months ago

Here are the following column's name for the three modules for the

  • selector (bit);
  • hashnumber;
  • index (starting at 0);
  • limb (left aligned);
  • nb of bytes to extract from the limb
  1. RLP_TXN :
  • TO_HASH_BY_PROVER ;
  • ABS_TX_NUM (between 1 and MAX_NB_OF_TX_PER_BATCH, currently 200, without gap);
  • INDEX_X;
  • LIMB;
  • nBYTES
  1. RLP_ADDR :
  • LC ;
  • STAMP (between 1 and 256³, without gap);
  • INDEX;
  • LIMB;
  • nBYTES
  1. SHAKIRA :
  • IS_KECCAK_DATA
  • RIPSHA_STAMP (between 1 and 256⁴, with gaps)
  • INDEX
  • LIMB
  • nBYTES

@letypequividelespoubelles can you also add the columns for hashHi, hashLow, selectorHashHi, selectorHashLow ? note that the sector should not have duplication over the same hashNum.

letypequividelespoubelles commented 6 months ago

image