Normalization of addresses

filecoin-project / lily

capturing on-chain state for the filecoin network

Normalization of addresses #212

Open hsanjuan opened 4 years ago

hsanjuan commented 4 years ago

Description

We suspect that, because Lotus uses robust addresses and actor IDs interchangeably, we are ingesting data in both forms and not accounting for this when making queries. For example, the "To" field in messages could appear as f0xxxx and f3xxx for the same entity, and we would not be able to tell that they refer to the same entity.

Thus, we propose to normalize addresses to robust addresses wherever the two forms risk being used interchangeably:

Message fields

Options

Option 1: Normalize addresses on processing and reprocess the tables
Option 2: Normalize addresses on processing and insert them in a new column, keeping the original information. Needs migration before re-processing.
Option 3: Populate a table to translate addresses.
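
As a rough illustration of how Options 2 and 3 could fit together, the sketch below resolves ID addresses through a translation table and writes the result into new fields while keeping the original values. It is not lily's implementation: the function names, the `from_robust`/`to_robust` field names, and the sample addresses are all made up for the example. The only fact relied on is that ID addresses use protocol 0, so their string form starts with f0 (or t0 on test networks).

```python
from typing import Dict, Optional


def is_id_address(addr: str) -> bool:
    """ID addresses use protocol 0: 'f0...' on mainnet, 't0...' on testnets."""
    return len(addr) > 2 and addr[0] in "ft" and addr[1] == "0"


def normalize(addr: str, id_to_robust: Dict[str, str]) -> Optional[str]:
    """Return the robust form of addr, or None if an ID address is unknown."""
    if not is_id_address(addr):
        return addr  # already a robust address, pass it through
    return id_to_robust.get(addr)


def normalize_message(msg: dict, id_to_robust: Dict[str, str]) -> dict:
    """Option 2 style: add normalized fields, keep the original ones."""
    out = dict(msg)
    out["from_robust"] = normalize(msg["from"], id_to_robust)
    out["to_robust"] = normalize(msg["to"], id_to_robust)
    return out


# Hypothetical translation table (Option 3) with a made-up address pair.
translation = {"f01234": "f3examplebls"}

message = {"from": "f3examplebls", "to": "f01234"}
print(normalize_message(message, translation))
```

Returning None for an unknown ID address, rather than echoing the ID back, keeps unresolved rows easy to find during re-processing; whether that is the right choice depends on the outcome of the discussion above.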

Acceptance criteria

Ensure message addresses are resolved during ingestion.
Trigger re-processing of messages.

Where to begin

Find the best way to convert an ID to a robust address
Modify messages task to normalize addresses based on the result of the discussion

olizilla commented 3 years ago

This is blocking tracking down relationships between entities for analysis. Given a list of ID addresses, we cannot easily see whether they interact, because the "from" column of the messages table lists the robust address (the long one), while the "to" column lists the ID address (the short one).

Any solution to this would be acceptable at this stage; this is a blocker.

willscott commented 3 years ago

here's the mapping at a recent stateroot if useful (cc @alanshaw ) https://node.glif.io/service/statediff/rpc/cid?cid=bafy2bzaceamknubogebmq33kdt5ad5ioat24mhbkrieziavrwkm5tlzp46b44&as=initActorAddresses

placer14 commented 3 years ago

For future readers: Magik6k provided this magic sequence: lotus chain get --as-type hamt-address /pstate/1/@Ha:t01/1/0 | parallel './lotus state lookup {} | grep f0... && echo {}'

Magik6k:

This is just listing all addresses in the init actor key/robust address map, resolving to ID, then looking for the one we want.

🤝
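
The scan described in the comment above can be sketched in a few lines. This is only a model of the idea, not the Lotus code path: `resolve_to_id` is a hypothetical stand-in for one `lotus state lookup` call, and the addresses are made up. Each candidate costs one lookup, so the work grows with the number of entries in the init actor map.

```python
from typing import Callable, Iterable, Optional


def find_robust_for_id(
    robust_addresses: Iterable[str],
    resolve_to_id: Callable[[str], str],
    wanted_id: str,
) -> Optional[str]:
    """Resolve each robust address to its ID until one matches wanted_id."""
    for robust in robust_addresses:
        if resolve_to_id(robust) == wanted_id:
            return robust
    return None


# Made-up data standing in for the init actor address map.
fake_init_map = {
    "f2exampleone": "f01001",
    "f2exampletwo": "f01002",
    "f2examplethree": "f01003",
}

lookups = 0


def fake_resolve(robust: str) -> str:
    """Counts calls so the cost of the scan is visible."""
    global lookups
    lookups += 1
    return fake_init_map[robust]


found = find_robust_for_id(fake_init_map.keys(), fake_resolve, "f01003")
print(found, "after", lookups, "lookups")
```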

placer14 commented 3 years ago

Additionally, if you were only interested in Account Actors, you could also do lotus state lookup -r f0...

olizilla commented 3 years ago

Thanks! For posterity, I ended up using statediff via glif.io (thanks @willscott) and gron'n'grep to find the magic beans:

$ gron "https://node.glif.io/service/statediff/rpc/cid?cid=<state root cid>&as=initActorAddresses" | grep -E '<id 1>|<id 2>'
json.f2<robust address 1> = <id 1>;
json.f2<robust address 2> = <id 2>;

The lotus chain'n'parallel incantation ran for minutes and I had to abort it.
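
For repeated lookups, the same mapping could be inverted once instead of grepped each time. The sketch below assumes, based only on the gron output shown above, that the response body is a JSON object keyed by robust address with the ID as the value; the real payload may differ. The function name and the sample body are made up, and no network request is made.

```python
import json
from typing import Dict


def id_to_robust_from_json(body: str) -> Dict[str, str]:
    """Invert an assumed {robust address: id} JSON object into {id: robust address}."""
    robust_to_id = json.loads(body)
    # str() tolerates IDs serialized either as strings or as numbers.
    return {str(actor_id): robust for robust, actor_id in robust_to_id.items()}


# Made-up sample in the assumed shape; not real chain data.
sample_body = '{"f2exampleactorone": "f01001", "f2exampleactortwo": "f01002"}'

lookup = id_to_robust_from_json(sample_body)
print(lookup.get("f01002"))  # prints f2exampleactortwo
```

Building the inverted dictionary takes a single pass over the map, after which each ID resolves with one dictionary access.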

iand commented 3 years ago

Copied this to https://github.com/filecoin-project/sentinel/issues/176 as an example of the kind of extra data normalisation we need to do in a derived schema