filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

Filecoin state profiling #10884

Closed jennijuju closed 1 year ago

jennijuju commented 1 year ago

edited by @ZenGround0

User Story

I am a user who develops the filecoin protocol or software involved in running or using the protocol.

1) I want to understand what is happening with filecoin state. I want to see the current breakdown of byte usage by the protocol to understand if there are dangerous trends that need to be addressed and get ideas for how to fix them.

2) I also want to be able to go back and analyze how previous changes have impacted snapshot size / protocol data usage.

3) I want to visualize how new changes I have developed impact snapshot size / protocol data usage.

Acceptance Criteria

Given a filecoin snapshot, I can run a command or a small set of commands that gives me two things:

1) data on how the bytes of a chain snapshot break down into a) state tree b) messages / headers c) state churn. State and churn are further broken down by i) actor type ii) actor type field, and messages are further broken down by message type

2) a visualization of this data for quick inspection. This visualization will be a DiskInventoryX style treemap breakdown of the above data.

#### Deliverables
- [ ] These outputs will be judged against a first motivating task: we will use them to explain exactly how the nv19 protocol upgrade reduced the snapshot size by ~50%. https://github.com/filecoin-project/lotus/issues/11035
- [x] run this process in our own infra on a regular basis to monitor the network's condition. (https://github.com/filecoin-project/lotus/issues/11022)
- [x] Document how to use this tool (shipped at https://github.com/filecoin-project/lotus/discussions/11037)

Technical Breakdown

- [x] Import snapshot into something that allows for analysis (current plan is to use lotus)
- [x] Debug and repurpose existing `lotus-shed stateroot stat` as needed
- [x] Develop similar functionalities for whole snapshot churn / messages / headers
- [x] Experiment with and decide on level of granularity, likely breaking actor types down into top level fields
- [x] Use a data output format that is good for people and plotting programs (probably json)
- [x] Plotting program: probably using [this tool](https://plotly.com/python/treemaps/)
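
A minimal sketch of how the JSON output could feed a Plotly treemap, assuming the path-keyed shape that appears in the outputs posted later in this thread (`{"/path": {"Size": n, "Links": m}}`). The helper names `treemap_inputs` and `plot` are hypothetical and are not part of lotus or `lotus-shed`. The sketch also assumes that a parent's `Size` excludes its children, which is what `branchvalues="remainder"` expects.

```python
def treemap_inputs(report):
    """Turn a {path: {"Size": n, "Links": m}} report into treemap columns.

    Returns parallel lists (ids, labels, parents, values). The "/" entry is
    skipped; top-level paths get "" as parent, which Plotly treats as the root.
    """
    ids, labels, parents, values = [], [], [], []
    for path in sorted(report):
        if path == "/":
            continue
        parent, label = path.rsplit("/", 1)  # "/statetree/churn" -> ("/statetree", "churn")
        ids.append(path)
        labels.append(label)
        parents.append(parent)
        values.append(report[path]["Size"])
    return ids, labels, parents, values


def plot(report, out_html="treemap.html"):
    """Write an interactive treemap to out_html (requires the plotly package)."""
    import plotly.graph_objects as go  # third-party, imported lazily

    ids, labels, parents, values = treemap_inputs(report)
    fig = go.Figure(
        go.Treemap(
            ids=ids,
            labels=labels,
            parents=parents,
            values=values,
            branchvalues="remainder",  # a parent's value is its own bytes only
        )
    )
    fig.write_html(out_html)
```

Every parent path has to be present in the report for the hierarchy to resolve, which holds for the outputs posted in this thread.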

Simplifying assumptions (we can revisit these)

Bonus Cake visualization suggestion from andy: https://askubuntu.com/questions/73160/how-do-i-find-the-amount-of-free-space-on-my-hard-drive. This is a good possible alternative to the Python-based treemap

ZenGround0 commented 1 year ago

Spec

Concretely, I'm working on a tool that takes a snapshot and outputs information on the total snapshot size. It differs from #9793 mainly in that it accounts for message data and churn data. The tool will report:

1) total bytes in messages, headers, state trees

2) for the state tree, a breakdown by top level actor field, where the size of a field is the size of the union of all data blocks belonging to that field

3) For the purposes of this investigation I'll focus on the market actor to begin with, since we suspect it is the major contributor

With this information we can diagnose what went so right in nv19 and how to preemptively avoid the same problem happening in the future in other parts of the state tree.

Implementation

Rough sketch

1) Read CAR snapshots into a badger database so we can do random reads

2) Use / implement a function for counting the size of a subgraph

3) Traverse the chain, counting bytes of headers and bytes of messages

4) When counting bytes of the state tree, traverse all actors in the same way as #9793. I'll start by doing this with the market actor; others I'll just count at the top level.

a) Track every CID in a big set; if a CID has already been seen, don't traverse it or count its state size again

b) If this takes too much memory, revisit and do something less simple, probably a badger datastore for the CID set.
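
The deduplicated counting in step 4a can be sketched as follows. This is an illustration only, not the tool's actual code: `blocks` is a hypothetical in-memory stand-in for the badger-backed blockstore, mapping each CID to its raw bytes and the CIDs it links to, and `subgraph_size` is a made-up name.

```python
def subgraph_size(root, blocks, seen):
    """Sum the bytes of every block reachable from root that is not yet in seen.

    blocks: hypothetical store, cid -> (raw_bytes, [child cids]).
    seen:   set shared across calls, so a block reachable from several
            subgraphs is counted only the first time it is visited.
    Returns (total_bytes, blocks_counted).
    """
    total_bytes, blocks_counted = 0, 0
    stack = [root]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue  # already counted: do not traverse or count again
        seen.add(cid)
        data, children = blocks[cid]
        total_bytes += len(data)
        blocks_counted += 1
        stack.extend(children)
    return total_bytes, blocks_counted


# Two roots ("a" and "d") that share block "c".
blocks = {
    "a": (b"xxxx", ["b", "c"]),
    "b": (b"yy", ["c"]),
    "c": (b"z", []),
    "d": (b"ww", ["c"]),
}
seen = set()
first = subgraph_size("a", blocks, seen)   # counts a, b, c
second = subgraph_size("d", blocks, seen)  # counts only d, since c was seen
```

Because `seen` is shared, the order in which subgraphs are visited decides which one a shared block is attributed to. Swapping the in-memory set for an on-disk one (step 4b) would not change the logic.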

raulk commented 1 year ago

Take a look at this: https://pkg.go.dev/github.com/ipld/go-car/v2#Reader.Inspect

It may deliver some useful top-level insights.

jennijuju commented 1 year ago

@ZenGround0 I'm curious about

> total bytes in messages, headers, state trees

what do you mean by headers here?

More concretely, I'd like this tooling to be able to tell me how big each field is, for example the total size of sectors in miner actors. We can then get the number of sectors we have and how much each field of `SectorOnChainInfo` contributes to that size. With this, we will have more visibility into the impact of work like https://github.com/filecoin-project/FIPs/discussions/546 before spending time on it

arajasek commented 1 year ago

End goals:

arajasek commented 1 year ago

TODO: Split into separate issues for building the tools and answering the nv19 enigma, vs. setting up infra (which will be a broader task).

arajasek commented 1 year ago

Idea: It'd be nice, but lowest priority, if this tooling or the tooling in could be easily invoked with the result of an arbitrary state migration func, so that we can assess the impact of a migration (whether it reduces or horribly increases state).

jennijuju commented 1 year ago

> TODO: Split into separate issues for building the tools and answering the nv19 enigma, vs. setting up infra (which will be a broader task).

https://github.com/filecoin-project/lotus/issues/10981

jennijuju commented 1 year ago

@snissn, based on today's feedback, I moved the deployment from #10981 to here. Can you add the deployment recipe?

snissn commented 1 year ago

@jennijuju can you confirm the requirement for the deployment? I don't see it on the referenced ticket. I understand it to be something like this:

I think that we need to do more work to coordinate on more details of how that should look. Some quick questions off the top of my head:

Are there outputs that go into grafana? Is there a web dashboard to view the latest tree graph? Is there a threshold that triggers an alert and where does that alert go? Is the server hosted on aws? Does lotus have a centralized monitoring system that can detect if this node goes down?

jennijuju commented 1 year ago

> can you confirm the requirement for the deployment?

That's what I'm asking you!

> deploy a tool that monitors the state data usage of the Filecoin data via analysis over an exported snapshot

lgtm as a one-liner.

> Some quick questions off the top of my head:

I think the ask is for you to come up with the initial proposal, then we can review it together

jennijuju commented 1 year ago

> Does lotus have a centralized monitoring system that can detect if this node goes down?

fil-infra / #fil-sentinel has something built - might be worth asking them. but maybe we can put this into https://github.com/filecoin-project/lotus/issues/10981

so for this one, maybe simply say, we can manually run the script and have data reported to [ ] & build a view?

snissn commented 1 year ago

yeah! It might make sense to split infra out of this recipe. I think we want to make a deliverable for this as simple as "opening a ticket to deploy the features created here"

jennijuju commented 1 year ago

> yeah! It might make sense to split infra out of this recipe. I think we want to make a deliverable for this as simple as "opening a ticket to deploy the features created here"

fine by me! (but let's make that ticket a part of https://github.com/filecoin-project/lotus/issues/10981, like node setup?)

jennijuju commented 1 year ago

@ZenGround0 like you have mentioned in #11035, we have learned that the snapshot size reduction across nv19 was due to the market actor state's AMT.

Could you please attach the output json for before and after nv19? Then we can close this issue. (We will continue to track the followups in their issues accordingly.)

ZenGround0 commented 1 year ago

Before nv19:

{"/":{"Size":0,"Links":0},"/headers":{"Size":14189794564,"Links":103849005},"/messages":{"Size":264797438,"Links":321220},"/statetree":{"Size":0,"Links":0},"/statetree/churn":{"Size":2574560355,"Links":980422},"/statetree/churn/account":{"Size":0,"Links":0},"/statetree/churn/cron":{"Size":0,"Links":0},"/statetree/churn/datacap":{"Size":31557668,"Links":33699},"/statetree/churn/ethaccount":{"Size":0,"Links":0},"/statetree/churn/evm":{"Size":107133,"Links":871},"/statetree/churn/evm/Bytecode":{"Size":0,"Links":0},"/statetree/churn/evm/ContractState":{"Size":7557630,"Links":8091},"/statetree/churn/init":{"Size":21831,"Links":383},"/statetree/churn/init/AddressMap":{"Size":2590067,"Links":1910},"/statetree/churn/multisig":{"Size":46754,"Links":629},"/statetree/churn/multisig/PendingTxns":{"Size":35409,"Links":126},"/statetree/churn/paymentchannel":{"Size":0,"Links":0},"/statetree/churn/paymentchannel/LaneStates":{"Size":0,"Links":0},"/statetree/churn/reward":{"Size":315900,"Links":1950},"/statetree/churn/storagemarket":{"Size":655200,"Links":1950},"/statetree/churn/storagemarket/DealOpsByEpoch":{"Size":298371781,"Links":352017},"/statetree/churn/storagemarket/EscrowTable":{"Size":101573496,"Links":96991},"/statetree/churn/storagemarket/LockedTable":{"Size":40255422,"Links":45487},"/statetree/churn/storagemarket/PendingDealAllocationIds":{"Size":212559273,"Links":262131},"/statetree/churn/storagemarket/PendingProposals":{"Size":462811412,"Links":369729},"/statetree/churn/storagemarket/Proposals":{"Size":34023521,"Links":26514},"/statetree/churn/storagemarket/States":{"Size":58294966330,"Links":34111767},"/statetree/churn/storageminer":{"Size":88544809,"Links":262866},"/statetree/churn/storageminer/AllocatedSectors":{"Size":17818346,"Links":28490},"/statetree/churn/storageminer/Deadlines":{"Size":550318857,"Links":724509},"/statetree/churn/storageminer/Info":{"Size":10233,"Links":102},"/statetree/churn/storageminer/PreCommittedSectors":{"Size":320503976,"Links":147430},"/statetree/churn/storageminer/PreCommittedSectorsCleanUp":{"Size":45677757,"Links":133632},"/statetree/churn/storageminer/Sectors":{"Size":193540743,"Links":149018},"/statetree/churn/storageminer/VestingFunds":{"Size":76724916,"Links":13551},"/statetree/churn/storagepower":{"Size":473850,"Links":1950},"/statetree/churn/storagepower/Claims":{"Size":12928990,"Links":12376},"/statetree/churn/storagepower/CronEventQueue":{"Size":5555340,"Links":2146},"/statetree/churn/system":{"Size":0,"Links":0},"/statetree/churn/system/BuiltinActors":{"Size":0,"Links":0},"/statetree/churn/verifiedregistry":{"Size":333221,"Links":1841},"/statetree/churn/verifiedregistry/Allocations":{"Size":457912092,"Links":236836},"/statetree/churn/verifiedregistry/Claims":{"Size":378921602,"Links":202528},"/statetree/churn/verifiedregistry/RemoveDataCapProposalIDs":{"Size":0,"Links":0},"/statetree/churn/verifiedregistry/Verifiers":{"Size":7848,"Links":12},"/statetree/latest":{"Size":231056432,"Links":191704},"/statetree/latest/account":{"Size":35533953,"Links":1467200},"/statetree/latest/cron":{"Size":12,"Links":1},"/statetree/latest/datacap":{"Size":191058,"Links":880},"/statetree/latest/eam":{"Size":0,"Links":0},"/statetree/latest/ethaccount":{"Size":1,"Links":1},"/statetree/latest/evm":{"Size":124879,"Links":1014},"/statetree/latest/evm/Bytecode":{"Size":6136270,"Links":757},"/statetree/latest/evm/ContractState":{"Size":21839720,"Links":111352},"/statetree/latest/init":{"Size":57,"Links":1},"/statetree/latest/init/AddressMap":{"Size":72803412,"Links":192096},"/statetree/latest/multisig":{"Size":758481,"Links":10656},"/statetree/latest/multisig/PendingTxns":{"Size":236204,"Links":1030},"/statetree/latest/paymentchannel":{"Size":264784,"Links":4697},"/statetree/latest/paymentchannel/LaneStates":{"Size":12308,"Links":179},"/statetree/latest/placeholder":{"Size":0,"Links":0},"/statetree/latest/reward":{"Size":162,"Links":1},"/statetree/latest/storagemarket":{"Size":336,"Links":1},"/statetree/latest/storagemarket/DealOpsByEpoch":{"Size":390104913,"Links":3107661},"/statetree/latest/storagemarket/EscrowTable":{"Size":91725,"Links":237},"/statetree/latest/storagemarket/LockedTable":{"Size":74697,"Links":307},"/statetree/latest/storagemarket/PendingDealAllocationIds":{"Size":1635281,"Links":11221},"/statetree/latest/storagemarket/PendingProposals":{"Size":38729060,"Links":43930},"/statetree/latest/storagemarket/Proposals":{"Size":4245651798,"Links":1007503},"/statetree/latest/storagemarket/States":{"Size":474965189,"Links":497054},"/statetree/latest/storageminer":{"Size":98838107,"Links":313978},"/statetree/latest/storageminer/AllocatedSectors":{"Size":8655608,"Links":6759},"/statetree/latest/storageminer/Deadlines":{"Size":705660654,"Links":3153636},"/statetree/latest/storageminer/Info":{"Size":5324209,"Links":67206},"/statetree/latest/storageminer/PreCommittedSectors":{"Size":4583519,"Links":3803},"/statetree/latest/storageminer/PreCommittedSectorsCleanUp":{"Size":17208536,"Links":249150},"/statetree/latest/storageminer/Sectors":{"Size":57337155988,"Links":20388903},"/statetree/latest/storageminer/VestingFunds":{"Size":21552723,"Links":4154},"/statetree/latest/storagepower":{"Size":243,"Links":1},"/statetree/latest/storagepower/Claims":{"Size":8318580,"Links":36659},"/statetree/latest/storagepower/CronEventQueue":{"Size":47946,"Links":158},"/statetree/latest/system":{"Size":44,"Links":1},"/statetree/latest/system/BuiltinActors":{"Size":7622355,"Links":17},"/statetree/latest/verifiedregistry":{"Size":181,"Links":1},"/statetree/latest/verifiedregistry/Allocations":{"Size":39007416,"Links":45480},"/statetree/latest/verifiedregistry/Claims":{"Size":1478667284,"Links":1347045},"/statetree/latest/verifiedregistry/RemoveDataCapProposalIDs":{"Size":0,"Links":0},"/statetree/latest/verifiedregistry/Verifiers":{"Size":1753,"Links":11}}
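
To answer per-actor questions such as "how big are miner actors in total", the flat report can be rolled up into subtree totals. The sketch below assumes, as the zero `Size` on `/` and `/statetree` suggests, that each entry's `Size` covers only that node and not its children. `subtree_totals` is a hypothetical helper, not part of the tool.

```python
def subtree_totals(report):
    """Return {path: bytes of that node plus all of its descendants}.

    Assumes report[path]["Size"] excludes children. Ancestors that are
    missing from the report are simply not given a total.
    """
    totals = {path: 0 for path in report}
    for path, entry in report.items():
        node = path
        while True:
            if node in totals:
                totals[node] += entry["Size"]
            if node == "/":
                break
            # "/statetree/latest" -> "/statetree"; "/statetree" -> "/"
            node = node.rsplit("/", 1)[0] or "/"
    return totals


# Three entries copied from the output above, plus the root.
sample = {
    "/": {"Size": 0, "Links": 0},
    "/statetree/latest/storageminer": {"Size": 98838107, "Links": 313978},
    "/statetree/latest/storageminer/Info": {"Size": 5324209, "Links": 67206},
    "/statetree/latest/storageminer/Sectors": {"Size": 57337155988, "Links": 20388903},
}
totals = subtree_totals(sample)
miner_total = totals["/statetree/latest/storageminer"]
```

Loading the full report with `json.load` and passing it to `subtree_totals` works the same way.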

After nv19:

{"/":{"Size":0,"Links":0},"/headers":{"Size":15128123760,"Links":110736984},"/messages":{"Size":331485128,"Links":343687},"/statetree":{"Size":0,"Links":0},"/statetree/churn":{"Size":2795170560,"Links":1053576},"/statetree/churn/account":{"Size":0,"Links":0},"/statetree/churn/cron":{"Size":0,"Links":0},"/statetree/churn/datacap":{"Size":53040057,"Links":53357},"/statetree/churn/ethaccount":{"Size":0,"Links":0},"/statetree/churn/evm":{"Size":269986,"Links":2195},"/statetree/churn/evm/Bytecode":{"Size":0,"Links":0},"/statetree/churn/evm/ContractState":{"Size":47414724,"Links":64716},"/statetree/churn/init":{"Size":39900,"Links":700},"/statetree/churn/init/AddressMap":{"Size":5340929,"Links":3894},"/statetree/churn/multisig":{"Size":81023,"Links":1095},"/statetree/churn/multisig/PendingTxns":{"Size":64520,"Links":377},"/statetree/churn/paymentchannel":{"Size":0,"Links":0},"/statetree/churn/paymentchannel/LaneStates":{"Size":0,"Links":0},"/statetree/churn/reward":{"Size":326360,"Links":1990},"/statetree/churn/storagemarket":{"Size":668640,"Links":1990},"/statetree/churn/storagemarket/DealOpsByEpoch":{"Size":461097869,"Links":454394},"/statetree/churn/storagemarket/EscrowTable":{"Size":36678778,"Links":29914},"/statetree/churn/storagemarket/LockedTable":{"Size":32018113,"Links":33925},"/statetree/churn/storagemarket/PendingDealAllocationIds":{"Size":379375572,"Links":463297},"/statetree/churn/storagemarket/PendingProposals":{"Size":843571493,"Links":579877},"/statetree/churn/storagemarket/Proposals":{"Size":98154560,"Links":65768},"/statetree/churn/storagemarket/States":{"Size":4224467329,"Links":2175384},"/statetree/churn/storageminer":{"Size":91896896,"Links":272557},"/statetree/churn/storageminer/AllocatedSectors":{"Size":16966390,"Links":26070},"/statetree/churn/storageminer/Deadlines":{"Size":597407300,"Links":831668},"/statetree/churn/storageminer/Info":{"Size":5847,"Links":60},"/statetree/churn/storageminer/PreCommittedSectors":{"Size":435120386,"Links":183201},"/statetree/churn/storageminer/PreCommittedSectorsCleanUp":{"Size":70378563,"Links":120397},"/statetree/churn/storageminer/Sectors":{"Size":222871525,"Links":199792},"/statetree/churn/storageminer/VestingFunds":{"Size":87627812,"Links":15461},"/statetree/churn/storagepower":{"Size":483570,"Links":1990},"/statetree/churn/storagepower/Claims":{"Size":16842494,"Links":16153},"/statetree/churn/storagepower/CronEventQueue":{"Size":5735561,"Links":2123},"/statetree/churn/system":{"Size":0,"Links":0},"/statetree/churn/system/BuiltinActors":{"Size":0,"Links":0},"/statetree/churn/verifiedregistry":{"Size":359647,"Links":1987},"/statetree/churn/verifiedregistry/Allocations":{"Size":665896042,"Links":362628},"/statetree/churn/verifiedregistry/Claims":{"Size":537375652,"Links":292083},"/statetree/churn/verifiedregistry/RemoveDataCapProposalIDs":{"Size":0,"Links":0},"/statetree/churn/verifiedregistry/Verifiers":{"Size":2100,"Links":4},"/statetree/latest":{"Size":242065632,"Links":208164},"/statetree/latest/account":{"Size":36450951,"Links":1505042},"/statetree/latest/cron":{"Size":12,"Links":1},"/statetree/latest/datacap":{"Size":205300,"Links":1035},"/statetree/latest/eam":{"Size":0,"Links":0},"/statetree/latest/ethaccount":{"Size":1,"Links":1},"/statetree/latest/evm":{"Size":279361,"Links":2269},"/statetree/latest/evm/Bytecode":{"Size":11644911,"Links":1237},"/statetree/latest/evm/ContractState":{"Size":64404174,"Links":345051},"/statetree/latest/init":{"Size":57,"Links":1},"/statetree/latest/init/AddressMap":{"Size":76282146,"Links":208877},"/statetree/latest/multisig":{"Size":787587,"Links":11070},"/statetree/latest/multisig/PendingTxns":{"Size":339661,"Links":1076},"/statetree/latest/paymentchannel":{"Size":264958,"Links":4700},"/statetree/latest/paymentchannel/LaneStates":{"Size":12299,"Links":178},"/statetree/latest/placeholder":{"Size":0,"Links":0},"/statetree/latest/reward":{"Size":164,"Links":1},"/statetree/latest/storagemarket":{"Size":336,"Links":1},"/statetree/latest/storagemarket/DealOpsByEpoch":{"Size":453704629,"Links":3006011},"/statetree/latest/storagemarket/EscrowTable":{"Size":97550,"Links":264},"/statetree/latest/storagemarket/LockedTable":{"Size":81300,"Links":348},"/statetree/latest/storagemarket/PendingDealAllocationIds":{"Size":4216469,"Links":30995},"/statetree/latest/storagemarket/PendingProposals":{"Size":171913122,"Links":475075},"/statetree/latest/storagemarket/Proposals":{"Size":5536691212,"Links":1302165},"/statetree/latest/storagemarket/States":{"Size":622708452,"Links":642194},"/statetree/latest/storageminer":{"Size":99320640,"Links":315518},"/statetree/latest/storageminer/AllocatedSectors":{"Size":8776044,"Links":6944},"/statetree/latest/storageminer/Deadlines":{"Size":735375198,"Links":2987852},"/statetree/latest/storageminer/Info":{"Size":5359425,"Links":67580},"/statetree/latest/storageminer/PreCommittedSectors":{"Size":6523712,"Links":5065},"/statetree/latest/storageminer/PreCommittedSectorsCleanUp":{"Size":17024374,"Links":238709},"/statetree/latest/storageminer/Sectors":{"Size":58355319120,"Links":20690405},"/statetree/latest/storageminer/VestingFunds":{"Size":21119652,"Links":4120},"/statetree/latest/storagepower":{"Size":243,"Links":1},"/statetree/latest/storagepower/Claims":{"Size":8338825,"Links":36688},"/statetree/latest/storagepower/CronEventQueue":{"Size":47247,"Links":154},"/statetree/latest/system":{"Size":44,"Links":1},"/statetree/latest/system/BuiltinActors":{"Size":7538245,"Links":17},"/statetree/latest/verifiedregistry":{"Size":181,"Links":1},"/statetree/latest/verifiedregistry/Allocations":{"Size":64351280,"Links":75301},"/statetree/latest/verifiedregistry/Claims":{"Size":2338159840,"Links":2304713},"/statetree/latest/verifiedregistry/RemoveDataCapProposalIDs":{"Size":0,"Links":0},"/statetree/latest/verifiedregistry/Verifiers":{"Size":1781,"Links":11}}
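
One way to compare the two outputs is to rank paths by the absolute change in `Size`. The sketch below uses a hypothetical helper, `largest_changes`, and a two-entry excerpt of the figures above. In the posted outputs, `/statetree/churn/storagemarket/States` falls from 58294966330 to 4224467329 bytes, which is consistent with the market actor finding noted earlier in the thread.

```python
def largest_changes(before, after, top=5):
    """Return up to `top` (delta_bytes, path) pairs, largest absolute change first.

    A path missing from one report is treated as having Size 0 there.
    """
    deltas = []
    for path in sorted(set(before) | set(after)):
        size_before = before.get(path, {}).get("Size", 0)
        size_after = after.get(path, {}).get("Size", 0)
        deltas.append((size_after - size_before, path))
    deltas.sort(key=lambda item: abs(item[0]), reverse=True)
    return deltas[:top]


# Excerpts copied from the two outputs above.
before = {
    "/headers": {"Size": 14189794564, "Links": 103849005},
    "/statetree/churn/storagemarket/States": {"Size": 58294966330, "Links": 34111767},
}
after = {
    "/headers": {"Size": 15128123760, "Links": 110736984},
    "/statetree/churn/storagemarket/States": {"Size": 4224467329, "Links": 2175384},
}
changes = largest_changes(before, after)
for delta, path in changes:
    print(f"{delta:+d} bytes  {path}")
```

With the full reports loaded via `json.load`, the same call ranks every path in the breakdown.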
ZenGround0 commented 1 year ago

Trail of outputs: