Closed jennijuju closed 1 year ago
Concretely, I'm working on a tool that takes a snapshot and outputs information on the total snapshot size. It differs from #9793 mainly in that it accounts for message data and churn data. The tool will report:
1) total bytes in messages, headers, and state trees
2) for the state tree, a breakdown by top-level actor field, where a field's size is the size of the union of all data blocks belonging to that field
For the purposes of this investigation I'll focus on the market actor to begin with, since we suspect it is the major contributor.
With this information we can diagnose what went so right in nv19 and how to preemptively avoid the same problem happening in the future in other parts of the state tree.
Rough sketch:
1) Read CAR snapshots into a badger database so we can do random reads
2) Use / implement a function for counting the size of a subgraph
3) Traverse the chain, counting bytes of headers and bytes of messages
4) When counting bytes of the state tree, traverse all actors in the same way as #9793. I'll start by doing this with the market actor; others I'll just count at the top level.
   a) Track every CID in a big set; if a CID has already been seen, don't traverse it or count its size again
   b) If this takes too much memory, revisit and do something less simple, probably a badger datastore for the CID set.
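To make steps 2 and 4a concrete, here is a minimal sketch of counting a subgraph's size while skipping anything already seen. It is not the actual tool: the `Blockstore` dict is a hypothetical in-memory stand-in for the badger database, the CIDs are plain strings, and `subgraph_size` is an illustrative name.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for the badger-backed blockstore:
# cid -> (raw block bytes, cids this block links to)
Blockstore = Dict[str, Tuple[bytes, List[str]]]


def subgraph_size(store: Blockstore, root: str, seen: Set[str]) -> Tuple[int, int]:
    """Return (bytes, blocks) reachable from root that are not already in seen.

    seen is shared across calls, so a block reachable from two roots is
    only charged to the first root that reaches it.
    """
    total_bytes = 0
    total_blocks = 0
    stack = [root]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue  # already counted: don't traverse, don't count again
        seen.add(cid)
        data, links = store[cid]
        total_bytes += len(data)
        total_blocks += 1
        stack.extend(links)
    return total_bytes, total_blocks


# Toy data: "shared" is reachable from both roots.
store: Blockstore = {
    "root_a": (b"aaaa", ["shared", "leaf_a"]),
    "root_b": (b"bb", ["shared"]),
    "shared": (b"ssssssss", []),
    "leaf_a": (b"l", []),
}
seen: Set[str] = set()
size_a = subgraph_size(store, "root_a", seen)
size_b = subgraph_size(store, "root_b", seen)
```

With a shared `seen` set the result depends on traversal order: whichever field is visited first is charged for the shared blocks. If the in-memory set gets too large, the same interface could be backed by an on-disk set, as suggested in 4b.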
Take a look at this: https://pkg.go.dev/github.com/ipld/go-car/v2#Reader.Inspect
It may deliver some useful top-level insights.
@ZenGround0 I'm curious about
total bytes in messages, headers, state trees
what do you mean by headers here?
More concretely, I'd like this tooling to be able to tell me how big each field is, for example the total size of sectors in miner actors. We can then get the number of sectors we have and how much each field of SectorOnChainInfo contributes to it. With this, we will have more visibility into the impact of work like https://github.com/filecoin-project/FIPs/discussions/546 before spending time on it.
End goals:
TODO: Split into separate issues for building the tools and answering the nv19 enigma, vs. setting up infra (which will be a broader task).
Idea: It'd be nice, but lowest priority, if this tooling or the tooling in
@snissn, based on today's feedback, I moved the deployment from #10981 to here. Can you add the deployment recipe?
@jennijuju can you confirm the requirement for the deployment? I don't see it on the referenced ticket. I understand it to be something like this:
I think that we need to do more work to coordinate on more details of how that should look. Some quick questions off the top of my head:
Are there outputs that go into grafana? Is there a web dashboard to view the latest tree graph? Is there a threshold that triggers an alert and where does that alert go? Is the server hosted on aws? Does lotus have a centralized monitoring system that can detect if this node goes down?
can you confirm the requirement for the deployment?
That's what I'm asking you!
deploy a tool that monitors the state data usage of the Filecoin data via analysis over an exported snapshot
lgtm as a one-liner.
Some quick questions off the top of my head:
I think the ask is for you to come up with the initial proposal, then we can review it together.
Does lotus have a centralized monitoring system that can detect if this node goes down?
So for this one, maybe simply say that we can manually run the script, have the data reported to [ ], and build a view?
yeah! It might make sense to split infra out of this recipe. I think we want to make a deliverable for this as simple as "opening a ticket to deploy the features created here"
yeah! It might make sense to split infra out of this recipe. I think we want to make a deliverable for this as simple as "opening a ticket to deploy the features created here"
Fine by me! (But let's make that ticket a part of https://github.com/filecoin-project/lotus/issues/10981, like node setup?)
@ZenGround0, as you mentioned in #11035, we have learned that the snapshot size reduction across nv19 was due to the market actor state's AMT.
Could you please attach the output JSON for before and after nv19? Then we can close this issue. (We will continue to track the follow-ups in their own issues.)
Before nv19:
{"/":{"Size":0,"Links":0},"/headers":{"Size":14189794564,"Links":103849005},"/messages":{"Size":264797438,"Links":321220},"/statetree":{"Size":0,"Links":0},"/statetree/churn":{"Size":2574560355,"Links":980422},"/statetree/churn/account":{"Size":0,"Links":0},"/statetree/churn/cron":{"Size":0,"Links":0},"/statetree/churn/datacap":{"Size":31557668,"Links":33699},"/statetree/churn/ethaccount":{"Size":0,"Links":0},"/statetree/churn/evm":{"Size":107133,"Links":871},"/statetree/churn/evm/Bytecode":{"Size":0,"Links":0},"/statetree/churn/evm/ContractState":{"Size":7557630,"Links":8091},"/statetree/churn/init":{"Size":21831,"Links":383},"/statetree/churn/init/AddressMap":{"Size":2590067,"Links":1910},"/statetree/churn/multisig":{"Size":46754,"Links":629},"/statetree/churn/multisig/PendingTxns":{"Size":35409,"Links":126},"/statetree/churn/paymentchannel":{"Size":0,"Links":0},"/statetree/churn/paymentchannel/LaneStates":{"Size":0,"Links":0},"/statetree/churn/reward":{"Size":315900,"Links":1950},"/statetree/churn/storagemarket":{"Size":655200,"Links":1950},"/statetree/churn/storagemarket/DealOpsByEpoch":{"Size":298371781,"Links":352017},"/statetree/churn/storagemarket/EscrowTable":{"Size":101573496,"Links":96991},"/statetree/churn/storagemarket/LockedTable":{"Size":40255422,"Links":45487},"/statetree/churn/storagemarket/PendingDealAllocationIds":{"Size":212559273,"Links":262131},"/statetree/churn/storagemarket/PendingProposals":{"Size":462811412,"Links":369729},"/statetree/churn/storagemarket/Proposals":{"Size":34023521,"Links":26514},"/statetree/churn/storagemarket/States":{"Size":58294966330,"Links":34111767},"/statetree/churn/storageminer":{"Size":88544809,"Links":262866},"/statetree/churn/storageminer/AllocatedSectors":{"Size":17818346,"Links":28490},"/statetree/churn/storageminer/Deadlines":{"Size":550318857,"Links":724509},"/statetree/churn/storageminer/Info":{"Size":10233,"Links":102},"/statetree/churn/storageminer/PreCommittedSectors":{"Size":320503976,"Links":147430},"/statetree/churn/storageminer/PreCommittedSectorsCleanUp":{"Size":45677757,"Links":133632},"/statetree/churn/storageminer/Sectors":{"Size":193540743,"Links":149018},"/statetree/churn/storageminer/VestingFunds":{"Size":76724916,"Links":13551},"/statetree/churn/storagepower":{"Size":473850,"Links":1950},"/statetree/churn/storagepower/Claims":{"Size":12928990,"Links":12376},"/statetree/churn/storagepower/CronEventQueue":{"Size":5555340,"Links":2146},"/statetree/churn/system":{"Size":0,"Links":0},"/statetree/churn/system/BuiltinActors":{"Size":0,"Links":0},"/statetree/churn/verifiedregistry":{"Size":333221,"Links":1841},"/statetree/churn/verifiedregistry/Allocations":{"Size":457912092,"Links":236836},"/statetree/churn/verifiedregistry/Claims":{"Size":378921602,"Links":202528},"/statetree/churn/verifiedregistry/RemoveDataCapProposalIDs":{"Size":0,"Links":0},"/statetree/churn/verifiedregistry/Verifiers":{"Size":7848,"Links":12},"/statetree/latest":{"Size":231056432,"Links":191704},"/statetree/latest/account":{"Size":35533953,"Links":1467200},"/statetree/latest/cron":{"Size":12,"Links":1},"/statetree/latest/datacap":{"Size":191058,"Links":880},"/statetree/latest/eam":{"Size":0,"Links":0},"/statetree/latest/ethaccount":{"Size":1,"Links":1},"/statetree/latest/evm":{"Size":124879,"Links":1014},"/statetree/latest/evm/Bytecode":{"Size":6136270,"Links":757},"/statetree/latest/evm/ContractState":{"Size":21839720,"Links":111352},"/statetree/latest/init":{"Size":57,"Links":1},"/statetree/latest/init/AddressMap":{"Size":72803412,"Links":192096},"/statetree/latest/multisig":{"Size":758481,"Links":10656},"/statetree/latest/multisig/PendingTxns":{"Size":236204,"Links":1030},"/statetree/latest/paymentchannel":{"Size":264784,"Links":4697},"/statetree/latest/paymentchannel/LaneStates":{"Size":12308,"Links":179},"/statetree/latest/placeholder":{"Size":0,"Links":0},"/statetree/latest/reward":{"Size":162,"Links":1},"/statetree/latest/storagemarket":{"Size":336,"Links":1},"/statetree/latest/storagemarket/DealOpsByEpoch":{"Size":390104913,"Links":3107661},"/statetree/latest/storagemarket/EscrowTable":{"Size":91725,"Links":237},"/statetree/latest/storagemarket/LockedTable":{"Size":74697,"Links":307},"/statetree/latest/storagemarket/PendingDealAllocationIds":{"Size":1635281,"Links":11221},"/statetree/latest/storagemarket/PendingProposals":{"Size":38729060,"Links":43930},"/statetree/latest/storagemarket/Proposals":{"Size":4245651798,"Links":1007503},"/statetree/latest/storagemarket/States":{"Size":474965189,"Links":497054},"/statetree/latest/storageminer":{"Size":98838107,"Links":313978},"/statetree/latest/storageminer/AllocatedSectors":{"Size":8655608,"Links":6759},"/statetree/latest/storageminer/Deadlines":{"Size":705660654,"Links":3153636},"/statetree/latest/storageminer/Info":{"Size":5324209,"Links":67206},"/statetree/latest/storageminer/PreCommittedSectors":{"Size":4583519,"Links":3803},"/statetree/latest/storageminer/PreCommittedSectorsCleanUp":{"Size":17208536,"Links":249150},"/statetree/latest/storageminer/Sectors":{"Size":57337155988,"Links":20388903},"/statetree/latest/storageminer/VestingFunds":{"Size":21552723,"Links":4154},"/statetree/latest/storagepower":{"Size":243,"Links":1},"/statetree/latest/storagepower/Claims":{"Size":8318580,"Links":36659},"/statetree/latest/storagepower/CronEventQueue":{"Size":47946,"Links":158},"/statetree/latest/system":{"Size":44,"Links":1},"/statetree/latest/system/BuiltinActors":{"Size":7622355,"Links":17},"/statetree/latest/verifiedregistry":{"Size":181,"Links":1},"/statetree/latest/verifiedregistry/Allocations":{"Size":39007416,"Links":45480},"/statetree/latest/verifiedregistry/Claims":{"Size":1478667284,"Links":1347045},"/statetree/latest/verifiedregistry/RemoveDataCapProposalIDs":{"Size":0,"Links":0},"/statetree/latest/verifiedregistry/Verifiers":{"Size":1753,"Links":11}}
After nv19:
{"/":{"Size":0,"Links":0},"/headers":{"Size":15128123760,"Links":110736984},"/messages":{"Size":331485128,"Links":343687},"/statetree":{"Size":0,"Links":0},"/statetree/churn":{"Size":2795170560,"Links":1053576},"/statetree/churn/account":{"Size":0,"Links":0},"/statetree/churn/cron":{"Size":0,"Links":0},"/statetree/churn/datacap":{"Size":53040057,"Links":53357},"/statetree/churn/ethaccount":{"Size":0,"Links":0},"/statetree/churn/evm":{"Size":269986,"Links":2195},"/statetree/churn/evm/Bytecode":{"Size":0,"Links":0},"/statetree/churn/evm/ContractState":{"Size":47414724,"Links":64716},"/statetree/churn/init":{"Size":39900,"Links":700},"/statetree/churn/init/AddressMap":{"Size":5340929,"Links":3894},"/statetree/churn/multisig":{"Size":81023,"Links":1095},"/statetree/churn/multisig/PendingTxns":{"Size":64520,"Links":377},"/statetree/churn/paymentchannel":{"Size":0,"Links":0},"/statetree/churn/paymentchannel/LaneStates":{"Size":0,"Links":0},"/statetree/churn/reward":{"Size":326360,"Links":1990},"/statetree/churn/storagemarket":{"Size":668640,"Links":1990},"/statetree/churn/storagemarket/DealOpsByEpoch":{"Size":461097869,"Links":454394},"/statetree/churn/storagemarket/EscrowTable":{"Size":36678778,"Links":29914},"/statetree/churn/storagemarket/LockedTable":{"Size":32018113,"Links":33925},"/statetree/churn/storagemarket/PendingDealAllocationIds":{"Size":379375572,"Links":463297},"/statetree/churn/storagemarket/PendingProposals":{"Size":843571493,"Links":579877},"/statetree/churn/storagemarket/Proposals":{"Size":98154560,"Links":65768},"/statetree/churn/storagemarket/States":{"Size":4224467329,"Links":2175384},"/statetree/churn/storageminer":{"Size":91896896,"Links":272557},"/statetree/churn/storageminer/AllocatedSectors":{"Size":16966390,"Links":26070},"/statetree/churn/storageminer/Deadlines":{"Size":597407300,"Links":831668},"/statetree/churn/storageminer/Info":{"Size":5847,"Links":60},"/statetree/churn/storageminer/PreCommittedSectors":{"Size":435120386,"Links":183201},"/statetree/churn/storageminer/PreCommittedSectorsCleanUp":{"Size":70378563,"Links":120397},"/statetree/churn/storageminer/Sectors":{"Size":222871525,"Links":199792},"/statetree/churn/storageminer/VestingFunds":{"Size":87627812,"Links":15461},"/statetree/churn/storagepower":{"Size":483570,"Links":1990},"/statetree/churn/storagepower/Claims":{"Size":16842494,"Links":16153},"/statetree/churn/storagepower/CronEventQueue":{"Size":5735561,"Links":2123},"/statetree/churn/system":{"Size":0,"Links":0},"/statetree/churn/system/BuiltinActors":{"Size":0,"Links":0},"/statetree/churn/verifiedregistry":{"Size":359647,"Links":1987},"/statetree/churn/verifiedregistry/Allocations":{"Size":665896042,"Links":362628},"/statetree/churn/verifiedregistry/Claims":{"Size":537375652,"Links":292083},"/statetree/churn/verifiedregistry/RemoveDataCapProposalIDs":{"Size":0,"Links":0},"/statetree/churn/verifiedregistry/Verifiers":{"Size":2100,"Links":4},"/statetree/latest":{"Size":242065632,"Links":208164},"/statetree/latest/account":{"Size":36450951,"Links":1505042},"/statetree/latest/cron":{"Size":12,"Links":1},"/statetree/latest/datacap":{"Size":205300,"Links":1035},"/statetree/latest/eam":{"Size":0,"Links":0},"/statetree/latest/ethaccount":{"Size":1,"Links":1},"/statetree/latest/evm":{"Size":279361,"Links":2269},"/statetree/latest/evm/Bytecode":{"Size":11644911,"Links":1237},"/statetree/latest/evm/ContractState":{"Size":64404174,"Links":345051},"/statetree/latest/init":{"Size":57,"Links":1},"/statetree/latest/init/AddressMap":{"Size":76282146,"Links":208877},"/statetree/latest/multisig":{"Size":787587,"Links":11070},"/statetree/latest/multisig/PendingTxns":{"Size":339661,"Links":1076},"/statetree/latest/paymentchannel":{"Size":264958,"Links":4700},"/statetree/latest/paymentchannel/LaneStates":{"Size":12299,"Links":178},"/statetree/latest/placeholder":{"Size":0,"Links":0},"/statetree/latest/reward":{"Size":164,"Links":1},"/statetree/latest/storagemarket":{"Size":336,"Links":1},"/statetree/latest/storagemarket/DealOpsByEpoch":{"Size":453704629,"Links":3006011},"/statetree/latest/storagemarket/EscrowTable":{"Size":97550,"Links":264},"/statetree/latest/storagemarket/LockedTable":{"Size":81300,"Links":348},"/statetree/latest/storagemarket/PendingDealAllocationIds":{"Size":4216469,"Links":30995},"/statetree/latest/storagemarket/PendingProposals":{"Size":171913122,"Links":475075},"/statetree/latest/storagemarket/Proposals":{"Size":5536691212,"Links":1302165},"/statetree/latest/storagemarket/States":{"Size":622708452,"Links":642194},"/statetree/latest/storageminer":{"Size":99320640,"Links":315518},"/statetree/latest/storageminer/AllocatedSectors":{"Size":8776044,"Links":6944},"/statetree/latest/storageminer/Deadlines":{"Size":735375198,"Links":2987852},"/statetree/latest/storageminer/Info":{"Size":5359425,"Links":67580},"/statetree/latest/storageminer/PreCommittedSectors":{"Size":6523712,"Links":5065},"/statetree/latest/storageminer/PreCommittedSectorsCleanUp":{"Size":17024374,"Links":238709},"/statetree/latest/storageminer/Sectors":{"Size":58355319120,"Links":20690405},"/statetree/latest/storageminer/VestingFunds":{"Size":21119652,"Links":4120},"/statetree/latest/storagepower":{"Size":243,"Links":1},"/statetree/latest/storagepower/Claims":{"Size":8338825,"Links":36688},"/statetree/latest/storagepower/CronEventQueue":{"Size":47247,"Links":154},"/statetree/latest/system":{"Size":44,"Links":1},"/statetree/latest/system/BuiltinActors":{"Size":7538245,"Links":17},"/statetree/latest/verifiedregistry":{"Size":181,"Links":1},"/statetree/latest/verifiedregistry/Allocations":{"Size":64351280,"Links":75301},"/statetree/latest/verifiedregistry/Claims":{"Size":2338159840,"Links":2304713},"/statetree/latest/verifiedregistry/RemoveDataCapProposalIDs":{"Size":0,"Links":0},"/statetree/latest/verifiedregistry/Verifiers":{"Size":1781,"Links":11}}
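For anyone comparing the two outputs by hand, a small sketch like the following can rank paths by how much their `Size` changed. It assumes both outputs have been parsed with `json.loads` into dicts of the shape shown above; `largest_size_changes` is an illustrative helper, not part of the tool, and the inline sample only copies three entries from each output.

```python
from typing import Dict, List, Tuple

# Shape of one parsed output: path -> {"Size": bytes, "Links": count}
Report = Dict[str, Dict[str, int]]


def largest_size_changes(before: Report, after: Report, top: int = 5) -> List[Tuple[str, int]]:
    """Return up to `top` (path, size delta) pairs, largest absolute change first."""
    paths = set(before) | set(after)
    deltas = [
        (p, after.get(p, {}).get("Size", 0) - before.get(p, {}).get("Size", 0))
        for p in paths
    ]
    # Sort by magnitude of change; break ties by path for a stable order.
    deltas.sort(key=lambda item: (-abs(item[1]), item[0]))
    return deltas[:top]


# Three entries copied from the outputs above.
before: Report = {
    "/statetree/churn/storagemarket/States": {"Size": 58294966330, "Links": 34111767},
    "/statetree/latest/storageminer/Sectors": {"Size": 57337155988, "Links": 20388903},
    "/statetree/latest/storagemarket/Proposals": {"Size": 4245651798, "Links": 1007503},
}
after: Report = {
    "/statetree/churn/storagemarket/States": {"Size": 4224467329, "Links": 2175384},
    "/statetree/latest/storageminer/Sectors": {"Size": 58355319120, "Links": 20690405},
    "/statetree/latest/storagemarket/Proposals": {"Size": 5536691212, "Links": 1302165},
}
changes = largest_size_changes(before, after)
for path, delta in changes:
    print(f"{delta:+d}  {path}")
```

On this sample the churn entry for the market actor's `States` field is the largest change by a wide margin, dropping by roughly 54 GB between the two snapshots.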
Trail of outputs:
edited by @ZenGround0
User Story
I am a user who develops the filecoin protocol or software involved in running or using the protocol.
1) I want to understand what is happening with filecoin state. I want to see the current breakdown of byte usage by the protocol to understand if there are dangerous trends that need to be addressed and get ideas for how to fix them.
2) I also want to be able to go back and analyze how previous changes have impacted snapshot size / protocol data usage.
3) I can visualize how new changes I have developed impact snapshot size / protocol data usage.
Acceptance Criteria
I can run a command or a small set of commands that, given a Filecoin snapshot, gives me two things:
1) data on the state usage of a chain snapshot, broken down into a) state tree b) messages / headers c) state churn, with a further breakdown of state and churn by i) actor type ii) actor type field, and a further breakdown of messages by message type
2) a visualization of this data for quick inspection. This visualization will be a DiskInventoryX style treemap breakdown of the above data.
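As a rough illustration of how the flat breakdown could feed a treemap, the sketch below turns path-keyed sizes into a nested hierarchy with rolled-up totals. It assumes each path's `Size` excludes its children, which is how the posted outputs appear to behave (for example `/statetree` itself is 0 while its children are large). `build_tree` and `total_size` are hypothetical helpers, and the actual drawing step is left to whichever treemap library is chosen.

```python
from typing import Any, Dict

Node = Dict[str, Any]


def build_tree(report: Dict[str, Dict[str, int]]) -> Node:
    """Turn flat "/a/b/c" -> {"Size": n} entries into a nested tree of nodes."""
    root: Node = {"name": "/", "size": 0, "children": {}}
    for path, entry in report.items():
        node = root
        for part in (p for p in path.split("/") if p):
            node = node["children"].setdefault(
                part, {"name": part, "size": 0, "children": {}}
            )
        node["size"] = entry["Size"]  # own bytes only (assumed exclusive of children)
    return root


def total_size(node: Node) -> int:
    """Own bytes plus the bytes of every descendant: the area a treemap cell needs."""
    return node["size"] + sum(total_size(c) for c in node["children"].values())


# A few entries copied from the "before nv19" output.
report = {
    "/headers": {"Size": 14189794564, "Links": 103849005},
    "/statetree/latest/storagemarket": {"Size": 336, "Links": 1},
    "/statetree/latest/storagemarket/States": {"Size": 474965189, "Links": 497054},
    "/statetree/latest/storagemarket/LockedTable": {"Size": 74697, "Links": 307},
}
tree = build_tree(report)
market = tree["children"]["statetree"]["children"]["latest"]["children"]["storagemarket"]
market_total = total_size(market)
```

Intermediate nodes that never appear as keys (here `/statetree` and `/statetree/latest`) simply get an own size of 0, so the rollup still works on a partial report.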
Technical Breakdowns
Simplifying assumptions (we can revisit these)
Bonus Cake visualization suggestion from andy: https://askubuntu.com/questions/73160/how-do-i-find-the-amount-of-free-space-on-my-hard-drive. This is a good possible alternative to the python based treemap