filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

A new ChainIndexer that subsumes the existing MsgIndex, EventIndex and TransactionIndex #12453

Open aarshkshah1992 opened 2 months ago

aarshkshah1992 commented 2 months ago

Summary

This issue is for the implementation of a new ChainIndexer in Lotus that will replace and subsume the existing MsgIndex, EventsIndex, and EthTxHashIndex, which are currently fragmented across multiple databases and have several known issues documented in filecoin-project/lotus#12293.

Key Features

The ChainIndexer offers the following key features:

Note: while the ChainIndexer is primarily focused on events and ETH RPC use cases, it also benefits pre-FEVM functionality. For example, StateSearchMsg and its various dependents will now have a shortcut to find the message.

Implementation Items

### Tasks
- [x] Land `ChainIndexer` implementation at https://github.com/filecoin-project/lotus/pull/12421
- [x] Land https://github.com/filecoin-project/lotus/pull/12463
- [x] Land Migration/Backfilling/Diagnostics tooling for the `ChainIndexer` at https://github.com/filecoin-project/lotus/pull/12450
- [x] Land https://github.com/filecoin-project/lotus/pull/12485
- [x] https://github.com/filecoin-project/lotus/issues/12489 (Land https://github.com/filecoin-project/lotus/pull/12504)
- [ ] Work with Glif to deploy the `ChainIndexer` on their FEVM archival node and re-index the entire archived chain history in the `ChainIndexer`. OKU should then be able to use that node for their app. A green light from Glif & OKU is what we need here to then ship this out to all users.
- [x] Write a document explaining the mechanics of re-indexing/backfilling/diagnostics/inspection of the `ChainIndexer` and explain how RPC providers can use the RPCs/CLI commands being built out in https://github.com/filecoin-project/lotus/pull/12450 for all of those use cases. (See https://github.com/filecoin-project/lotus/pull/12600).
- [x] Document the Config changes introduced in https://github.com/filecoin-project/lotus/pull/12421 so users understand the knobs they can turn on/off for the `ChainIndexer`. (See [here](https://github.com/filecoin-project/lotus/pull/12450/files#diff-1602b1b65c4f148506a1587cc3927923c05dec6381089aa69cb928fcee1df29b))
- [x] Document the cleanup process for the existing `MsgIndex`, `EthTxIndex` and `EventIndex` (should be as simple as getting rid of the Sqlite DBs in the Lotus repo). (See [here](https://github.com/filecoin-project/lotus/pull/12450/files#diff-1602b1b65c4f148506a1587cc3927923c05dec6381089aa69cb928fcee1df29b)).
- [ ] Determine release timing/process in light of nv24
- [ ] Move operator docs to lotus-docs
- [x] Update operator docs with benchmarks around timing and sizes

Switch RPC APIs to use the Chain Index

Read APIs Should Account for the Async Nature of Indexing

ETH RPC APIs Should Only Expose Executed Tipsets and Messages

Removing Re-orged Tipsets That Are No Longer Part of the Canonical Chain

Garbage Collection

Snapshot Hydration

Automated Backfilling

Simplify Indexing Config

Migration from Old Indices to the New ChainIndex

BigLep commented 1 month ago

🧵 From Slack conversations: "It will take 9-10 days to backfill the ChainIndexer all the way back to FEVM, but it is a one-time cost, and you can copy the index over to other nodes, so you only need to run the backfill operation on one node."

A few questions/thoughts on this:

  1. What users do we need to proactively talk about this with? I know Glif is aware. Who else should we bring into this conversation?
  2. Do these users have multiple nodes so they can do a rolling upgrade?
  3. Related to number 2, is this a showstopper for any of these archival users?
  4. If this is a showstopper, what are our options?
    • Bootstrap chainindex.db from the 3 existing sqlite dbs (and then quickly identify areas that are missing data and backfill them)?
    • Have someone in the community generate chainindex.db and share with others (and include the accompanying verify commands)?
    • ???

BigLep commented 1 month ago

Some notes from 2024-10-09 Lotus standup focused on the "~9 days to backfill a FEVM-archival node" topic:

aarshkshah1992 commented 1 month ago

@BigLep

🧵 From Slack conversations: "It will take 9-10 days to backfill the ChainIndexer all the way back to FEVM, but it is a one-time cost, and you can copy the index over to other nodes, so you only need to run the backfill operation on one node."

A few questions/thoughts on this:

  1. What users do we need to proactively talk about this with? I know Glif is aware. Who else should we bring into this conversation?
  2. Do these users have multiple nodes so they can do a rolling upgrade?
  3. Related to number 2, is this a showstopper for any of these archival users?
  4. If this is a showstopper, what are our options?
  • Bootstrap chainindex.db from the 3 existing sqlite dbs (and then quickly identify areas that are missing data and backfill them)?
  • Have someone in the community generate chainindex.db and share with others (and include the accompanying verify commands)?
  • ???

1) The long backfilling time is primarily a concern for archival nodes, not snapshot-synced nodes. Protofire (Glif), Vulcanise and Blockscout are the three archival node operators I am aware of. Would love @eshon and @jennijuju to chime in if there are more. We've already proactively initiated conversations with Protofire/Glif about what's coming up. Ideally, we would deploy the ChainIndexer on their archival node first to serve a portion of their traffic and, once we get a green light from them, onboard other RPC providers.

2) Do these users have multiple nodes so they can do a rolling upgrade? I know Protofire and Vulcanise do. I am unsure about the others.

3) If a user is only running one archival node, here are the options:

Also replied at: https://github.com/filecoin-project/lotus/pull/12450#discussion_r1794946530.

eshon commented 1 month ago

Another archival node provider is Zondax, let me share details later today with Jenni.

eshon commented 1 month ago

When you say "backfilling", do you specifically mean that backfilling only the FEVM indexes would take 9 days?

Does this assume the node has already loaded all FEVM archival data since FEVM launch and is fully synced?

aarshkshah1992 commented 1 month ago

@eshon Yes, this assumes that the node has already loaded all FEVM archival data since FEVM launch and is fully synced. "Backfilling" here refers to reading the chain state and writing the data we need for faster RPC responses into the Index Database.

aarshkshah1992 commented 1 month ago

Results from testing on a dedicated Protofire FEVM Archival node. This node is doing nothing other than syncing the chain.

1) Backfilling 1 month of epochs backwards from the current chain head. Takes ~12 hours.

2024-10-08 18:06:43.525 starting chainindex validation; from epoch: 4336809; to epoch: 4250409; backfill: true; log-good: false
2024-10-08 18:15:49.508 -------- Chain index validation progress: 3.33%; Time elapsed: 9m5.98274048s
2024-10-08 18:27:27.114 -------- Chain index validation progress: 6.67%; Time elapsed: 20m43.58922645s
2024-10-08 18:42:42.489 -------- Chain index validation progress: 10.00%; Time elapsed: 35m58.963728548s
2024-10-08 19:01:34.272 -------- Chain index validation progress: 13.33%; Time elapsed: 54m50.747261985s
2024-10-08 19:27:53.144 -------- Chain index validation progress: 16.67%; Time elapsed: 1h21m9.618754411s
2024-10-08 20:06:49.629 -------- Chain index validation progress: 20.00%; Time elapsed: 2h0m6.103717312s
2024-10-08 21:10:58.370 -------- Chain index validation progress: 23.33%; Time elapsed: 3h4m14.844417783s
2024-10-08 22:17:20.862 -------- Chain index validation progress: 26.67%; Time elapsed: 4h10m37.337324591s
2024-10-08 23:26:31.600 -------- Chain index validation progress: 30.00%; Time elapsed: 5h19m48.07516203s
2024-10-09 00:31:51.979 -------- Chain index validation progress: 33.33%; Time elapsed: 6h25m8.453541436s
2024-10-09 01:58:04.654 -------- Chain index validation progress: 36.67%; Time elapsed: 7h51m21.128442883s
2024-10-09 03:06:59.404 -------- Chain index validation progress: 40.00%; Time elapsed: 9h0m15.878883989s
2024-10-09 03:19:06.227 -------- Chain index validation progress: 43.33%; Time elapsed: 9h12m22.702241843s
2024-10-09 03:29:00.946 -------- Chain index validation progress: 46.67%; Time elapsed: 9h22m17.420597166s
2024-10-09 03:38:47.714 -------- Chain index validation progress: 50.00%; Time elapsed: 9h32m4.189265746s
2024-10-09 03:48:33.692 -------- Chain index validation progress: 53.33%; Time elapsed: 9h41m50.167261601s
2024-10-09 03:58:44.708 -------- Chain index validation progress: 56.67%; Time elapsed: 9h52m1.183098448s
2024-10-09 04:09:45.871 -------- Chain index validation progress: 60.00%; Time elapsed: 10h3m2.346345951s
2024-10-09 04:21:08.180 -------- Chain index validation progress: 63.33%; Time elapsed: 10h14m24.654708182s
2024-10-09 04:32:44.268 -------- Chain index validation progress: 66.67%; Time elapsed: 10h26m0.742834532s
2024-10-09 04:43:09.888 -------- Chain index validation progress: 70.00%; Time elapsed: 10h36m26.36274386s
2024-10-09 04:51:30.369 -------- Chain index validation progress: 73.33%; Time elapsed: 10h44m46.843732873s
2024-10-09 05:02:44.664 -------- Chain index validation progress: 76.67%; Time elapsed: 10h56m1.138670683s
2024-10-09 05:14:33.169 -------- Chain index validation progress: 80.00%; Time elapsed: 11h7m49.644118179s
2024-10-09 05:26:52.491 -------- Chain index validation progress: 83.33%; Time elapsed: 11h20m8.965545335s
2024-10-09 05:39:28.663 -------- Chain index validation progress: 86.67%; Time elapsed: 11h32m45.138303295s
2024-10-09 05:51:50.451 -------- Chain index validation progress: 90.00%; Time elapsed: 11h45m6.925924816s
2024-10-09 06:03:02.344 -------- Chain index validation progress: 93.33%; Time elapsed: 11h56m18.819100394s
2024-10-09 06:15:01.300 -------- Chain index validation progress: 96.67%; Time elapsed: 12h8m17.774766296s
2024-10-09 06:26:34.288 -------- Chain index validation progress: 100.00%; Time elapsed: 12h19m50.762641635s
2024-10-09 06:26:34.305 -------- Chain index validation progress: 100.00%; Time elapsed: 12h19m50.779804039s

2) Backfilling 1 month of epochs post FEVM launch. Takes ~10 hours.

2024-10-09 06:34:36.198 starting chainindex validation; from epoch: 2769848; to epoch: 2683448; backfill: true; log-good: false
2024-10-09 06:54:32.847 -------- Chain index validation progress: 3.33%; Time elapsed: 19m56.648777171s
2024-10-09 07:13:29.590 -------- Chain index validation progress: 6.67%; Time elapsed: 38m53.391578991s
2024-10-09 07:31:37.937 -------- Chain index validation progress: 10.00%; Time elapsed: 57m1.738433863s
2024-10-09 07:53:53.763 -------- Chain index validation progress: 13.33%; Time elapsed: 1h19m17.564622641s
2024-10-09 08:17:20.598 -------- Chain index validation progress: 16.67%; Time elapsed: 1h42m44.400170981s
2024-10-09 08:38:23.602 -------- Chain index validation progress: 20.00%; Time elapsed: 2h3m47.403992297s
2024-10-09 08:59:40.515 -------- Chain index validation progress: 23.33%; Time elapsed: 2h25m4.31638391s
2024-10-09 09:22:41.837 -------- Chain index validation progress: 26.67%; Time elapsed: 2h48m5.638957169s
2024-10-09 09:46:41.586 -------- Chain index validation progress: 30.00%; Time elapsed: 3h12m5.387221278s
2024-10-09 10:09:15.496 -------- Chain index validation progress: 33.33%; Time elapsed: 3h34m39.29731905s
2024-10-09 10:30:27.827 -------- Chain index validation progress: 36.67%; Time elapsed: 3h55m51.628606445s
2024-10-09 10:51:02.016 -------- Chain index validation progress: 40.00%; Time elapsed: 4h16m25.817962431s
2024-10-09 11:13:19.400 -------- Chain index validation progress: 43.33%; Time elapsed: 4h38m43.201847276s
2024-10-09 11:35:17.255 -------- Chain index validation progress: 46.67%; Time elapsed: 5h0m41.0564808s
2024-10-09 11:58:17.438 -------- Chain index validation progress: 50.00%; Time elapsed: 5h23m41.240064043s
2024-10-09 12:19:09.401 -------- Chain index validation progress: 53.33%; Time elapsed: 5h44m33.202230962s
2024-10-09 12:39:43.318 -------- Chain index validation progress: 56.67%; Time elapsed: 6h5m7.120162996s
2024-10-09 13:00:36.205 -------- Chain index validation progress: 60.00%; Time elapsed: 6h26m0.007156519s
2024-10-09 13:22:07.533 -------- Chain index validation progress: 63.33%; Time elapsed: 6h47m31.334230385s
2024-10-09 13:42:22.805 -------- Chain index validation progress: 66.67%; Time elapsed: 7h7m46.606813157s
2024-10-09 14:02:50.702 -------- Chain index validation progress: 70.00%; Time elapsed: 7h28m14.503955704s
2024-10-09 14:23:17.452 -------- Chain index validation progress: 73.33%; Time elapsed: 7h48m41.253678763s
2024-10-09 14:42:55.491 -------- Chain index validation progress: 76.67%; Time elapsed: 8h8m19.292820409s
2024-10-09 15:05:11.490 -------- Chain index validation progress: 80.00%; Time elapsed: 8h30m35.292191527s
2024-10-09 15:27:14.396 -------- Chain index validation progress: 83.33%; Time elapsed: 8h52m38.197724796s
2024-10-09 15:49:58.772 -------- Chain index validation progress: 86.67%; Time elapsed: 9h15m22.573845885s
2024-10-09 16:12:19.897 -------- Chain index validation progress: 90.00%; Time elapsed: 9h37m43.698457415s
2024-10-09 16:33:45.127 -------- Chain index validation progress: 93.33%; Time elapsed: 9h59m8.929105029s
2024-10-09 16:56:38.008 -------- Chain index validation progress: 96.67%; Time elapsed: 10h22m1.809325232s
2024-10-09 17:19:30.228 -------- Chain index validation progress: 100.00%; Time elapsed: 10h44m54.030102146s
2024-10-09 17:19:30.354 -------- Chain index validation progress: 100.00%; Time elapsed: 10h44m54.155308084s

3) Backfilling 1 month of epochs mid-way between FEVM launch and the current chain head. Takes ~13 hours.

2024-10-09 18:06:50.812 starting chainindex validation; from epoch: 3511567; to epoch: 3425167; backfill: true; log-good: false
2024-10-09 18:22:00.482 -------- Chain index validation progress: 3.33%; Time elapsed: 15m9.670482824s
2024-10-09 18:35:25.365 -------- Chain index validation progress: 6.67%; Time elapsed: 28m34.553606048s
2024-10-09 18:48:19.165 -------- Chain index validation progress: 10.00%; Time elapsed: 41m28.353796507s
2024-10-09 19:01:29.618 -------- Chain index validation progress: 13.33%; Time elapsed: 54m38.806024773s
2024-10-09 19:15:12.071 -------- Chain index validation progress: 16.67%; Time elapsed: 1h8m21.259877238s
2024-10-09 19:30:44.968 -------- Chain index validation progress: 20.00%; Time elapsed: 1h23m54.15652168s
2024-10-09 19:50:59.944 -------- Chain index validation progress: 23.33%; Time elapsed: 1h44m9.132300745s
2024-10-09 20:19:22.942 -------- Chain index validation progress: 26.67%; Time elapsed: 2h12m32.130043369s
2024-10-09 20:52:27.399 -------- Chain index validation progress: 30.00%; Time elapsed: 2h45m36.587897912s
2024-10-09 21:20:40.064 -------- Chain index validation progress: 33.33%; Time elapsed: 3h13m49.25204028s
2024-10-09 21:50:49.975 -------- Chain index validation progress: 36.67%; Time elapsed: 3h43m59.162984189s
2024-10-09 22:18:22.220 -------- Chain index validation progress: 40.00%; Time elapsed: 4h11m31.408482377s
2024-10-09 22:45:28.032 -------- Chain index validation progress: 43.33%; Time elapsed: 4h38m37.22010544s
2024-10-09 23:12:16.162 -------- Chain index validation progress: 46.67%; Time elapsed: 5h5m25.350077042s
2024-10-09 23:39:37.234 -------- Chain index validation progress: 50.00%; Time elapsed: 5h32m46.422173688s
2024-10-10 00:10:51.416 -------- Chain index validation progress: 53.33%; Time elapsed: 6h4m0.604601922s
2024-10-10 00:46:44.348 -------- Chain index validation progress: 56.67%; Time elapsed: 6h39m53.536003528s
2024-10-10 01:31:14.595 -------- Chain index validation progress: 60.00%; Time elapsed: 7h24m23.783330796s
2024-10-10 04:05:18.792 -------- Chain index validation progress: 63.33%; Time elapsed: 9h58m27.980538058s
2024-10-10 04:25:13.568 -------- Chain index validation progress: 66.67%; Time elapsed: 10h18m22.756382023s
2024-10-10 04:45:33.326 -------- Chain index validation progress: 70.00%; Time elapsed: 10h38m42.514054977s
2024-10-10 05:05:31.425 -------- Chain index validation progress: 73.33%; Time elapsed: 10h58m40.613381271s
2024-10-10 05:25:52.663 -------- Chain index validation progress: 76.67%; Time elapsed: 11h19m1.850925885s
2024-10-10 05:45:20.150 -------- Chain index validation progress: 80.00%; Time elapsed: 11h38m29.338635039s
2024-10-10 06:05:00.198 -------- Chain index validation progress: 83.33%; Time elapsed: 11h58m9.386637783s
2024-10-10 06:24:40.726 -------- Chain index validation progress: 86.67%; Time elapsed: 12h17m49.91455223s
2024-10-10 06:43:07.535 -------- Chain index validation progress: 90.00%; Time elapsed: 12h36m16.723633668s
2024-10-10 07:00:46.843 -------- Chain index validation progress: 93.33%; Time elapsed: 12h53m56.031377256s
2024-10-10 07:20:33.779 -------- Chain index validation progress: 96.67%; Time elapsed: 13h13m42.967073078s
2024-10-10 07:38:29.397 -------- Chain index validation progress: 100.00%; Time elapsed: 13h31m38.585839486s
2024-10-10 07:38:29.963 -------- Chain index validation progress: 100.00%; Time elapsed: 13h31m39.151568536s

I am now running the index "doctor"/validation on these to sanity check that the backfilled data is in line with the chain state.

BigLep commented 1 month ago

@aarshkshah1992 : can we get final numbers on chainindex.db size for the full archival node? I know there were some numbers here, but I'm not sure how many tipsets that is and I'd also like to get a larger time range. I want to be able to make a statement like "As of 202410, ChainIndexer will accumulate approximately XMiB per day of data, or XGiB per month" in #12600

I'm seeing our docs already had a statement that "The ChainIndex will consume ~10GB of storage per month of tipsets (e.g., ~86400 epochs)". I guess that's all I need but it would be good to have an official record of it in here like you have with backfill times in https://github.com/filecoin-project/lotus/issues/12453#issuecomment-2405306468

jennijuju commented 1 month ago

Would love @eshon and @jennijuju to chime in if there are more.

Talked with Eva; the summary (in Notion) has been shared with the team.

aarshkshah1992 commented 1 month ago

@BigLep We have yet to index the entire history all the way back to FEVM launch. We were waiting on the reviews to land/get addressed so we can be sure that we're using the same indexing code as users.

Looks like the PR will be ready tomorrow (all reviews will have been addressed). We will then kick off indexing of the entire state and also get all the numbers you need here.

aarshkshah1992 commented 1 month ago

@BigLep

The ChainIndex will consume ~10GB of storage per month of tipsets (e.g., ~86400 epochs)

That does not sound correct. Where did you get it from? Can we please wait on the next round of archival node testing to get the final numbers? I'll make sure to document them here once we have them.

BigLep commented 1 month ago

@aarshkshah1992

The ChainIndex will consume ~10GB of storage per month of tipsets (e.g., ~86400 epochs)

That does not sound correct. Where did you get it from? Can we please wait on the next round of archival node testing to get the final numbers? I'll make sure to document them here once we have them.

Ack, good to know. I can't recall / find where I got these numbers from. I was surprised to see them, so maybe I put them in as fillers. I don't remember. Anyways, I will put X placeholders for now and we'll update once official results have been published here.

aarshkshah1992 commented 1 month ago

@BigLep

Please see https://filecoinproject.slack.com/archives/CP50PPW2X/p1729413621133599.

~10G growth in the Index DB size per month is actually correct.

aarshkshah1992 commented 3 weeks ago

The ChainIndexer PR is now merged. Keeping this issue open till RPC providers upgrade and finish backfilling the Index.