ChainSafe / forest

🌲 Rust Filecoin Node Implementation
https://forest.chainsafe.io
Apache License 2.0

Using hundreds of CAR files as a single block store is slow #3361

Open lemmih opened 1 year ago

lemmih commented 1 year ago

Issue summary

Multiple CAR files can be opened and joined into a single block store. This is convenient when using diff snapshots containing only the unique data generated over a range of epochs.

However, looking up a value in a ManyCar block store queries each CAR file in turn. With hundreds of CAR files, a single lookup may probe hundreds of files, one by one.
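The sequential-probe behaviour described above can be sketched as follows. `ManyCar` and `CarStore` here are illustrative stand-ins (a plain map per file), not Forest's actual types:

```rust
use std::collections::HashMap;

// Stand-in for a single CAR-file block store: a CID -> block map.
type CarStore = HashMap<String, Vec<u8>>;

// Sketch of the lookup pattern at issue: every `get` scans the stores
// front to back, so a miss costs one probe per CAR file.
struct ManyCar {
    stores: Vec<CarStore>,
}

impl ManyCar {
    fn get(&self, cid: &str) -> Option<&Vec<u8>> {
        // Worst case: N probes for N CAR files.
        self.stores.iter().find_map(|s| s.get(cid))
    }
}
```

With hundreds of stores, a lookup that misses (or hits only the last file) pays the full linear cost.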

Steps to reproduce:

  1. cd calibnet_diff
  2. aws --endpoint-url https://2238a825c5aca59233eab1f221f7aefb.r2.cloudflarestorage.com/ s3 cp "s3://forest-archive/calibnet/lite/forest_snapshot_calibnet_2022-11-01_height_0.forest.car.zst" .
  3. aws --endpoint-url https://2238a825c5aca59233eab1f221f7aefb.r2.cloudflarestorage.com/ s3 cp "s3://forest-archive/calibnet/diff/" . --recursive
  4. forest-cli snapshot validate --check-links 780000 --check-stateroots 780000 forest_snapshot_calibnet_2022-11-01_height_0.forest.car.zst forest_diff_calibnet_height_*

One possible solution would be to order the CAR files according to when they were last used. Getting a successful hit from a CAR file should move it to the front of the list, ensuring active CAR files are queried first. This will improve performance if nearby epochs are accessed together rather than randomly.
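The move-to-front ordering proposed above can be sketched like this (names and types are illustrative, not Forest's API): a successful hit reorders the store list so the file that answered is probed first next time.

```rust
use std::collections::HashMap;

// Stand-in for a single CAR-file block store: a CID -> block map.
type CarStore = HashMap<String, Vec<u8>>;

// Sketch of the proposed heuristic: a hit moves the owning CAR store to
// the front of the probe order, so temporally-close lookups find the hot
// file early instead of scanning hundreds of cold ones.
struct MruManyCar {
    stores: Vec<CarStore>,
}

impl MruManyCar {
    fn get(&mut self, cid: &str) -> Option<Vec<u8>> {
        let idx = self.stores.iter().position(|s| s.contains_key(cid))?;
        // Move the store that answered to the front of the list.
        let store = self.stores.remove(idx);
        let value = store.get(cid).cloned();
        self.stores.insert(0, store);
        value
    }
}
```

Note that `get` needs `&mut self` (or interior mutability) to reorder the list, which is one practical wrinkle of this approach.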

Acceptance criteria:

Other information and links

ruseinov commented 1 year ago

Just throwing ideas here.

I remember us talking about this before, when we used a parallel iterator to deal with parallel queries.

The proposed solution sounds good; however, it only optimises for one scenario, which means performance is likely to suffer even more under random access. It could be interesting to cover several scenarios and let the user choose between them via an argument, with a sane default.

Perhaps we could try to re-introduce the parallel iterator for cases where we have lots of CAR files, behind an argument that optionally enables it, with proper docs explaining the performance implications.
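A minimal sketch of the parallel-probe idea, using std scoped threads in place of a parallel-iterator crate (purely illustrative, not the previously removed Forest code): all CAR stores are searched concurrently and the first hit wins.

```rust
use std::collections::HashMap;
use std::thread;

// Stand-in for a single CAR-file block store: a CID -> block map.
type CarStore = HashMap<String, Vec<u8>>;

// Probe every store on its own thread; scoped threads let us borrow the
// stores without 'static bounds. One thread per store is only a sketch --
// a real implementation would use a thread pool or parallel iterator.
fn par_get(stores: &[CarStore], cid: &str) -> Option<Vec<u8>> {
    thread::scope(|scope| {
        let handles: Vec<_> = stores
            .iter()
            .map(|s| scope.spawn(move || s.get(cid).cloned()))
            .collect();
        // Join in order and return the first successful lookup.
        handles.into_iter().find_map(|h| h.join().ok().flatten())
    })
}
```

This trades extra CPU and thread overhead for lower lookup latency, which is why it probably belongs behind an opt-in flag as suggested.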

There could also be other interesting ways of optimising this, for example relying on a certain filename format that tells us which epochs each CAR file represents. That could help us order them in a way that optimises for certain types of traversal. Not sure if I'm making sense here; this is likely to end up very similar to your LRU-style approach, except with more work and more constraints.

hanabi1224 commented 1 year ago

> We could also have some other interesting ways of optimizing this, for example relying on a certain filename format, where we know which epochs each of the CAR files represent.

As I understand it, the root CIDs of a lite archive can be used to construct its heaviest tipset, which contains the epoch information. As for a diff archive, if we save a tuple with its range info (from_tipset_keys: TipsetKeys, end_tipset_keys: TipsetKeys) (and maybe the genesis CID as well) in the CAR and use its CID as the root, we have all the useful information included in the CAR content itself.
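The proposed root metadata could look roughly like the following sketch. CIDs and tipset keys are plain strings here for illustration (a real implementation would use Forest's `TipsetKeys` and the `cid` crate, serialized as the CAR root block), and the range check assumes tipset keys can be resolved to epochs:

```rust
// Illustrative shape of the metadata tuple proposed above, stored as the
// root block of a diff CAR.
#[derive(Debug, PartialEq)]
struct DiffCarInfo {
    from_tipset_keys: Vec<String>, // keys of the tipset at the range start
    end_tipset_keys: Vec<String>,  // keys of the tipset at the range end
    genesis_cid: String,           // disambiguates the network
}

// With epochs recovered from the tipset keys, a ManyCar store could skip
// any CAR file whose range cannot contain the requested epoch.
fn covers(range_start: u64, range_end: u64, epoch: u64) -> bool {
    (range_start..=range_end).contains(&epoch)
}
```

This would turn the linear scan into a cheap range filter for epoch-keyed traversals, complementing rather than replacing the recency-ordering idea.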

aatifsyed commented 1 year ago

@lemmih says he's going to measure perf in a post-mmap world :)