application-research / autoretrieve

A server to make GraphSync data accessible on IPFS

Don't put blocks on task queue #155

Closed · hannahhoward closed this 1 year ago

hannahhoward commented 1 year ago

Goal

We have chronic memory usage issues with Bedrock's autoretrieve instance. Memory allocated in go-graphsync upon receipt of new blocks makes its way into autoretrieve's blocks.Manager.Put calls and then all the way into bitswap's peer task queue. With the default config of only one Bitswap message sending worker, this means that under heavy Bitswap request load, huge sets of block data sit on the task queue as "Topic"s, held from the time a retrieval receives a block until that block reaches the top of the task queue and is sent over the wire. Although the topic is an empty interface, it was never meant to hold large objects like blocks -- only identifiers of data such as CIDs, or other small items. This PR reverts topics to CIDs, as go-bitswap uses, and reads a block back off disk only when it is about to be sent.

Implementation
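A minimal sketch of the before/after shape of the change, not the actual diff: the helper functions here are hypothetical, and it assumes go-peertaskqueue's `peertask.Task` fields and a context-aware blockstore `Get`.

```go
package sketch

import (
	"context"

	blocks "github.com/ipfs/go-block-format"
	"github.com/ipfs/go-cid"
	"github.com/ipfs/go-peertaskqueue/peertask"
)

// Blockstore is a minimal stand-in for the real blockstore interface.
type Blockstore interface {
	Get(ctx context.Context, c cid.Cid) (blocks.Block, error)
}

// Before: the entire block rides the queue as the Topic, pinning its
// bytes in memory for as long as the task waits behind the single
// message sending worker.
func queueBlockAsTopic(blk blocks.Block) peertask.Task {
	return peertask.Task{
		Topic:    blk, // full block bytes held until the task is popped
		Priority: 0,
		Work:     len(blk.RawData()),
	}
}

// After: only the CID is queued; the block is re-read from disk when
// the task reaches the top of the queue.
func queueCidAsTopic(c cid.Cid, size int) peertask.Task {
	return peertask.Task{
		Topic:    c, // a small identifier, like go-bitswap uses
		Priority: 0,
		Work:     size,
	}
}

// sendTask loads the block only at send time, just before it goes
// over the wire.
func sendTask(ctx context.Context, bs Blockstore, task *peertask.Task) (blocks.Block, error) {
	c, ok := task.Topic.(cid.Cid)
	if !ok {
		return nil, nil // not a CID topic; nothing to send
	}
	return bs.Get(ctx, c) // hand the result to the bitswap message sender
}
```

The trade-off is one extra blockstore read per block at send time, in exchange for bounding queue memory to roughly one CID per pending task instead of a whole block.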

hannahhoward commented 1 year ago

@elijaharita, we know you originally did the refactor of the provider that made blocks the topic for wantBlock calls. We're not sure of the full reasoning there, but we'd love to hear more.

In the meantime, we'd like to merge this (after review) and see if it resolves our memory issues -- if it turns out the problems are elsewhere and this change has significant throughput costs, we can revert.

hannahhoward commented 1 year ago

We've been trying to solve our memory-related issues for almost a week. Go has poor reference tracking (heap dumps don't give us real answers across goroutines) -- we've tried most other things, and this seems like a much more likely hypothesis.
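For context, the heap dumps mentioned here are the kind typically captured with Go's `net/http/pprof`; a minimal sketch (the listen address is an arbitrary choice, not autoretrieve's actual setup):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve the profiling endpoints; a heap snapshot can then be pulled with:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	select {} // stand-in for the long-running service
}
```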

codecov-commenter commented 1 year ago

Codecov Report

Base: 20.84% // Head: 20.79% // Decreases project coverage by 0.05% :warning:

Coverage data is based on head (c61a585) compared to base (8216561). Patch coverage: 22.22% of modified lines in the pull request are covered.

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           master     #155      +/-   ##
==========================================
- Coverage   20.84%   20.79%   -0.06%
==========================================
  Files          16       16
  Lines        2341     2347       +6
==========================================
  Hits          488      488
- Misses       1823     1829       +6
  Partials       30       30
```

| [Impacted Files](https://codecov.io/gh/application-research/autoretrieve/pull/155?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=application-research) | Coverage Δ | |
|---|---|---|
| [bitswap/provider.go](https://codecov.io/gh/application-research/autoretrieve/pull/155/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=application-research#diff-Yml0c3dhcC9wcm92aWRlci5nbw==) | `0.00% <0.00%> (ø)` | |
| [blocks/randompruner.go](https://codecov.io/gh/application-research/autoretrieve/pull/155/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=application-research#diff-YmxvY2tzL3JhbmRvbXBydW5lci5nbw==) | `0.00% <0.00%> (ø)` | |
| [blocks/manager.go](https://codecov.io/gh/application-research/autoretrieve/pull/155/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=application-research#diff-YmxvY2tzL21hbmFnZXIuZ28=) | `80.00% <100.00%> (ø)` | |

:umbrella: View full report at Codecov.

elijaharita commented 1 year ago

> @elijaharita, we know you originally did the refactor of the provider that made blocks the topic for wantBlock calls. We're not sure of the full reasoning there, but we'd love to hear more.
>
> In the meantime, we'd like to merge this (after review) and see if it resolves our memory issues -- if it turns out the problems are elsewhere and this change has significant throughput costs, we can revert.

No worries, I didn't have a specific thought process for this, and I'm not surprised there might be issues with it. If you have a more correct solution then by all means, go for it!