Open ze42 opened 11 months ago
We are already suffering a slow down with the current data sent and time it takes to run the deal filters. The issue is captured at https://github.com/filecoin-project/boost/issues/1733 I would suggest an overhaul of how external filters are triggered and what data should be sent for each deal.
Proposal of the new format:
Before going to a complete change of workflow, and the way dealfilters interface with boost, we should have a look at what exactly takes time.
As @cryptowhizzard mentionned, removing the parameter generation seems to be enough. My guess would be that some parameters collected takes too much resources when applied to big storage.
Having only values that would not be dependant on machine sizes, but mostly linear should probably be enough to fix the issue.
From some quick search on our end, we suspect the api.SectorsSummary(ctx)
to be taking a long of time, and being the main reason it takes way too long.
This api call in Lotus takes along time to proceed. Calling lotus /debug/metric
returns the data (along alot of other stuffs) much faster. Maybe getting the data from there (or ask Lotus to get a faster API getting the computed data, rather than computing it each time) would be enough to drastictly reduce the time, and still have the desired data.
Example, calling SectorsSummary (almost one second on that miner, could take over 30s on big miners like for cryptowhizzard)
time curl -s -X PST -H "Content-Type: application/json" --data '{ "jsonrpc": "2.0", "method": "Filecoin.'SectorsSummary'", "params": [''], "id": 3 }' 'http://miner:2345/rpc/v0'%7C jq .
{
"jsonrpc": "2.0",
"result": {
"Packing": 7,
"PreCommit1": 16,
"PreCommit2": 2,
"Proving": 6328,
"Removed": 6852,
"WaitSeed": 1
},
"id": 3
}
time: 926ms
And very fast metrics API, returning the same data (cut to only the matching ones here)
curl -s http://miner:2345/debug/metrics
[...]
lotus_sealing_states{sector_state="Packing"} 7
lotus_sealing_states{sector_state="PreCommit1"} 16
lotus_sealing_states{sector_state="PreCommit2"} 2
[...]
lotus_sealing_states{sector_state="Proving"} 6328
lotus_sealing_states{sector_state="Removed"} 6851
[...]
lotus_sealing_states{sector_state="WaitSeed"} 1
[...]
time: 79ms
Even with the most leaniest dealfilter ( I am only sending the deal parameters , no sector status etc. etc. ) it seems that boost still cannot handle incoming deal requests properly when it is stressed / loaded.
Fyi to give context to the thing below:
When the duration is above 30 seconds the client ( sender ) of the deal is gone. The client get's "stream reset" errors.
2023-10-19T10:08:38.133+0200 DEBUG boost-net lp2pimpl/net.go:233 handled deal proposal request {"reqlog-uuid": "9a55c269-b8e4-4542-bfdd-214d02943716", "client-peer": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "id": "14d75316-4737-40bc-9ce2-b858e60c55a4", "duration": "1m45.666006766s"} 2023-10-19T10:09:02.341+0200 INFO boost-prop lp2pimpl/net.go:262 send deal proposal response {"id": "00996245-b22f-466d-99c6-a7f8d4cada1e", "accepted": true, "msg": "", "peer id": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "client address": "f1ps34emfyq2tbioxwzr7tm2nqwprkiehsccvn7gy", "provider address": "f01771403", "piece cid": "baga6ea4seaqn436o6hcwhxrywdjbbfvi76wnmjxaifgoshxloxqtrvc33qzpsea", "piece size": 34359738368, "verified": true, "label": "bafyreieqf7k6qejlub6iwfefrxosnwzqvu6ab26cw2woll3g5ihrwjey7u", "start epoch": "3333374", "end epoch": "4802174", "price per epoch": "0", "duration": "1m39.871729141s"} 2023-10-19T10:09:06.923+0200 DEBUG boost-net lp2pimpl/net.go:233 handled deal proposal request {"reqlog-uuid": "d45bf699-c13f-4073-bf5a-b88fee384157", "client-peer": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "id": "00996245-b22f-466d-99c6-a7f8d4cada1e", "duration": "1m44.454020025s"} 2023-10-19T10:09:30.772+0200 INFO boost-prop lp2pimpl/net.go:262 send deal proposal response {"id": "dc84f8ff-192f-4d9a-a3d7-89af600afd2d", "accepted": true, "msg": "", "peer id": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "client address": "f1ps34emfyq2tbioxwzr7tm2nqwprkiehsccvn7gy", "provider address": "f01771403", "piece cid": "baga6ea4seaqmxhxnoyi5jjqyo5slskj43ljzukapx4uhhekdpups76sk4upbkjy", "piece size": 34359738368, "verified": true, "label": "bafyreieqf7k6qejlub6iwfefrxosnwzqvu6ab26cw2woll3g5ihrwjey7u", "start epoch": "3333375", "end epoch": "4802175", "price per epoch": "0", "duration": "1m38.33452143s"} 2023-10-19T10:09:32.408+0200 DEBUG boost-net lp2pimpl/net.go:233 handled deal proposal request {"reqlog-uuid": "6e069c6b-0401-4c9b-86d4-ee1e53f544e2", "client-peer": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "id": "dc84f8ff-192f-4d9a-a3d7-89af600afd2d", "duration": "1m39.970162675s"} 2023-10-19T10:09:44.597+0200 INFO boost-prop lp2pimpl/net.go:262 send deal proposal response {"id": "b9cd1bef-66ed-4c81-b7a3-c82d75ef3c44", "accepted": true, "msg": "", "peer id": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "client address": "f1ps34emfyq2tbioxwzr7tm2nqwprkiehsccvn7gy", "provider address": "f01771403", "piece cid": "baga6ea4seaqev445lvuzgx3sdsosa5uqiq2ylwqkmuutfmptsshdjyayqyrfoei", "piece size": 34359738368, "verified": true, "label": "bafyreieqf7k6qejlub6iwfefrxosnwzqvu6ab26cw2woll3g5ihrwjey7u", "start epoch": "3333376", "end epoch": "4802176", "price per epoch": "0", "duration": "1m22.132652245s"} 2023-10-19T10:09:44.608+0200 DEBUG boost-net lp2pimpl/net.go:233 handled deal proposal request {"reqlog-uuid": "86829218-0e35-4bc5-8383-69f3ac9f3def", "client-peer": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "id": "b9cd1bef-66ed-4c81-b7a3-c82d75ef3c44", "duration": "1m22.144552813s"} 2023-10-19T10:09:49.990+0200 INFO boost-prop lp2pimpl/net.go:262 send deal proposal response {"id": "e12e13d7-a9dd-4816-8020-cadd69f863a5", "accepted": true, "msg": "", "peer id": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "client address": "f1ps34emfyq2tbioxwzr7tm2nqwprkiehsccvn7gy", "provider address": "f01771403", "piece cid": "baga6ea4seaqcp424ahi24dmvuvotk2jldvya44vuvksa2un3cujxplvdtwmdebi", "piece size": 34359738368, "verified": true, "label": "bafyreieqf7k6qejlub6iwfefrxosnwzqvu6ab26cw2woll3g5ihrwjey7u", "start epoch": "3333378", "end epoch": "4802178", "price per epoch": "0", "duration": "27.566287954s"} 2023-10-19T10:09:49.991+0200 INFO boost-prop lp2pimpl/net.go:262 send deal proposal response {"id": "f7a790e0-0a74-47d5-bae3-c7c1ba4b3c6f", "accepted": true, "msg": "", "peer id": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "client address": "f1ps34emfyq2tbioxwzr7tm2nqwprkiehsccvn7gy", "provider address": "f01771403", "piece cid": "baga6ea4seaqoqumfeolmerbapmgnwhy7lefngm4jqqu5tqauv6eev6infn5iubq", "piece size": 34359738368, "verified": true, "label": "bafyreieqf7k6qejlub6iwfefrxosnwzqvu6ab26cw2woll3g5ihrwjey7u", "start epoch": "3333377", "end epoch": "4802177", "price per epoch": "0", "duration": "57.551637991s"} 2023-10-19T10:09:58.445+0200 DEBUG boost-net lp2pimpl/net.go:233 handled deal proposal request {"reqlog-uuid": "6efd9218-5785-47c7-a636-e596b93e63fc", "client-peer": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "id": "f7a790e0-0a74-47d5-bae3-c7c1ba4b3c6f", "duration": "1m6.006104394s"} 2023-10-19T10:09:58.447+0200 DEBUG boost-net lp2pimpl/net.go:233 handled deal proposal request {"reqlog-uuid": "b41d58d9-55d2-4a37-850d-4383684e512c", "client-peer": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "id": "e12e13d7-a9dd-4816-8020-cadd69f863a5", "duration": "36.02376174s"}
@cryptowhizzard Could you try with the additional commit from https://github.com/CIDgravity/boost/tree/dealfilter-timing which should add some timing logs for deal-filter, having getDealFilterParams on one side, and exec on the other side?
Hi can you move on this one ?
I've figured out the slowness on my machine;
its 3 serialized calls to a seemingly really slow database, providing funds, sealing and space information;
https://github.com/filecoin-project/boost/blob/main/storagemarket/provider_dealfilter.go#L42-L70
These calls can take 3 to 8 seconds, but sometimes (over multiple deals in 1 hour) they are 30 to 70 seconds (which raise another bug which I'll post here too)
I've removed them now and will post new times that I'm seeing
Also see https://filecoinproject.slack.com/archives/C03CKDLEWG1/p1702930422361979 for my bug hunt
This also took 3 seconds, as did the step inserted deal into deals DB
Here are the DB sizes:
-rw-r--r-- 1 user user 329M Dec 20 20:01 boost.retrieval.db
-rw-r--r-- 1 user user 53M Dec 20 20:01 boost.logs.db
-rw-r--r-- 1 user user 680M Dec 20 19:57 boost.db
see https://github.com/filecoin-project/boost/issues/1852 for the missing index issues see https://github.com/filecoin-project/boost/issues/1851 for a bunch of useless data
@RobQuistNL
Please note that with removing the things from provider dealfilter.go you will be incompatible with CID gravity.
It’s on my list to reover some parts build from external sources like Farcaster, after 7 jan 2024. It should not be in Boost anyway in my personal opinion. Boost should make deals and faaaaasssstttttt :)
Checklist
Ideas
.Boost component
What is the motivation behind this feature request? Is your feature request related to a problem? Please describe.
Provide to storage providers the ability to accept deals based on the current concurrent downloads through the deal filter.
The current info allows to filter based on the sealing pipeline states (AP / PC1 / PC2/ ... ) but when receiving alot of online deals, many concurrent downloads can be started before reaching the miner. The idea is to give miners the ability to take the boost staged deals that will soon reach the sealing pipeline into account.
State of the different deal currently being processed should be taken into account when deciding if we want to accept new deals
Describe the solution you'd like
Boost add these existing info to the dealfilter.
Proposed format :
Describe alternatives you've considered
None
Additional context
We made that development and can propose a PR.