filecoin-project / dealbot

🤖🤝 A bot for making deals
Apache License 2.0

Investigate long TTFB for retrievals in the deal dashboard #421

Open davidd8 opened 2 years ago

davidd8 commented 2 years ago

The observable dashboard is showing high average TTFB (time to first byte) for retrievals, on the order of hours. Is this an issue with the dashboard, or are retrievals really taking this long to start transferring data? Could it be related to a concurrency limit set by SPs, as noted in https://www.notion.so/pl-strflt/Estuary-Elijah-1-5-22-505f2f1ac57648f1bd983323ffb47d48?

dkkapur commented 2 years ago

the graphql endpoint has the ttfb data per deal / dealbot task coming in:

  query: `query {FinishedTasks(UUIDs: ${JSON.stringify(uuids)}) { All { MinerLatencyMS TimeToFirstByteMS TimeToLastByteMS ClientVersion MinerVersion ProposalCID DealIDString}}}`})
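For anyone who wants to pull the same fields outside the dashboard, here is a minimal sketch of building and sending that query. The field list and the `FinishedTasks(UUIDs: ...)` shape come from the fragment above; the helper names, the `endpoint` parameter, the use of a JSON POST, and the `data.FinishedTasks.All` response path are assumptions, not confirmed details of the dealbot API.

```javascript
// Build the JSON request body for the FinishedTasks query quoted above.
function buildFinishedTasksBody(uuids) {
  const fields = [
    "MinerLatencyMS", "TimeToFirstByteMS", "TimeToLastByteMS",
    "ClientVersion", "MinerVersion", "ProposalCID", "DealIDString",
  ].join(" ");
  // JSON.stringify renders the UUID array as a GraphQL list of strings.
  const query = `query {FinishedTasks(UUIDs: ${JSON.stringify(uuids)}) { All { ${fields} }}}`;
  return JSON.stringify({ query });
}

// Hypothetical helper: `endpoint` is supplied by the caller; the real URL,
// HTTP method, and response shape are assumptions.
async function fetchFinishedTasks(endpoint, uuids) {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildFinishedTasksBody(uuids),
  });
  if (!res.ok) throw new Error(`GraphQL request failed: ${res.status}`);
  const { data } = await res.json();
  return data.FinishedTasks.All;
}
```

`buildFinishedTasksBody` is kept separate so the query text can be inspected without making a network call.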

dkkapur commented 2 years ago

from @willscott, on how TTFB is calculated:

i think it's the time from when the request starts until the state first changes to transferring / data received
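To make that description concrete, here is a rough sketch of the calculation as described. It is not dealbot's actual code: the stage names and the event shape (`stage`, `atMs`) are hypothetical placeholders.

```javascript
// Hypothetical stage names; dealbot's real identifiers may differ.
const FIRST_BYTE_STAGES = new Set(["Transferring", "DataReceived"]);

// Returns milliseconds from request start to the first qualifying stage
// change, or null if the retrieval never reached such a stage.
function timeToFirstByteMs(requestStartMs, stageEvents) {
  const first = stageEvents
    // Keep only qualifying transitions that happen at or after the request start.
    .filter((e) => FIRST_BYTE_STAGES.has(e.stage) && e.atMs >= requestStartMs)
    // Earliest transition first, regardless of input order.
    .sort((a, b) => a.atMs - b.atMs)[0];
  return first ? first.atMs - requestStartMs : null;
}
```

If the description is accurate, any time a retrieval spends waiting before the transfer begins would count toward TTFB under this definition.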

dkkapur commented 2 years ago

as a step 1 here, would be great to just get the distribution of TTFBs for all the retrieval attempts in the last week. would help identify if everything has gotten worse or we just have a few outliers (and if so, which SP IDs those are coming from).
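A possible starting point for that distribution, assuming the task records look like the `FinishedTasks` results above with a numeric `TimeToFirstByteMS`. The helper names are made up for illustration, and because the quoted query has no SP ID or timestamp field, grouping takes a caller-supplied key function and filtering to the last week is left out.

```javascript
// Nearest-rank percentile over an ascending-sorted array of numbers.
function percentile(sorted, p) {
  if (sorted.length === 0) return null;
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Summarize TTFB across tasks, skipping records without a numeric value.
function summarizeTtfb(tasks) {
  const values = tasks
    .map((t) => t.TimeToFirstByteMS)
    .filter((v) => Number.isFinite(v))
    .sort((a, b) => a - b);
  return {
    count: values.length,
    min: values.length ? values[0] : null,
    p50: percentile(values, 50),
    p90: percentile(values, 90),
    p99: percentile(values, 99),
    max: values.length ? values[values.length - 1] : null,
  };
}

// Group tasks by a caller-supplied key (for example an SP ID, if the
// records carry one) and summarize each group separately.
function summarizeBy(tasks, keyOf) {
  const groups = new Map();
  for (const t of tasks) {
    const key = keyOf(t);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(t);
  }
  return new Map([...groups].map(([key, group]) => [key, summarizeTtfb(group)]));
}
```

Comparing p50 against p90 and p99 overall, and then per group, would show whether everything is slow or only a few sources are.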

davidd8 commented 2 years ago

cc @kylehuntsman

kylehuntsman commented 2 years ago

I agree, I think the metric is correct in showing the average time, but the underlying data could be misrepresenting the practical norm. We could calculate the median as a real quick sanity check.
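As a sketch of that sanity check, the two helpers below compute the mean and the median of a list of TTFB values. The sample numbers are made up purely to show how one outlier can pull the mean away from the median; they are not taken from the dashboard.

```javascript
// Arithmetic mean; returns NaN for an empty array.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Median; averages the two middle values for even-length input.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Made-up TTFB values in minutes: four fast retrievals and one outlier.
const illustrativeMinutes = [2, 3, 3, 4, 600];
console.log(mean(illustrativeMinutes));   // 122.4
console.log(median(illustrativeMinutes)); // 3
```

If the real median comes out close to the mean, the high average reflects typical retrievals rather than a few outliers.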

davidd8 commented 2 years ago

I did a quick check, and it looks like the median is about 2.5 hrs, with the lowest TTFB at 68 min and the highest around 10 hrs. So the numbers are consistently higher than expected.

brendalee commented 2 years ago

out of curiosity, is it possible to get the minerIDs for these? maybe we can try to understand if it was for unsealed data or there was some other issue