Closed xwang-otterai closed 3 months ago
@xwang-otterai thanks for the ticket. Are you using postgres or sqlite?
@mikeldking postgres
@mikeldking This query on average takes 110 seconds
SELECT traces.project_rowid, sum(CAST(spans.attributes #>> $1 AS FLOAT)) AS prompt, sum(CAST(spans.attributes #>> $2 AS FLOAT)) AS completion, coalesce(sum(CAST((spans.attributes #>> $1) AS FLOAT)), $3::INTEGER) + coalesce(sum(CAST((spans.attributes #>> $2) AS FLOAT)), $4::INTEGER) AS total
FROM traces JOIN spans ON traces.id = spans.trace_rowid
WHERE spans.start_time >= $5::TIMESTAMP WITH TIME ZONE AND spans.start_time < $6::TIMESTAMP WITH TIME ZONE AND traces.project_rowid IN ($7::INTEGER) G
@mikeldking This query on average takes 110 seconds
SELECT traces.project_rowid, sum(CAST(spans.attributes #>> $1 AS FLOAT)) AS prompt, sum(CAST(spans.attributes #>> $2 AS FLOAT)) AS completion, coalesce(sum(CAST((spans.attributes #>> $1) AS FLOAT)), $3::INTEGER) + coalesce(sum(CAST((spans.attributes #>> $2) AS FLOAT)), $4::INTEGER) AS total FROM traces JOIN spans ON traces.id = spans.trace_rowid WHERE spans.start_time >= $5::TIMESTAMP WITH TIME ZONE AND spans.start_time < $6::TIMESTAMP WITH TIME ZONE AND traces.project_rowid IN ($7::INTEGER) G
Thanks for this @xwang-otterai !
Thanks for reporting @xwang-otterai. It would help to know more about when the size of your data. If you have database access and are able to execute queries, the following information would be helpful.
select count(*) from projects;
select count(*) from traces;
select count(*) from spans;
SELECT
percentile_cont(0.01) WITHIN GROUP (ORDER BY attributes_length) AS p1,
percentile_cont(0.05) WITHIN GROUP (ORDER BY attributes_length) AS p5,
percentile_cont(0.10) WITHIN GROUP (ORDER BY attributes_length) AS p10,
percentile_cont(0.25) WITHIN GROUP (ORDER BY attributes_length) AS p25,
percentile_cont(0.50) WITHIN GROUP (ORDER BY attributes_length) AS p50,
percentile_cont(0.75) WITHIN GROUP (ORDER BY attributes_length) AS p75,
percentile_cont(0.90) WITHIN GROUP (ORDER BY attributes_length) AS p90,
percentile_cont(0.95) WITHIN GROUP (ORDER BY attributes_length) AS p95,
percentile_cont(0.99) WITHIN GROUP (ORDER BY attributes_length) AS p99
FROM (
SELECT LENGTH(attributes::text) AS attributes_length
FROM spans
ORDER BY random()
LIMIT 1000
) AS attributes_subquery;
Some improvements in arize-phoenix 4.20.0
, more to come soon.
Project Page Performance
✅ Ensure we are using dataloaders ✅ Ensure project page query lazily loads data for spans table (since only the traces table is displayed when you first hit the page) ✅ Reduce page size from 100 to 50 spans.
With the above, the initial project page query is down to a borderline acceptable time of ~6 seconds on my long context fixture:
Most of this time is due to querying attributes. To test, I changed the query to not pull attributes, and the time drops to ~0.3 seconds.
Low-hanging fruit is gone. Way to improve performance further:
If anyone is still experiencing issues with large numbers of traces or long-context traces on the latest version of Phoenix, please let us know and re-open this issue. There are further improvements we can make.
Describe the bug I've noticed that GraphQL queries are becoming slower as the amount of data increases
project page can't load in five minutes and got 504 timeout
To Reproduce ingest large amount of data
Expected behavior phoenix project page should load under 1minute Screenshots If applicable, add screenshots to help explain your problem.
Environment (please complete the following information):
Additional context Add any other context about the problem here (e.x. a link to a colab)