Closed danthegoodman1 closed 1 year ago
Hey @danthegoodman1, statistics are included when using the JSON* formats, e.g. JSONCompact
(and possibly others), but they really only contain the elapsed time, since there is no storage attached and network operations are not accounted for (yet):
```
:) SELECT 1;

{
    "meta":
    [
        {
            "name": "1",
            "type": "UInt8"
        }
    ],
    "data":
    [
        [1]
    ],
    "rows": 1,
    "statistics":
    {
        "elapsed": 0.002978142,
        "rows_read": 0,
        "bytes_read": 0
    }
}
```
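Pulling the statistics out of a JSON-format result is straightforward. A minimal sketch, using the literal response above as input; in real use, the string would be the output of a JSON-format query:

```python
import json

# Literal JSON response from the example above; in practice this string
# would come from running a query with one of the JSON* output formats.
raw = """
{
    "meta": [{"name": "1", "type": "UInt8"}],
    "data": [[1]],
    "rows": 1,
    "statistics": {
        "elapsed": 0.002978142,
        "rows_read": 0,
        "bytes_read": 0
    }
}
"""

result = json.loads(raw)
stats = result["statistics"]
print(stats["elapsed"])     # query wall time in seconds
print(stats["rows_read"])   # 0 -- no storage attached
print(stats["bytes_read"])  # 0
```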
Thanks for the pointer. I think it'd be very valuable to have this for the df and arrow formats as well, since those provide a lot of optimizations. Or even for CSV, where I could probably stream results back to an HTTP client.
Sadly, not all formats will allow this without poisoning the dataset results, and I doubt CSV could. We need to see where else statistics end up across the various format sources. Note that statistics are returned at the driver level for most native use cases, and are only included in flexible formats such as JSON because of the above.
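To illustrate why CSV can't carry statistics without poisoning the results: any extra line appended to the stream is indistinguishable from data, so a consumer parses it as another record. A small self-contained demonstration (the appended stats line here is made up):

```python
import csv
import io

# A two-row CSV result, with a hypothetical statistics line naively appended.
payload = "id,name\n1,alice\n2,bob\nstatistics,elapsed=0.002\n"

rows = list(csv.reader(io.StringIO(payload)))
header, body = rows[0], rows[1:]

# The appended stats line is parsed as a third data record,
# silently corrupting the result set.
print(len(body))  # 3, not the 2 real rows
print(body[-1])   # ['statistics', 'elapsed=0.002']
```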
Could they be exposed via something like an x.fetchstats() call, similar to how DuckDB has ddb.fetchall() to retrieve results?
Possibly. That's a good question for @auxten. I suppose the uniform way would be to return our own chdb statistics and measure response sizes, etc. in the middle.
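"Measuring in the middle" could mean timing the call and sizing the raw payload in a thin client-side wrapper, without touching the output format at all. A hypothetical sketch; run_with_stats, query_fn, and the returned dict are assumptions, not existing chdb API:

```python
import time

def run_with_stats(query_fn, sql, fmt="CSV"):
    """Run a query and attach client-side statistics.

    query_fn stands in for whatever chdb query call is used;
    none of these names are real chdb API today.
    """
    start = time.perf_counter()
    raw = query_fn(sql, fmt)
    elapsed = time.perf_counter() - start
    data = raw.encode() if isinstance(raw, str) else bytes(raw)
    return raw, {
        "elapsed": elapsed,           # wall time, including Python overhead
        "response_bytes": len(data),  # size of the serialized result
    }

# Stub standing in for a real query call:
result, stats = run_with_stats(lambda sql, fmt: "1,alice\n2,bob\n", "SELECT 1")
print(stats["response_bytes"])  # 14
```

Measuring on the client side this way counts serialization and Python overhead in the elapsed time, but it works uniformly for every output format, including CSV.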
@lmangani I would definitely need bytes to be able to use this for bighouse, as that's the only fair usage metric; you never know when network hiccups will cause extended query times.
Like clickhouse server/local, it would be great if we could get the time spent processing the query, the number of rows processed, and the total bytes read for a given query.