druid-io / pydruid

A Python connector for Druid

Evaluating query runtime without output #281

Open AKheli opened 2 years ago

AKheli commented 2 years ago

Hello,

I am using PyDruid to measure a query's runtime in Druid, without taking into account the time needed to output the results obtained from the API.

from pydruid.db import connect
import time

conn = connect(host='localhost', port=8082, path='/druid/v2/sql/', scheme='http')
curs = conn.cursor()
start = time.time()
curs.execute("""
    SELECT id_station, count(*) FROM bafu_comma where id_station IN (32, 54, 8, 25, 95, 13, 80, 16, 83, 27) group by id_station
""")
end1 = time.time()
print('execution runtime:', (end1 - start) * 1000, 'ms')
print('number of rows:', sum(1 for _ in curs))
end2 = time.time()
# for row in curs:
#      print(row)
print('total time:', (end2 - start) * 1000, 'ms')

Is this a correct way of measuring the runtime? My execution time is always around 200 ms or 50 ms, which is a bit suspicious. Also, the total runtime that I obtain is much higher than the results that I obtain in the API.

Any ideas on how to properly evaluate a query execution time in Druid?

Thanks!

betodealmeida commented 1 year ago

I'm not sure if that's correct. The DB API connector streams the results from Druid, so unless you have iterated over the entire result set I don't think you can assume that the query execution has finished.

https://github.com/druid-io/pydruid/blob/bd7b741a93c11733f928d649b9927448032e11f4/pydruid/db/api.py#L365-L380

The correct time is probably closer to end2 - start in this case, I think.
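
As a rough illustration of that point, here is a minimal sketch of a timing helper that records both moments: when `execute()` returns and when the cursor has been fully consumed. It assumes only a DB-API 2.0 cursor that can be iterated; `time_query` is a hypothetical helper name, not part of pydruid, and `sqlite3` is used purely as a stand-in so the snippet runs without a Druid cluster. With pydruid you would pass the cursor obtained from `pydruid.db.connect(...)` instead.

```python
import sqlite3
import time


def time_query(cursor, sql):
    """Time a query on any iterable DB-API 2.0 cursor (hypothetical helper).

    Returns a tuple of:
      - seconds until execute() returned,
      - seconds until every row had been read,
      - number of rows read.
    """
    start = time.perf_counter()
    cursor.execute(sql)
    after_execute = time.perf_counter()
    # Drain the cursor without storing or printing rows, so that a
    # streaming connector is forced to read the whole response.
    row_count = sum(1 for _ in cursor)
    after_drain = time.perf_counter()
    return after_execute - start, after_drain - start, row_count


# Stand-in database (assumption: sqlite3 replaces Druid for this demo only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bafu_comma (id_station INTEGER)")
conn.executemany(
    "INSERT INTO bafu_comma VALUES (?)", [(i % 5,) for i in range(100)]
)

first, total, rows = time_query(
    conn.cursor(),
    "SELECT id_station, count(*) FROM bafu_comma GROUP BY id_station",
)
print(f"execute() returned after {first * 1000:.2f} ms")
print(f"all {rows} rows read after {total * 1000:.2f} ms")
```

Note that the second figure is still a client-side measurement: it includes network transfer and response parsing, so it can be expected to be higher than whatever time the server itself reports for the query.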