cloudera / cm_api

Cloudera Manager API Client
Apache License 2.0

get_impala_queries does not return all records #68

Open funes79 opened 6 years ago

funes79 commented 6 years ago

When I run get_impala_queries from Python, it returns just 2 records, even though I use the same date range and filter as in the CM UI. When I filter in the CM UI, the first two records appear immediately, and the rest of the queries follow after a second. Is it possible to fetch the remaining results somehow?

Liuzhj commented 6 years ago

Hi @funes79,

Do you have some code to show?

funes79 commented 6 years ago

Running the query for expensive queries in the last 7 days returns just 2 records:

from datetime import datetime, timedelta
from cm_api.api_client import ApiResource

# cm_host is the Cloudera Manager hostname, defined elsewhere.
api = ApiResource(cm_host, username="reader", password="cmreader", version=18)
c = api.get_all_clusters()[0]

# Pick the Impala service of the first cluster.
for s in c.get_all_services():
    if s.type == 'IMPALA':
        impala = s

now = datetime.utcnow()
daysback = 7
start = now - timedelta(days=daysback)
end = now
print('> Scanning last %s days, from %s till %s ' % (daysback, start, end))
filterStr = 'memory_aggregate_peak >= 60GB'
queries = impala.get_impala_queries(start_time=start, end_time=end, filter_str=filterStr, limit=1000, offset=0)
for query in queries.queries:
    print('> queryid = ' + query.queryId)

Result:

Scanning last 7 days, from 2018-04-14 19:14:30.128989 till 2018-04-21 19:14:30.128989
queryid = 4b4732a957d53eff:36aed14500000000
queryid = 284a368120d0f55f:10950c0f00000000

But looping over the same period and calling get_impala_queries for shorter intervals returns more results:

for i in range(1, daysback + 1):
    # Scan one 24-hour window at a time, newest first.
    start = now - timedelta(days=i)
    end = now - timedelta(days=i - 1)
    # Label each window with its own offset (i), not the total span (daysback).
    print('> Scanning %s days ago, from %s till %s ' % (i, start, end))
    filterStr = 'memory_aggregate_peak >= 60GB'
    queries = impala.get_impala_queries(start_time=start, end_time=end, filter_str=filterStr, limit=1000, offset=0)
    for query in queries.queries:
        print('> queryid = ' + query.queryId)

Result:

Scanning 1 days ago, from 2018-04-20 19:14:30.128989 till 2018-04-21 19:14:30.128989
Scanning 2 days ago, from 2018-04-19 19:14:30.128989 till 2018-04-20 19:14:30.128989
queryid = 4b4732a957d53eff:36aed14500000000
queryid = 284a368120d0f55f:10950c0f00000000
Scanning 3 days ago, from 2018-04-18 19:14:30.128989 till 2018-04-19 19:14:30.128989
queryid = f4a9f149c0fc14d:520bc9ec00000000
queryid = 6a4b66e3cb8f1778:32ed8d7d00000000
queryid = bb488109a374db59:4eb2f8f700000000
queryid = b4963e9b6f9d1ea:9f55733800000000
.... and much more
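
Since shorter intervals return more records, one possible workaround is to wrap the windowing in a small helper. The following is only a sketch: `split_range`, `collect_queries` and the `fetch` callable are hypothetical names introduced here, not part of cm_api, and nothing in it explains why the single 7-day call comes back short.

```python
from datetime import datetime, timedelta


def split_range(start, end, step):
    """Yield consecutive (window_start, window_end) pairs covering [start, end)."""
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


def collect_queries(fetch, start, end, step=timedelta(days=1)):
    """Call fetch(window_start, window_end) once per window and merge the results.

    `fetch` is a hypothetical callable that returns an iterable of objects
    with a `queryId` attribute. Duplicates across windows are dropped by id.
    """
    seen = {}
    for window_start, window_end in split_range(start, end, step):
        for query in fetch(window_start, window_end):
            seen.setdefault(query.queryId, query)
    return list(seen.values())


# Possible use with the objects from this thread (assumes `impala` and
# `filterStr` are defined as in the snippets above):
#
#   fetch = lambda s, e: impala.get_impala_queries(
#       start_time=s, end_time=e, filter_str=filterStr,
#       limit=1000, offset=0).queries
#   now = datetime.utcnow()
#   all_queries = collect_queries(fetch, now - timedelta(days=7), now)
```

Passing the fetch function in as an argument keeps the windowing logic independent of the CM connection, so it can be checked against a stub before pointing it at a real cluster.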

Liuzhj commented 6 years ago

Hi @funes79, OK, I get it. I will try it tomorrow.

Liuzhj commented 6 years ago

Hi @funes79,

I tried running it and it executes normally. This is my code:

import ...

def impala_query(cluster):
    # Query the last 7 days, using local time.
    end = datetime.now()
    start = end - timedelta(days=7)
    print('%s %s' % (start, end))
    for s in cluster.get_all_services():
        if s.type == 'IMPALA':
            impala = s
            q = impala.get_impala_queries(start_time=start, end_time=end, filter_str='database=xxx')
            for i in q.queries:
                print(i.queryId)

if __name__ == '__main__':
    cm_host = 'xx'
    api = ApiResource(cm_host, username='reader', password='cmreader', version=6)
    # get_all_clusters() returns cluster objects, so this is the cluster itself, not its name.
    cluster = api.get_all_clusters()[0]
    impala_query(cluster)

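[image] https://user-images.githubusercontent.com/14147011/39104190-bf8489f2-46e1-11e8-9041-11e8ba1406e6.png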

funes79 commented 6 years ago

Could it be related to the fact that the filtering takes some time? In the GUI the first two rows appear immediately and then the others are fetched. So I think that if the GUI uses Solr or some other kind of index, the results are not available immediately but arrive in several steps.

This is typical for wiki search, where you get the first set of results together with a token, and with that token you continue to fetch more results.
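
The Python call used in this thread takes `limit` and `offset` rather than a continuation token, so the closest equivalent would be offset paging. Below is a rough sketch of that loop. `fetch_all` and `fetch_page` are hypothetical names, and the `warnings` attribute on the response is an assumption that should be checked against the installed cm_api version. A short page only means the server returned fewer rows than requested. It does not prove the whole time range was scanned, which is why the sketch collects any warnings for inspection.

```python
def fetch_all(fetch_page, page_size=100):
    """Page through results with limit/offset until a short page comes back.

    `fetch_page(limit, offset)` is a hypothetical callable returning an object
    with a `queries` list. A `warnings` list on that object is an assumption;
    it is read defensively with getattr so the loop works without it.
    Returns a (queries, warnings) tuple.
    """
    queries = []
    warnings = []
    offset = 0
    while True:
        response = fetch_page(page_size, offset)
        page = list(response.queries)
        queries.extend(page)
        warnings.extend(getattr(response, 'warnings', None) or [])
        if len(page) < page_size:
            # Fewer rows than requested: no further page to ask for.
            break
        offset += len(page)
    return queries, warnings


# Possible use with the objects from this thread (assumes `impala`, `start`,
# `end` and `filterStr` are defined as in the earlier snippets):
#
#   page = lambda limit, offset: impala.get_impala_queries(
#       start_time=start, end_time=end, filter_str=filterStr,
#       limit=limit, offset=offset)
#   queries, warnings = fetch_all(page, page_size=100)
#   for w in warnings:
#       print(w)
```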
