druid-io / pydruid

A Python connector for Druid

Intervals in raw_data #195

Open · opened by trakru 4 years ago

trakru commented 4 years ago

Not an issue per se, but more of a guide/community ask.

I was trying to pull data from the Druid broker, but there appeared to be a 50,000-record limit on any single request. So I ended up building a list of 4-hour intervals (like grabInterval below) and looping the request over that list to cover the full time range I needed. Has anybody else run into the same issue, or found a different solution using some of the other parameters?
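Building the list of 4-hour intervals can be sketched with the standard library alone. This is a minimal sketch, not the code used in the post; `split_interval` is a hypothetical helper name, and it assumes start and end times in the same `YYYY-MM-DDTHH:MM` form as grabInterval.

```python
from datetime import datetime, timedelta
from typing import List


def split_interval(start: str, end: str, hours: int = 4) -> List[str]:
    """Split [start, end) into consecutive ISO-8601 interval strings.

    Each slice is at most `hours` long; the last slice is shortened so
    it never runs past `end`. (Hypothetical helper for illustration.)
    """
    fmt = "%Y-%m-%dT%H:%M"  # same layout as '2020-01-20T00:00'
    cur = datetime.strptime(start, fmt)
    stop = datetime.strptime(end, fmt)
    step = timedelta(hours=hours)
    intervals: List[str] = []
    while cur < stop:
        nxt = min(cur + step, stop)
        intervals.append(f"{cur.strftime(fmt)}/{nxt.strftime(fmt)}")
        cur = nxt
    return intervals
```

Each returned string can then be passed as `intervals=` to `query.select` inside a loop, one request per slice. Smaller slices mean more requests but fewer rows per response.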

Code reproduced below

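from pydruid.client import PyDruid

# Placeholder broker URL; point this at your own Druid broker
query = PyDruid('http://localhost:8082', 'druid/v2')

grabInterval = '2020-01-20T00:00/2020-01-20T04:00'  # one 4-hour slice
raw_data = query.select(
    datasource='myDataSource',
    granularity='all',
    intervals=grabInterval,
    # Druid's select pagingSpec key is "pagingIdentifiers"
    paging_spec={'pagingIdentifiers': {}, 'threshold': 10000},
    context={"timeout": 1000})  # query timeout in milliseconds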
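One alternative using the other parameters is to page through a single select query with `paging_spec` instead of slicing the time range. The sketch below assumes Druid's select-query paging behaviour: each response carries `pagingIdentifiers`, and sending them back with `"fromNext": True` returns the next page. `fetch_all_events` and its `run_select` argument are hypothetical; `run_select` stands in for whatever actually issues the query, which also lets the loop run without a live broker. Whether paging avoids the limit described above depends on where that limit is enforced, so treat this as something to try rather than a known fix.

```python
from typing import Any, Callable, Dict, List


def fetch_all_events(
    run_select: Callable[[Dict[str, Any]], List[dict]],
    threshold: int = 10000,
    max_pages: int = 1000,
) -> List[dict]:
    """Collect events from a Druid select query page by page.

    run_select is a hypothetical callable: it receives a pagingSpec dict
    and returns the parsed select response (a list of result objects).
    """
    paging_identifiers: Dict[str, int] = {}
    events: List[dict] = []
    for _ in range(max_pages):  # guard against an endless loop
        response = run_select({
            "pagingIdentifiers": paging_identifiers,
            "threshold": threshold,
            "fromNext": True,  # assumed: resume after the returned offsets
        })
        if not response:
            break
        page = response[0]["result"]
        batch = page.get("events", [])
        if not batch:  # an empty page means everything has been read
            break
        events.extend(batch)
        # Feed this page's offsets back into the next request
        paging_identifiers = page["pagingIdentifiers"]
    return events
```

With pydruid, `run_select` could be something like `lambda spec: query.select(datasource='myDataSource', granularity='all', intervals=grabInterval, paging_spec=spec).result`, assuming the returned query object exposes the parsed JSON response as `.result`.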