GNS-Science / nshm-toshi-api

An extensible API where task metadata and important input and output files relating to data-intensive science processes are retained. Custom task schemas can be defined to support their metadata needs.
GNU Affero General Public License v3.0

Feature/Elasticsearch helper script #195

Closed chrisbc closed 12 months ago

chrisbc commented 12 months ago

Elasticsearch indexes can help us analyse problems, metrics, etc., and the elasticsearch-dsl library makes building and running queries fast.

We want to add a simple CLI script in which we can build useful queries and make them easy to rerun. Example query below:

import os
os.environ.setdefault("ANYSEARCH_PREFERRED_BACKEND", "Elasticsearch")
from nzshm_model_graphql_api.config import ES_HOST
from elasticsearch_dsl import Search, connections, Q

connections.create_connection(hosts=ES_HOST)

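s = Search()
# exclude documents whose meta key is "hazard_agg_target" ...
q1 = ~Q("term", meta__k="hazard_agg_target")
# ... and keep only OpenquakeHazardSolution documents
q2 = Q("term", clazz_name__keyword="OpenquakeHazardSolution")
s = s.query(q2 & q1)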

# aggregate by month
# (on Elasticsearch 7.2+ `interval` is deprecated in favour of `calendar_interval`)
s.aggs.bucket('solutions_per_month', 'date_histogram', field='created', interval='month')

s = s.extra(explain=True, track_total_hits=True)

print('query')
print(s.to_dict())

# only return the first two hits; the aggregation still covers all matches
s = s[:2]

response = s.execute()
print(f'Total {response.hits.total.value} hits found.')

print(response.aggregations)
for bucket in response.aggregations.solutions_per_month.buckets:
    print(bucket.key_as_string, bucket.doc_count)
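
One possible shape for the CLI script, as a rough sketch only: a registry of named query builders plus an argparse front end, so each useful query can be rerun by name. The function names (`solutions_per_month`, `run`, `main`), the `--host` option and the `solutions-per-month` query name are all hypothetical, and the request body is a hand-written approximation of the example query above, not the exact output of `s.to_dict()`.

```python
import argparse
import json


def solutions_per_month(size=2):
    """Hypothetical builder: a raw request body approximating the example query."""
    return {
        "query": {
            "bool": {
                # keep only OpenquakeHazardSolution documents ...
                "must": [{"term": {"clazz_name.keyword": "OpenquakeHazardSolution"}}],
                # ... and exclude those with the hazard_agg_target meta key
                "must_not": [{"term": {"meta.k": "hazard_agg_target"}}],
            }
        },
        "aggs": {
            "solutions_per_month": {
                # mirrors the example; newer servers may want `calendar_interval`
                "date_histogram": {"field": "created", "interval": "month"}
            }
        },
        "track_total_hits": True,
        "size": size,
    }


# Registry of rerunnable queries (names are illustrative assumptions).
QUERIES = {"solutions-per-month": solutions_per_month}


def run(body, host):
    """Execute a request body; elasticsearch-dsl is imported only when needed."""
    from elasticsearch_dsl import Search, connections

    connections.create_connection(hosts=[host])
    return Search.from_dict(body).execute()


def main(argv=None):
    parser = argparse.ArgumentParser(description="Run a canned Elasticsearch query.")
    parser.add_argument("query", choices=sorted(QUERIES))
    parser.add_argument("--host", help="Elasticsearch host; omit to only print the request body")
    args = parser.parse_args(argv)

    body = QUERIES[args.query]()
    print(json.dumps(body, indent=2))

    if args.host:
        response = run(body, args.host)
        print(f"Total {response.hits.total.value} hits found.")
        print(json.dumps(response.aggregations.to_dict(), indent=2))
    return 0
```

Calling `main(["solutions-per-month"])` prints the request body without touching a cluster; passing `--host` as well would execute it. A real script would add the usual `if __name__ == "__main__":` guard and would probably take the host from the project's existing `ES_HOST` config rather than a flag.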
chrisbc commented 12 months ago

closed by #202