elastic / eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
https://eland.readthedocs.io
Apache License 2.0

Improve `es_info` method output readability #239

Open leonardbinet opened 4 years ago

leonardbinet commented 4 years ago

In my case, the output of the `es_info` method is hard to read:

[Screenshot 2020-07-16 at 20:10:23]

sethmlarson commented 4 years ago

I'm wondering if it makes sense to have es_info display things more vertically rather than horizontally. There'd be a lot of output but at least you'd be able to see it all from a Jupyter Notebook. Something like:

es_index_pattern: ecommerce
Index:
  es_index_field: _id
  is_source_field: False
Operations:
  tasks: [('boolean_filter': ('boolean_filter': {'range': {'taxful_total_price': {'gt': 100.0}}})), ('head': ('sort_field': '_doc', 'count': 10))]
  size: 10
  sort_params: _doc:asc
  _source: [
    'taxful_total_price',
    'category',
  ]
  body: {
    "query": {
      "range": {
        "taxful_total_price": {
          "gt": 100.0
        }
      }
    }
  }
  post_processing: []
Mappings:
  taxful_total_price:
    aggregatable_es_field_name: 'taxful_total_price'
    es_date_format: None
    es_dtype: 'float'
    es_field_name: 'taxful_total_price'
    is_aggregatable: True
    is_scripted: False
    is_searchable: True
    is_source: True
    pd_dtype: 'float64'
  category:
    aggregatable_es_field_name: 'category.keyword'
    es_date_format: None
    es_dtype: 'text'
    es_field_name: 'category'
    is_aggregatable: False
    is_scripted: False
    is_searchable: True
    is_source: True
    pd_dtype: 'object'
...

Obviously there'd be a long tail of mappings on wide indices, but hopefully at least nothing would be cut off?
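
A minimal sketch of how a vertical layout like the one above could be rendered, assuming the information is available as a nested dict. The `format_vertical` helper and the `sample` dict are hypothetical and only illustrate the idea; they are not part of eland.

```python
def format_vertical(info, indent=0):
    """Render a nested dict one key per line, indenting nested dicts.

    Hypothetical helper for illustration; not eland's implementation.
    """
    lines = []
    pad = "  " * indent
    for key, value in info.items():
        if isinstance(value, dict):
            # Nested sections get a header line, then their own indented keys.
            lines.append(f"{pad}{key}:")
            lines.extend(format_vertical(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value!r}")
    return lines


# Made-up sample shaped like the proposed output.
sample = {
    "es_index_pattern": "ecommerce",
    "Index": {"es_index_field": "_id", "is_source_field": False},
}
text = "\n".join(format_vertical(sample))
print(text)
```

Because every key gets its own line, the output grows in length with the number of fields but never in width, which is what avoids the cutoff in a notebook cell.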

leonardbinet commented 4 years ago

@sethmlarson I also prefer a solution that avoids cutoffs, even if it means longer output. An additional improvement could be an HTML table in notebooks, by implementing the `_repr_html_` method: horizontal scrolling would avoid cutoffs while keeping the output compact.
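
A rough sketch of the `_repr_html_` idea, assuming the mappings are available as a dict of per-field property dicts. The `MappingsView` class is hypothetical and not part of eland; only the `_repr_html_` hook itself is the standard IPython/Jupyter rich-display method.

```python
import html


class MappingsView:
    """Hypothetical wrapper around field mappings; not part of eland."""

    def __init__(self, mappings):
        # Assumed shape: {field_name: {property_name: value}}
        self._mappings = mappings

    def _repr_html_(self):
        # Jupyter calls this method to get a rich HTML representation.
        columns = sorted({key for props in self._mappings.values() for key in props})
        header = "".join(f"<th>{html.escape(c)}</th>" for c in ["field", *columns])
        rows = []
        for field_name, props in self._mappings.items():
            cells = [field_name] + [str(props.get(c)) for c in columns]
            row = "".join(f"<td>{html.escape(cell)}</td>" for cell in cells)
            rows.append(f"<tr>{row}</tr>")
        body = "".join(rows)
        # overflow-x lets wide tables scroll horizontally instead of being cut off.
        return f'<div style="overflow-x: auto"><table><tr>{header}</tr>{body}</table></div>'


view = MappingsView(
    {
        "taxful_total_price": {"es_dtype": "float", "pd_dtype": "float64"},
        "category": {"es_dtype": "text", "pd_dtype": "object"},
    }
)
rendered = view._repr_html_()
```

In a notebook, leaving `view` as the last expression of a cell would display the table; in a plain console the regular `__repr__` would still be used.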

Would it make sense, in your opinion, to split `es_info` into multiple methods? For instance `es_mappings`, `es_operations`, etc.?

V1NAY8 commented 4 years ago

@sethmlarson I will try to do this by adding a `_repr_html_` method for IPython readability. Is any formatting required for the console output?

NickolayVasilishin commented 3 years ago

I'd suggest having the `es_info` method return an `EsInfo` object that consolidates all this useful information. As far as I understand, it's the only place where I can quickly get the ES query, right? An `EsInfo` object would allow accessing specific parts such as the query, index, and mappings, as well as printing everything via its `__repr__`/`__str__` methods.
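
A minimal sketch of what such an object might look like. The `EsInfo` name comes from the suggestion above, but the fields and layout here are assumptions for illustration, not eland's actual API.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EsInfo:
    """Hypothetical container for the pieces es_info currently prints."""

    index_pattern: str
    query: dict = field(default_factory=dict)
    mappings: dict = field(default_factory=dict)

    def __str__(self):
        # Vertical layout: one item per line, so nothing is cut off.
        lines = [f"es_index_pattern: {self.index_pattern}", "body:"]
        body = json.dumps({"query": self.query}, indent=2)
        lines += ["  " + line for line in body.splitlines()]
        lines.append("Mappings:")
        for name, props in self.mappings.items():
            lines.append(f"  {name}:")
            lines += [f"    {key}: {value!r}" for key, value in sorted(props.items())]
        return "\n".join(lines)


info = EsInfo(
    index_pattern="ecommerce",
    query={"range": {"taxful_total_price": {"gt": 100.0}}},
    mappings={"taxful_total_price": {"es_dtype": "float", "pd_dtype": "float64"}},
)
print(info.query)  # access just the query part
print(info)        # print everything
```

Returning an object keeps the printed summary while letting callers read individual parts, such as the query, without parsing text.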

sethmlarson commented 3 years ago

@NickolayVasilishin Agreed! Having an object is definitely preferred.