elastic / eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
https://eland.readthedocs.io
Apache License 2.0

Boolean Indexing on an object type #348

Open Romathonat opened 3 years ago

Romathonat commented 3 years ago

Hello,

How can I filter my dataset using standard pandas boolean indexing on an attribute that is detected as an object type? (In the Elasticsearch mapping it is indexed as both "text" and "keyword".)

Example:

df.dtypes

Results:

...
codeInstance             object
...
dtype: object

df['codeInstance'].head()

Results:

YCzDDnkBdtlmoJ_PA5Qq    instanceX
Name: codeInstance, dtype: object

df[df['codeInstance'] == 'instanceX'].head()

gives me 0 rows. I tried adding ".keyword", but that does not work either.

th0ger commented 3 years ago

Same question here. I notice that:

df['codeInstance'] == 'instanceX'
> {'term': {'codeInstance': 'instanceX'}}

A workaround is:

query = {
    "query_string": {
        "fields": ["codeInstance"],
        "query": "instanceX",
    }
}
df.es_query(query)

But of course this is not the compact pandas style we like.
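
A possible explanation, assuming the "text" side of the mapping uses the default standard analyzer: a term query is not analyzed, while the indexed tokens are lowercased, so the literal 'instanceX' never matches the stored token 'instancex'. Below is a minimal sketch of two alternative query bodies that could be passed to df.es_query in the same way as the workaround above. The helper names are hypothetical, and the sub-field name "keyword" is an assumption about the mapping, not something confirmed in this thread.

```python
from typing import Any, Dict


def keyword_term_query(field: str, value: str, subfield: str = "keyword") -> Dict[str, Any]:
    """Build an exact-match term query against a keyword sub-field.

    Assumes the mapping exposes `<field>.<subfield>` as a keyword multi-field.
    """
    return {"term": {f"{field}.{subfield}": value}}


def match_query(field: str, value: str) -> Dict[str, Any]:
    """Build a match query, which is analyzed the same way as the text field."""
    return {"match": {field: value}}


exact = keyword_term_query("codeInstance", "instanceX")
analyzed = match_query("codeInstance", "instanceX")
print(exact)
print(analyzed)

# With an eland DataFrame `df` bound to the index (not created here),
# either body would be applied like the query_string workaround:
# filtered = df.es_query(exact)
```

The term variant keeps exact, case-sensitive semantics, which is closest to what `==` means in pandas; the match variant behaves more like the query_string workaround.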