elastic / eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
https://eland.readthedocs.io
Apache License 2.0
Fields (multi-fields?) specified in eland.Dataframe columns variable are not always returned #692

Closed bartbroere closed 5 months ago

bartbroere commented 6 months ago

I ran into an issue where a field passed in the columns parameter of eland.DataFrame came back as an empty column (all NaNs) in the DataFrame, even though the field definitely has values in the index.

After setting some breakpoints in eland.operations I found a fix that works on our index. I have not reproduced it with the flights or ecommerce sample data yet, but I could work on that later.

bartbroere commented 6 months ago

I think this problem might occur with multi-fields specifically: https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html
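
For context: a multi-field (for example a keyword sub-field) is indexed but never written to _source, so code that reads column values only from _source would find nothing for it. The sketch below illustrates that with a hand-written hit, not a real Elasticsearch response. The field names name and name.keyword and both helper functions are hypothetical, and this is not eland's actual code.

```python
import math

# Hand-written hit for illustration only; not captured from a real cluster.
# The "fields" section is what the search API returns when a request asks
# for specific fields, and its values are always lists.
hit = {
    "_source": {"name": "Amsterdam"},
    "fields": {"name": ["Amsterdam"], "name.keyword": ["Amsterdam"]},
}


def value_from_source_only(hit, column):
    """Read a column from _source alone; sub-fields are never found there."""
    return hit.get("_source", {}).get(column, math.nan)


def value_for_column(hit, column):
    """Prefer _source, then fall back to the "fields" section of the hit."""
    source = hit.get("_source", {})
    if column in source:
        return source[column]
    values = hit.get("fields", {}).get(column)
    if not values:
        return math.nan
    # Unwrap single-element lists so scalar fields stay scalar.
    return values[0] if len(values) == 1 else values
```

With this hit, the _source-only lookup yields NaN for name.keyword, while the fallback lookup finds the value under fields. The sketch uses flat key lookups, so nested objects in _source would need extra handling.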

bartbroere commented 6 months ago

I think we could reproduce this issue if we add a copy_to to the testing indices: https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html

I'm trying that now.
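
A minimal sketch of what such a test mapping might look like, assuming made-up field names (first_name, last_name, full_name) that are not part of the existing flights or ecommerce test data. Values copied with copy_to are indexed into the destination field but do not appear in _source, which is why this kind of mapping could trigger the behaviour described above. The helper copy_to_targets is hypothetical and only lists the destination fields.

```python
# Hypothetical mapping for a test index; the field names are made up.
mapping = {
    "mappings": {
        "properties": {
            "first_name": {"type": "text", "copy_to": "full_name"},
            "last_name": {"type": "text", "copy_to": "full_name"},
            "full_name": {"type": "text"},
        }
    }
}


def copy_to_targets(mapping):
    """Return the set of fields named as copy_to destinations."""
    properties = mapping["mappings"]["properties"]
    targets = set()
    for definition in properties.values():
        target = definition.get("copy_to")
        if target is None:
            continue
        # copy_to accepts either a single field name or a list of names.
        targets.update([target] if isinstance(target, str) else target)
    return targets
```

A dictionary like this could be used as the body when creating a test index. The fields returned by copy_to_targets are the ones worth requesting in the columns parameter when trying to reproduce the all-NaN result.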

pquentin commented 6 months ago

Thanks. Would you mind pasting here a JSON response from Elasticsearch where _source is a list and fields is set?