Hey,

overall I really like eland, but I noticed that creating a pandas DataFrame with the `eland_to_pandas()` method is much slower (about 4-5 times) than the "naive" way of iterating over `scan()` on an `elasticsearch_dsl` query. The naive version:
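```python
import pandas as pd
from elasticsearch_dsl import Search

# self.es_client and self.index come from the surrounding class (not shown)
s = Search(using=self.es_client, index=self.index)
# scan() iterates over every hit; each hit becomes one row
df = pd.DataFrame(d.to_dict() for d in s.scan())
# >> Elapsed time: 33.50s
```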
Is there any chance that the conversion to pandas could be accelerated?
For reference, the example above has ~80,000 rows and 5 columns.
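In case it helps with reproducing the numbers, here is a minimal sketch of how both timings could be taken the same way. The `timed` helper and its `label` and `build` parameters are hypothetical names made up for this illustration; they are not part of eland or elasticsearch_dsl. The stand-in workload only exists so the snippet runs without a cluster.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(label: str, build: Callable[[], T]) -> Tuple[T, float]:
    """Run build() once and return its result plus the wall-clock seconds it took."""
    start = time.perf_counter()
    result = build()
    elapsed = time.perf_counter() - start
    print(f">> {label}: elapsed time {elapsed:.2f}s")
    return result, elapsed


if __name__ == "__main__":
    # Stand-in workload (hypothetical): builds 80,000 small dicts in memory.
    rows, seconds = timed("stand-in", lambda: [{"n": i} for i in range(80_000)])
    print(len(rows))
```

With a live cluster, the two builders would presumably be `lambda: pd.DataFrame(d.to_dict() for d in s.scan())` and `lambda: ed.eland_to_pandas(ed_df)`, where `ed_df` stands for an already created eland DataFrame that is not shown in this post. Running each builder a few times and comparing the spread would help rule out cache or network effects.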