Open walkingmug opened 5 months ago
Thanks for your bug report! I liked how it was minimal and easy to reproduce locally, which allowed me to confirm the issue.
What happens is that the first append simply uploads data to a new index, while the second has to check the existing mappings, which hits a different code path. While we should not fail with a TypeError, Eland does not currently support dense_vector, which is the crux of the issue.
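Until dense_vector is supported, one possible workaround for the second upload is to skip Eland for the append and send the rows with the official client's bulk helper. This is a minimal sketch, not tested against a cluster: `build_bulk_actions` is a hypothetical helper name, and it assumes the index already exists with the dense_vector mapping created by the first upload.

```python
def build_bulk_actions(index_name, column, vectors):
    """Yield one bulk action per vector (hypothetical helper, not part of Eland)."""
    for vector in vectors:
        yield {
            "_index": index_name,
            # Convert NumPy floats (or any numeric sequence) to plain Python floats
            # so the document is JSON-serializable.
            "_source": {column: [float(x) for x in vector]},
        }


# Small stand-in vectors; in the report these would be the rows of df_2["vector_column"].
actions = list(build_bulk_actions("test-upload", "vector_column", [[0.1, 0.2], [0.3, 0.4]]))

# With a connected client, the actions could then be sent with:
#   from elasticsearch import helpers
#   helpers.bulk(client, actions)
```

Because the mapping is not checked on this path, the vector length has to match the `dims` value the index was created with.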
Description: When trying to append a pandas DataFrame with a "dense_vector" column to an existing Elasticsearch index that has the same field type, an error occurs.
Reproduction:
pip install elasticsearch eland pandas numpy
vector3 = np.random.rand(512)
vector4 = np.random.rand(512)
df_2 = pd.DataFrame({
    'vector_column': [vector3, vector4]
})

# upload df_1 to elasticsearch
ed.pandas_to_eland(
    pd_df=df_1,
    es_client=client,
    es_dest_index='test-upload',
    es_if_exists="append",
    es_refresh=True,
    es_type_overrides={
        "vector_column": {
            "type": "dense_vector",
            "dims": 512,
            "index": True,
            "similarity": "cosine"
        },
    },
    chunksize=100
)

# upload df_2 to elasticsearch
ed.pandas_to_eland(
    pd_df=df_2,
    es_client=client,
    es_dest_index='test-upload',
    es_if_exists="append",
    es_refresh=True,
    es_type_overrides={
        "vector_column": {
            "type": "dense_vector",
            "dims": 512,
            "index": True,
            "similarity": "cosine"
        },
    },
    chunksize=100
)
TypeError Traceback (most recent call last) in <cell line: 2>()
1 # upload df_2 to elasticsearch
----> 2 ed.pandas_to_eland(
3 pd_df=df_2,
4 es_client=client,
5 es_dest_index='test-upload',
1 frames
/usr/local/lib/python3.10/dist-packages/eland/field_mappings.py in verify_mapping_compatibility(ed_mapping, es_mapping, es_type_overrides)
    919 key_type = es_type_overrides.get(key, key_def["type"])
    920 es_key_type = es_props[key]["type"]
--> 921 if key_type != es_key_type and es_key_type not in ES_COMPATIBLE_TYPES.get(
    922 key_type, ()
    923 ):
TypeError: unhashable type: 'dict'
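The traceback suggests that the override for vector_column, which is a dict, ends up being used as a dictionary key in the `ES_COMPATIBLE_TYPES.get(...)` lookup. The standalone sketch below reproduces that mechanism without Eland or a cluster. The contents of `ES_COMPATIBLE_TYPES`, `key_def`, and `es_props` here are stand-ins chosen for illustration, and `type_name` is a hypothetical helper showing one way the check could tolerate dict overrides; it is not Eland's actual fix.

```python
# Hypothetical stand-ins; the real table and mappings live inside Eland and Elasticsearch.
ES_COMPATIBLE_TYPES = {"long": ("integer", "short")}
key_def = {"type": "object"}
es_props = {"vector_column": {"type": "dense_vector"}}
es_type_overrides = {
    "vector_column": {
        "type": "dense_vector",
        "dims": 512,
        "index": True,
        "similarity": "cosine",
    }
}


def check_as_in_traceback(key):
    # Mirrors lines 919-921 of the traceback: key_type is the whole override dict.
    key_type = es_type_overrides.get(key, key_def["type"])
    es_key_type = es_props[key]["type"]
    # dict != str is True, so the dict is then passed to .get() as a key.
    return key_type != es_key_type and es_key_type not in ES_COMPATIBLE_TYPES.get(key_type, ())


def type_name(override_value):
    """Return the type string whether the override is a str or a mapping dict."""
    if isinstance(override_value, dict):
        return override_value["type"]
    return override_value


def check_with_normalization(key):
    # Same check, but comparing type names instead of the raw override value.
    key_type = type_name(es_type_overrides.get(key, key_def["type"]))
    es_key_type = es_props[key]["type"]
    return key_type != es_key_type and es_key_type not in ES_COMPATIBLE_TYPES.get(key_type, ())


try:
    check_as_in_traceback("vector_column")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(check_with_normalization("vector_column"))
```

With the type name extracted first, both sides compare as "dense_vector" and the lookup is never reached with an unhashable key.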