AI-Northstar-Tech / vector-io

The only Vector tooling you'll need. Star the repo and look out for an email to try out a brand new Vector Data Exploration demo! Use the universal VDF format for vector datasets to easily export and import data from all vector databases, and re-embed it using any model
https://tryvector.io
Apache License 2.0
191 stars, 25 forks

Sweep: The export from pinecone fails due to some data type error #105

Open abhishek-fluidai opened 1 month ago

abhishek-fluidai commented 1 month ago

Details

Fetching namespaces: 0% 0/1 [02:54<?, ?it/s]
Error: ("Could not convert '1719697028.0' with type str: tried to convert to double", 'Conversion failed for column created_at with type object')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf_cli.py", line 89, in main
    run_export(span)
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf_cli.py", line 149, in run_export
    export_obj = slug_to_export_func[args["vector_database"]](args)
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf/pinecone_export.py", line 164, in export_vdb
    pinecone_export.get_data()
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf/pinecone_export.py", line 481, in get_data
    index_meta = self.get_data_for_index(index_name)
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf/pinecone_export.py", line 575, in get_data_for_index
    total_size += self.save_vectors_to_parquet(
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf/vdb_export_cls.py", line 87, in save_vectors_to_parquet
    df.to_parquet(parquet_file)
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py", line 2970, in to_parquet
    return to_parquet(
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parquet.py", line 483, in to_parquet
    impl.write(
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parquet.py", line 189, in write
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
  File "pyarrow/table.pxi", line 3874, in pyarrow.lib.Table.from_pandas
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py", line 624, in dataframe_to_arrays
    arrays[i] = maybe_fut.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py", line 598, in convert_column
    raise e
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py", line 592, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 340, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 86, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ("Could not convert '1719697028.0' with type str: tried to convert to double", 'Conversion failed for column created_at with type object')
Exporting fluidaigpt-dev: 0% 0/1 [02:56<?, ?it/s]
Final Step: Fetching vectors: 100% 14404/14404 [02:39<00:00, 90.24it/s]

Branch

No response

sweep-ai[bot] commented 1 month ago
Sweeping

[!TIP] To recreate the pull request, edit the issue title or description.

greptile-apps[bot] commented 1 month ago

To resolve the data type error during export from Pinecone, modify the save_vectors_to_parquet method in /src/vdf_io/export_vdf/vdb_export_cls.py to ensure the created_at column is converted to a double. Add the following code before calling df.to_parquet(parquet_file):

if 'created_at' in df.columns:
    df['created_at'] = df['created_at'].astype(float)

This will convert the created_at column to the correct data type.
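
A slightly more defensive variant of the same idea, as a sketch only: a plain astype(float) raises ValueError if any value in the column is not parseable as a number, whereas pd.to_numeric with errors="coerce" turns such values into NaN. The helper name coerce_numeric_column is hypothetical and not part of vdf_io, and the sample data assumes the column mixes floats and numeric strings, which the error message suggests but the thread has not confirmed.

```python
import pandas as pd


def coerce_numeric_column(df: pd.DataFrame, column: str = "created_at") -> pd.DataFrame:
    """Return a copy of df with `column` converted to float64, if it exists.

    Hypothetical helper for illustration; not part of vdf_io.
    """
    if column not in df.columns:
        return df
    out = df.copy()
    # errors="coerce" maps unparseable values to NaN instead of raising.
    out[column] = pd.to_numeric(out[column], errors="coerce").astype("float64")
    return out


# Assumed sample: metadata where the same field holds floats, strings and nulls.
df = pd.DataFrame(
    {
        "id": ["a", "b", "c"],
        "created_at": [1719697028.0, "1719697028.0", None],
    }
)
fixed = coerce_numeric_column(df)
print(fixed["created_at"].dtype)
```

The trade-off is that coercion hides bad values as NaN, so it may be worth logging how many rows changed before writing the Parquet file.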

References

/.github/ISSUE_TEMPLATE
/src/vdf_io/import_vdf/pinecone_import.py
/src/vdf_io/export_vdf_cli.py
/archive/example-VDF_META.json
/src/vdf_io/import_vdf
/src/vdf_io/notebooks/test_filtering_pc_log.ipynb
/src/vdf_io/notebooks
/src/vdf_io/import_vdf_cli.py
/docs/export_vdf_pinecone_help.txt
/.github/ISSUE_TEMPLATE/sweep-template.yml
/.github/ISSUE_TEMPLATE/support-for-new-vector-db.md
/src/vdf_io/export_vdf/vertexai_vector_search_export.py
/.github
/src/vdf_io
/src/vdf_io/scripts/push_to_hub_vdf.py
/docs
/src
/src/vdf_io/export_vdf/pinecone_export.py
/archive
/src/vdf_io/notebooks/kdbai_end_to_end_vectorIO.ipynb
/README.md
/src/vdf_io/export_vdf/vdb_export_cls.py

dhruv-anand-aintech commented 1 month ago

What type is the created_at column in your original index?
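
One way to answer this from the exported data, sketched under assumptions: count the Python types that actually occur in the column before it is written to Parquet. The function name value_type_counts and the sample values are hypothetical; in practice df would be the DataFrame built from the Pinecone metadata.

```python
import pandas as pd


def value_type_counts(df: pd.DataFrame, column: str) -> dict:
    """Count how many values of each Python type appear in `column`."""
    return df[column].map(lambda v: type(v).__name__).value_counts().to_dict()


# Assumed sample: the same metadata field stored as float in some vectors
# and as str in others.
df = pd.DataFrame({"created_at": [1719697028.0, "1719697028.0", 1719697030.0]})
counts = value_type_counts(df, "created_at")
print(counts)
```

If more than one type shows up, the column is mixed, which would be consistent with the "Conversion failed for column created_at with type object" message above.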