mpgreg closed this 9 months ago
We probably want `generate_uuids()` to pass through (i.e., skip UUID generation) if the requested `uuid_column` already exists.
https://github.com/astronomer/ask-astro/blob/1067794a118c96b32a8400209f6a1322c71ffb88/airflow/include/tasks/extract/utils/weaviate/ask_astro_weaviate_hook.py#L153C9-L162C10
```python
if uuid_column in column_names:
    # Pass through: the dataset already carries UUIDs, so keep them.
    self.logger.info(
        f"Property {uuid_column} already in dataset. Not generating new UUIDs."
    )
else:
    # Derive a deterministic UUIDv5 from each row, ignoring the vector column.
    df[uuid_column] = df[column_subset].drop(
        columns=[vector_column], inplace=False, errors="ignore"
    ).apply(
        lambda row: generate_uuid5(identifier=row.to_dict(), namespace=class_name), axis=1
    )
```
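For illustration, here is a minimal, dependency-free sketch of the pass-through behavior, assuming rows arrive as a list of dicts rather than a pandas DataFrame. The helper `make_uuid5`, the defaults `uuid_column="id"` and `vector_column="vector"`, and the list-of-dicts shape are all hypothetical; `make_uuid5` is only a stand-in built on the standard library's `uuid.uuid5`, not a claim about the exact output of weaviate's `generate_uuid5`. It also hashes every non-vector property instead of a `column_subset`, which is a simplification.

```python
import uuid
from typing import Any


def make_uuid5(identifier: Any, namespace: Any = "") -> str:
    """Hypothetical stand-in for generate_uuid5: a deterministic UUIDv5 from content."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, str(namespace) + str(identifier)))


def generate_uuids(
    rows: list[dict[str, Any]],
    class_name: str,
    uuid_column: str = "id",        # hypothetical default
    vector_column: str = "vector",  # hypothetical default
) -> list[dict[str, Any]]:
    """Add a UUID to each row unless uuid_column already exists (pass-through)."""
    column_names = set().union(*(row.keys() for row in rows))
    if uuid_column in column_names:
        # Pass through: keep the caller's existing UUIDs untouched.
        return rows
    result = []
    for row in rows:
        # Exclude the vector so rows with identical properties get the same UUID.
        identifier = {k: v for k, v in row.items() if k != vector_column}
        result.append({**row, uuid_column: make_uuid5(identifier, namespace=class_name)})
    return result
```

Because the UUID is derived from row content and the class name, re-running ingestion on the same data yields the same IDs, while a dataset that already has a `uuid_column` is returned unchanged.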
I think this has already been changed: https://github.com/astronomer/ask-astro/blob/main/airflow/include/tasks/extract/utils/weaviate/ask_astro_weaviate_hook.py#L148 (this is the code on the main branch).
I have tested it here, and the `generate_uuids()` logic works as expected.