astronomer / ask-astro

An end-to-end LLM reference implementation providing a Q&A interface for Airflow and Astronomer
https://ask.astronomer.io/
Apache License 2.0

generate_uuids fails if user has already generated uuids. #171

Closed: mpgreg closed this issue 9 months ago

mpgreg commented 10 months ago

We probably want generate_uuids() to pass through if the requested uuid_column already exists.

https://github.com/astronomer/ask-astro/blob/1067794a118c96b32a8400209f6a1322c71ffb88/airflow/include/tasks/extract/utils/weaviate/ask_astro_weaviate_hook.py#L153C9-L162C10


```python
        if uuid_column in column_names:
            self.logger.info(
                f"Property {uuid_column} already in dataset. Not generating new UUIDs."
            )
        else:
            # Build each row's UUID from every column except the vector.
            df[uuid_column] = (
                df[column_subset]
                .drop(columns=[vector_column], errors="ignore")
                .apply(
                    lambda row: generate_uuid5(identifier=row.to_dict(), namespace=class_name),
                    axis=1,
                )
            )
```

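For anyone who wants to check the pass-through behavior without a Weaviate or pandas install, here is a minimal, dependency-free sketch of the same idea. It is not the hook's code: rows are plain dicts rather than a DataFrame, the standard library's `uuid.uuid5` stands in for `generate_uuid5`, and the function signature, the `NAMESPACE` constant, and the default column names are all assumptions made for illustration.

```python
import uuid
from typing import Any

# Hypothetical namespace for this sketch; the hook passes class_name to
# generate_uuid5 instead.
NAMESPACE = uuid.NAMESPACE_DNS


def generate_uuids(
    records: list[dict[str, Any]],
    class_name: str,
    uuid_column: str = "id",
    vector_column: str = "vector",
) -> list[dict[str, Any]]:
    """Add a deterministic UUID to each row unless the column already exists."""
    if records and all(uuid_column in row for row in records):
        # Pass through: the caller has already generated UUIDs.
        return records

    out = []
    for row in records:
        # Leave the vector (and any partial UUID value) out of the identifier,
        # and sort keys so the same content always yields the same UUID.
        payload = {
            key: value
            for key, value in sorted(row.items())
            if key not in (vector_column, uuid_column)
        }
        row_uuid = str(uuid.uuid5(NAMESPACE, f"{class_name}:{payload!r}"))
        out.append({**row, uuid_column: row_uuid})
    return out
```

Calling the function a second time on its own output returns the rows unchanged, which is the behavior requested above.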
sunank200 commented 10 months ago

I think this has already been changed: https://github.com/astronomer/ask-astro/blob/main/airflow/include/tasks/extract/utils/weaviate/ask_astro_weaviate_hook.py#L148 (this is the code on the main branch).

sunank200 commented 9 months ago

I have tested it here, and the generate_uuids logic is fine.