As we're updating the llama-index library version to use its newest features (pipelines, docstore, etc.), we're hitting the following error:

```
Error!!!: Too old airflow version.
```
This error is raised because the Docker entrypoint cannot run the `gosu` command to fetch the Airflow version. Looking at the logs, the underlying cause is a SQLAlchemy version conflict: Apache Airflow requires SQLAlchemy <= 1.4.49, while the newest llama-index requires SQLAlchemy > 2.0. No single installed version can satisfy both, so the Airflow service cannot come up and raises this error.

To resolve this, we need to migrate to a vector database whose integration is not tied to the SQLAlchemy version (our current pgvector store talks to Postgres through SQLAlchemy).
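As a sanity check, the conflicting pins can be read straight from the installed package metadata inside the Airflow image. A small sketch (the `<= 1.4.49` and `> 2.0` bounds above come from our logs, not from this snippet):

```python
# Print each package's declared SQLAlchemy requirement to confirm the clash.
# Package names are the real PyPI distributions; run inside the Airflow image.
from importlib.metadata import requires

for dist in ("apache-airflow", "llama-index"):
    pins = [r for r in requires(dist) or [] if r.lower().startswith("sqlalchemy")]
    print(dist, "->", pins)
```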
After researching the alternatives, we found that our best option is the Qdrant database, which supports async queries + metadata filtering (ref: Qdrant features).
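These two features are what the ETLs below rely on. A minimal sketch of how they surface through llama-index's Qdrant integration, assuming the llama-index >= 0.10 module layout; the host, collection name, and `category` metadata key are illustrative:

```python
# Sketch only: async querying + metadata filtering against Qdrant.
import qdrant_client
from llama_index.core.vector_stores.types import (
    MetadataFilter,
    MetadataFilters,
    VectorStoreQuery,
)
from llama_index.vector_stores.qdrant import QdrantVectorStore

vector_store = QdrantVectorStore(
    collection_name="discourse_vector_store",  # illustrative collection name
    client=qdrant_client.QdrantClient(host="localhost", port=6333),
    aclient=qdrant_client.AsyncQdrantClient(host="localhost", port=6333),
)


async def search(query_embedding: list[float]):
    # Metadata filtering: restrict results to one forum category.
    filters = MetadataFilters(
        filters=[MetadataFilter(key="category", value="governance")]
    )
    query = VectorStoreQuery(
        query_embedding=query_embedding, similarity_top_k=5, filters=filters
    )
    # Async querying: aquery goes through the AsyncQdrantClient above.
    return await vector_store.aquery(query)
```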
To update our systems to use the Qdrant database, we have the following tasks:
- [x] Update `docker-compose.yaml` and `docker-compose.test.yaml` to use a stable version of the Qdrant database instead of pgvector
- [ ] Update `discourse_vector_store` ETL to assign a unique ID to each document *
- [ ] Update `discourse_summary_vector_store` ETL to assign a unique ID to each document *
- [ ] Update `discourse_vector_store` ETL to use the `CustomIngestionPipeline`
- [ ] Update `discourse_summary_vector_store` ETL to use the `CustomIngestionPipeline`
- [ ] Update `github_vector_store` ETL to assign a unique ID to each document *
- [ ] Update `github_vector_store` ETL to use the `CustomIngestionPipeline`

Note *: IDs should be the same across multiple runs, so the docstore can check for duplicated or updated nodes (see the sketch below).
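For the starred items and the pipeline migration, here is a minimal sketch of what the ETL change could look like, again assuming the llama-index >= 0.10 module layout. `CustomIngestionPipeline` is our own wrapper, so a plain `IngestionPipeline` stands in for it here, and `deterministic_doc_id`, the collection name, and the example URL are hypothetical:

```python
# Sketch only: deterministic doc IDs + docstore-backed ingestion into Qdrant.
import hashlib

import qdrant_client
from llama_index.core import Document
from llama_index.core.ingestion import DocstoreStrategy, IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore


def deterministic_doc_id(source_key: str) -> str:
    # Hash a stable attribute of the source record (e.g. a post URL) so the
    # same document gets the same ID on every run; the docstore compares
    # incoming IDs against stored ones to detect duplicated or updated nodes.
    return hashlib.sha256(source_key.encode("utf-8")).hexdigest()


vector_store = QdrantVectorStore(
    client=qdrant_client.QdrantClient(host="localhost", port=6333),
    collection_name="discourse_vector_store",  # illustrative collection name
)

pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(chunk_size=512), OpenAIEmbedding()],
    docstore=SimpleDocumentStore(),
    vector_store=vector_store,
    # Re-ingest a document only when its content changed; skip exact duplicates.
    docstore_strategy=DocstoreStrategy.UPSERTS,
)

docs = [
    Document(
        text="example post body",
        doc_id=deterministic_doc_id("https://forum.example/t/123"),  # stable across runs
    )
]
pipeline.run(documents=docs)
```

Running the same documents through the pipeline twice should then insert nothing new, since the docstore recognizes the unchanged IDs.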