Azure-Samples / rag-postgres-openai-python

A RAG app to ask questions about rows in a database table. Deployable on Azure Container Apps with PostgreSQL Flexible Server.
MIT License
211 stars 78 forks source link

Improve and tests #55

Closed john0isaac closed 2 months ago

john0isaac commented 3 months ago

Purpose

This PR sets adds unit tests for the python codebase with test coverage 84%. It also adds pedantic types for all api responses that follows the Microsoft Chat Protocol. It also fixes all of the typing errors, adds Mypy and improves the workflow for code quality analysis.

image

Does this introduce a breaking change?

[ ] Yes
[x] No

Type of change

[ ] Bugfix
[ ] Feature
[x] Code style update (formatting, local variables)
[x] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

john0isaac commented 3 months ago

T̶h̶e̶ ̶t̶e̶s̶t̶ ̶s̶e̶t̶u̶p̶ ̶f̶o̶r̶ ̶t̶h̶e̶ ̶d̶a̶t̶a̶b̶a̶s̶e̶ ̶s̶t̶i̶l̶l̶ ̶n̶e̶e̶d̶s̶ ̶w̶o̶r̶k̶

@pamelafox, I finished working on the tests. Here is the the run on my repo: https://github.com/john0isaac/rag-postgres-openai-python/actions/runs/9813187237/job/27098703861?pr=2

large-mac platforms are only not available in my plan that's they are failing.

I ended up using some mocks from the openai-search-demo repo. I think the 81% test coverage is good for now. I will be working on the streaming now, and maybe after that playwright tests.

pamelafox commented 2 months ago

Thanks for the PR! Unfortunately been too busy with conferences and streams, but I will take a look as soon as I get a chance. I'm excited about all the improvements!

pamelafox commented 2 months ago

@john0isaac

I made a few typing improvements. Mypy seems to be confused by the editable app install, and I'm not sure how to make it happy.

There's also an issue with SQLAlchemy on these lines:

sql = text(fulltext_query).columns(id=Integer, rank=Integer)

I'm not sure what the issue is there. We could punt on making mypy happy, as I said in a comment, or we could spend a little more time trying to get the rest of the errors away, possibly via configuring what to ignore.

john0isaac commented 2 months ago

Let me take a look and see what I can do. I will let you know what I reach here.

pamelafox commented 2 months ago

I asked about SQLAlchemy issue here: https://github.com/sqlalchemy/sqlalchemy/discussions/11600

pamelafox commented 2 months ago

Asked pgvector for type annotation support here: https://github.com/pgvector/pgvector-python/issues/82 I think we should ignore pgvector in pyproject.toml for now.

pamelafox commented 2 months ago

The other issue is: "tests/conftest.py:225: error: Cannot instantiate abstract class "MockAzureCredential" with abstract attributes "aexit" and "close" [abstract]"

I'm surprised to see that error as that's the same mock as azure-search-openai-demo, and I don't see the same error there.

john0isaac commented 2 months ago

I made a few typing improvements. Mypy seems to be confused by the editable app install, and I'm not sure how to make it happy.

fixed here https://github.com/Azure-Samples/rag-postgres-openai-python/pull/55/commits/addd3dafe6fadc4c299884f5930bf14c8d70b475

john0isaac commented 2 months ago

The other issue is: "tests/conftest.py:225: error: Cannot instantiate abstract class "MockAzureCredential" with abstract attributes "aexit" and "close" [abstract]"

I'm surprised to see that error as that's the same mock as azure-search-openai-demo, and I don't see the same error there.

AsyncTokenCredential is a Protocol You just had to reimplement the functions in its children without changing anything lol. fixed here https://github.com/Azure-Samples/rag-postgres-openai-python/pull/55/commits/d0a9a02b1983d0d3b5b946c43372522f16d82e43

john0isaac commented 2 months ago

Asked pgvector for type annotation support here: pgvector/pgvector-python#82 I think we should ignore pgvector in pyproject.toml for now.

Done here https://github.com/Azure-Samples/rag-postgres-openai-python/pull/55/commits/39abb0fa8c7fede9987843eeacd9a1870bfe15c0

john0isaac commented 2 months ago

I asked about SQLAlchemy issue here: sqlalchemy/sqlalchemy#11600

In the sqlalchmey docs, there are two ways of doing this:

First Option / Passes mypy I tested it and pushed it here https://github.com/Azure-Samples/rag-postgres-openai-python/pull/55/commits/121b3417c509ff41f6aa893bf3bf348e40d5d4f9

stmt = text("SELECT id, name, timestamp FROM some_table")
stmt = stmt.columns(
            column('id', Integer),
            column('name', Unicode),
            column('timestamp', DateTime)
        )

for id, name, timestamp in connection.execute(stmt):
    print(id, name, timestamp)

Second Option / Original code - not working with mypy

stmt = text("SELECT id, name, timestamp FROM some_table")
stmt = stmt.columns(
            id=Integer,
            name=Unicode,
            timestamp=DateTime
        )

for id, name, timestamp in connection.execute(stmt):
    print(id, name, timestamp)

I think this may be a valid issue that needs to be solved by the SQLAlchemy team.

john0isaac commented 2 months ago

this https://github.com/Azure-Samples/rag-postgres-openai-python/pull/55/commits/5c17e9a404e772a40d24562c34aeb52f97441d13 fixes all of the errors related to mypy and the global_storage, it also follow the FastAPI way for creating application dependencies, the tests are faster now (I don't know why lol) maybe because I moved all of the code from the app startup 🤔

I created the routes folder to remove the noqa and to also follow the FastAPI app structure the rest of the files in the fastapi_app folder needs to be organized in folders too but I didn't want to make that change here too as it's not gonna affect anything.

I'm working on the last mypy error and will add mypy to the workflow run after I fix it.

john0isaac commented 2 months ago

@pamelafox I did it 🎉 All the errors are fixed and we are not using any type: ignore, tests are passing, app is working. I also added mypy to the app_tests workflow as it requires a full installation of the packages (can't be added with ruff)

pamelafox commented 2 months ago

@john0isaac Nice work making mypy happy, that's a tall feat. I have a few questions about possibly redundant calls.

john0isaac commented 2 months ago

There is only one issue left.. the engine is being created with each db_session which is being created with each request. It's not affecting the performance much but I wanted to raise it to you. I will work on it tomorrow, maybe in another PR if you merged this one or here.

I tried solving this and I did it the same way you'd create an engine with FastAPI. (they just define it as a variable not inside a function in the dependencies folder) saying that our engine creation function is an async func which can't be used outside an event loop (tried workarounds for that using asyncio but nothing worked) finally, I changed all of the function related to dealing with Postgres to become normal not async and everything worked. But all of the tests failed 🤣 I tried to do a bunch of stuff for the tests to fix it but I came across lots of async errors and even when it worked there were some dangling background tasks for tests (tests were randomly passing and the tests that don't get a db session I think were failing)

pamelafox commented 2 months ago

Ah, I see, thanks for raising that. It'd be best if we can reuse a session, like Flask-SQLAlchemy-Lite does: https://flask-sqlalchemy-lite.readthedocs.io/en/latest/session/ I do think it's less obvious how to architect app globals in FastAPI apps than in Quart/Flask apps, it's been interesting.

We're not in a rush to merge this, so I'll keep it open.

john0isaac commented 2 months ago

Sorry I misunderstood you, I will look into db sessions too but the app sessions is a no no. session won't work, that was my first go to. only accessible from request.session.something which is gonna be only available at the endpoint, I also looked at the globals for them and they have something called app.state which you can store stuff inside on the application level but it won't be initialized until the app get initialized which means that we can't use it in creating sessions.

After reconsidering everything I'm thinking of reusing the lifespan and globals approach for application wide global vars as it will work with the dependency I think. That is most likely what I will do if I didn't find a solution for the engine issue. I will keep you posted here.

john0isaac commented 2 months ago

Done ✅ I combined between the app.states and the dependencies and it looks amazing, everything is now created once in the lifespan then accessed from there. It's looking great! I'm satisfied with this now. 😄 I also created the item data model that matches the db table (db table class is not usable as it's not a valid pydantic type) this now types the whole api.

image

All of the endpoints have example responses now.

pamelafox commented 2 months ago

Cool! I just learnt about Depends. Seems like a nice pattern for FastAPI.