Closed nenb closed 11 months ago
Hey @nenb
Interesting project! Are you accepting contributions at the moment? If you are, I would be happy to contribute.
Yes, we are! See https://ragna.chat/en/stable/community/contribute/
I'm not sure of the roadmap, so I've included a couple of possible areas below based on a first-pass of the code. If any of these are of interest to you, I'd be happy to discuss further how best I can contribute in these areas. (And if not, I'm very open to other technical areas as well.)
Indeed, we don't have a roadmap yet. We put all the effort into having a proper first release for the PyData NYC talk last Friday. I'll give you my opinions on your suggestions below. But note that this potentially does not reflect the opinion of the team.
**Embeddings**
The quality of the embeddings seems likely to be pretty important to the quality of the RAG app. At the moment, the embedding functionality appears to be more or less hardcoded to a single model. Are you planning on extending this, e.g. so users can specify their own embedding models?
Yes, this is planned for exactly the reason you stated. We currently have it hardcoded to a simple model, since the folks over at ChromaDB were nice enough to package it with minimal external dependencies.
**Storage/Retrieval Separation**
Related to #71 and also to the last point on embeddings. There are some pretty cool (in my opinion) 'retrieval' technologies emerging like `lance`/`pylance` (`lancedb` is the much heavier solution that includes embedding, querying, and other features). Are you planning on separating out how the embeddings are computed versus how they are retrieved? (The `VectorDatabaseSourceStorage` class currently requires both `store` and `retrieve` to be specified; as far as I can tell, it's not possible to mix and match.)
It is not planned, but this is something that I thought about a lot. My idea here was to let `SourceStorage.store` take a list of embeddings rather than a list of documents. That way, we could make the embedding another component apart from `SourceStorage` and `Assistant` that users can tune.
The reason we didn't go with this solution so far is that there are `SourceStorage`s that cannot take embeddings. One such example would be [AWS Kendra](). One needs to pass a document to it and Kendra will do the embedding behind the scenes. Meaning, we would need a conditional on each source storage for whether it supports embeddings or documents as inputs. And that has a long tail of technical decisions that we would have to make.
Not saying this is impossible, but it requires careful design. I don't want to compromise on UX here.
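To make the trade-off concrete, here is a minimal sketch of the idea of splitting the embedding out into its own component, so that `store` receives precomputed vectors instead of documents. All class names and signatures here are hypothetical, not part of the current Ragna API:

```python
# Hypothetical sketch: an embedding component separate from source storage.
# None of these names exist in Ragna today; they only illustrate the design.
import abc


class EmbeddingModel(abc.ABC):
    """Hypothetical interface users could swap out, like Assistant."""

    @abc.abstractmethod
    def embed(self, texts: list[str]) -> list[list[float]]:
        ...


class CharCountEmbedding(EmbeddingModel):
    # Toy one-dimensional "embedding" purely for illustration.
    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(text))] for text in texts]


class InMemorySourceStorage:
    """Stand-in showing store() taking embeddings rather than documents."""

    def __init__(self) -> None:
        self._vectors: list[list[float]] = []

    def store(self, embeddings: list[list[float]]) -> None:
        self._vectors.extend(embeddings)


model = CharCountEmbedding()
storage = InMemorySourceStorage()
storage.store(model.embed(["hello", "ragna"]))
```

A storage like Kendra, which embeds internally, would not fit this interface, which is exactly the conditional mentioned above.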
**Multi-modal support**
Do you plan to add multi-modal support, e.g. DALL-E? This is something that I see a lot at the moment, and there's definitely a wide spectrum of what can be offered (in terms of complexity).
No, that is not planned. We built Ragna with the sole purpose of RAG in mind. If you go multimodal, you again have an exponentially growing list of options that you need to support. And even if we had the budget for it, I can't think of a consistent API / UI for it.
**Plug-Ins**
Lots of AI APIs and vector databases, with more to come with fine-tuned local models; I don't think it's possible to support everything directly anymore! Do you plan to add a plug-in system so that users can add their own custom models, e.g. via Python package plugins and some interface that user models must implement?
Plugins are well supported! I acknowledge that it is abysmally documented right now. Just subclass `SourceStorage` and `Assistant` and implement whatever you like. As long as it adheres to the signatures of the abstract methods, it will work with Ragna.
For the Python API, you can pass your classes directly to the chat. Have a look at this notebook: https://github.com/Quansight/ragna/blob/main/examples/local_llm/local_llm.ipynb
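As a rough illustration of the subclassing pattern, here is a toy assistant. The base class and `answer` signature below are stand-ins, not Ragna's actual abstract methods; check the installed version for the real signatures:

```python
# Stand-in sketch of the plugin pattern: subclass the abstract base and
# implement its abstract methods. The real base class is ragna's Assistant;
# the signature here is illustrative only.
import abc


class Assistant(abc.ABC):  # stand-in for ragna's Assistant base class
    @abc.abstractmethod
    def answer(self, prompt: str, sources: list[str]) -> str:
        ...


class MyAssistant(Assistant):
    """A toy plugin: answers by echoing the prompt and the sources."""

    def answer(self, prompt: str, sources: list[str]) -> str:
        return f"Answering {prompt!r} using sources: {', '.join(sources)}"


reply = MyAssistant().answer("What is Ragna?", ["intro.md", "faq.md"])
```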
For the REST API and UI, you need to add them to the `source_storages` and `assistants` lists in the configuration. For example:
```toml
[core]
# ...
source_storages = [
    "ragna.source_storages.RagnaDemoSourceStorage",
    "my_ragna_plugin.MySourceStorage",
]
assistants = [
    "ragna.assistants.RagnaDemoAssistant",
    "my_ragna_plugin.MyAssistant",
]
```
Make sure that the `my_ragna_plugin.py` Python module is on the `PYTHONPATH` when starting the API / UI. Just as a heads up: to meet the deadline, we hardcoded some assumptions about the source storages in the UI. This will be cleaned up for the next release, but might conflict with your custom classes for now.
In conclusion, I think the most valuable contribution right now is helping us document all the features that we have. So use Ragna and tell us about all the rough edges you face or use cases that you have but that are not supported yet. From that, we could either have a technical discussion about whether we want / can change something, or we already have it and just need to document it.
@pmeier Thanks for a thorough response!
I started exploring/documenting the package last night. I'll post a longer response in this issue (unless there is a better location that you suggest) in the next 24-48 hours.
In the meantime, I forked the repo and created a branch with my initial setup. I tweaked some of the default values to be more user-friendly (in my extremely biased opinion :wink:) as I felt that they were getting in the way of me using `ragna`. Would you be interested in considering any of them? Here is a brief description/motivation for each:
I stubbed out the authentication - it seems like there are a bunch of issues already related to authentication, and I think it's distracting for the average user like me. It also doesn't play well with the Swagger UI.
I changed the timeout for the assistants to at least 90s - interacting with OpenAI was regularly going over 30s for me (this is another issue that I noticed; I'll leave comments later), so most of my interactions with the assistants were just returning errors.
I replaced the wizard with 'sensible defaults' - I don't really like having to use a wizard before I start something; I'd much prefer to just have a bunch of 'sensible defaults' that mean I can get started straight away (and I think a lot of users might lose interest if they have to spend mental energy on thinking about init values etc., and might move on). I've tried to tweak the code so that if a TOML file does not exist, it will just fall back to the default values given to `pydantic`.
I changed the default queue implementation from 'memory' to 'file' - (I'm curious why 'file' was offered over `sqlite` here?). The main reason I did this is that the 'memory' implementation is currently blocking (see here), and this was a bit of a frustrating experience for me when OpenAI was sometimes taking almost 60 seconds to respond. I'm not familiar enough with `huey` to know if the in-memory implementation can also be adapted to work in async mode (probably not, I guess), so I just changed the default setting to 'file' and things went much more smoothly.
I included OpenAI assistants as default - I think they are almost universal now, and I found the demo assistants a bit confusing, to be honest. I didn't feel like they really illustrated the potential of `ragna` (and again risked turning away users who just want something that works out of the box!).
I created a sample TOML file in `examples` - the one in the docs appeared out of date (I'll push some issues/updates for the docs later!).
I included `sqlite` as the default database rather than memory - it's useful for debugging IMO.
@nenb Thanks for the thorough suggestions. I don't want to discuss them in one mega thread. I'll open issues for your individual points above and close this one afterwards. Feel free to comment there or create more issues if you feel I haven't captured everything you wanted to raise.
@nenb I think I opened issues for everything you raised. Feel free to call me out here if I missed something or just open an issue yourself.