biocypher / biochatter-next

The advanced implementation for BioChatter, using Next.js
MIT License

Question: how do we respond to changes in the vector database that the frontend is unaware of? #30

Closed slobentanzer closed 8 months ago

slobentanzer commented 10 months ago

If a user manually deletes an embedding or the entire database (e.g. by removing the local Docker volume), does the manuscript list stored in the frontend get updated?

fengsh27 commented 8 months ago

Yes, every time the RAG page is opened or the 'reconnect' button is clicked, the frontend updates its document list.

slobentanzer commented 8 months ago

Thanks! Does that mean that

  1. if I delete a document ("offline") that was added through the web page, it will disappear on the web page? And
  2. if I add a document ("offline"), the web page will include that additional document?

I can't imagine the second option works, because the offline edit would have to know the connection args from the browser, or do I misunderstand the process?

In other words, is it possible to maintain a large document database (for an entire group or project, for instance), and then automatically use it just by connecting to the correct IP from the RAG settings?

fengsh27 commented 8 months ago

@slobentanzer I don't get your point. The current RAG process for adding a document is: 1) the user uploads a document, 2) biochatter embeds the document and saves it to the vector database, 3) biochatter returns the doc id to the frontend, 4) the frontend adds the doc id to its doc workspace and updates the document list from that workspace.

Removing a document: 1) the frontend sends a delete request with the doc id to biochatter, 2) biochatter removes the document from the vector database, 3) biochatter returns a status code to the frontend, 4) the frontend removes the doc id from its doc workspace and updates its document list from that workspace.
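The add and remove flows can be sketched as a small in-memory simulation. This is only an illustration of the steps as described in this comment, not the actual BioChatter Next code: `InMemoryVectorStore` and `DocWorkspace` are hypothetical names, and the "embedding" step is reduced to storing the text under a generated id.

```python
import uuid


class InMemoryVectorStore:
    """Hypothetical stand-in for the vector database behind biochatter."""

    def __init__(self):
        self._docs = {}

    def save(self, text):
        # A real backend would compute and store embeddings here.
        doc_id = uuid.uuid4().hex
        self._docs[doc_id] = text
        return doc_id

    def delete(self, doc_id):
        # Returns True if the document existed and was removed.
        return self._docs.pop(doc_id, None) is not None

    def ids(self):
        return set(self._docs)


class DocWorkspace:
    """Hypothetical frontend-side record of the doc ids it has uploaded."""

    def __init__(self, store):
        self.store = store
        self.doc_ids = []

    def upload(self, text):
        doc_id = self.store.save(text)  # upload, embed, return doc id
        self.doc_ids.append(doc_id)     # frontend records the doc id
        return doc_id

    def remove(self, doc_id):
        ok = self.store.delete(doc_id)  # delete request, status returned
        if ok and doc_id in self.doc_ids:
            self.doc_ids.remove(doc_id)  # frontend drops the doc id
        return ok


store = InMemoryVectorStore()
workspace = DocWorkspace(store)
uploaded = workspace.upload("paper A")
offline = store.save("paper B, added without the frontend")
```

In this sketch, `offline` exists in the store but never reaches `workspace.doc_ids`, because the workspace only learns about ids returned from its own uploads.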

If this doesn't answer your question or you have further concerns, please let me know.

slobentanzer commented 8 months ago

@fengsh27 what you describe is fully frontend-based, right? I am talking about a database that is maintained via other means (maybe Milvus CLI, Python, etc). Since the user has the ability to connect another database in the settings by entering its IP address, that is possible, right?

My question is now, how does BioChatter Next handle this case? Will it see the documents that already exist in a Milvus DB which is connected to the frontend by entering an IP?

This is for use cases where a researcher or group would like to maintain a consistent library of embedded documents for a specific purpose, which can remain active for as long as necessary without depending on re-embedding the documents. Does this only work via the Next frontend, or can the Milvus DB be created and maintained by other means?

fengsh27 commented 8 months ago

I got your point. My original thought was that, for our vector database ("local"), users can only view the documents uploaded from the frontend. In contrast, for a user's own database (connected by IP), users would be able to view all documents in the database. Or, going further, we can provide an option for users to view all documents within their own database.
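The two viewing modes could be expressed as a single filtering rule. The function below is a minimal sketch under assumptions: `visible_documents`, its parameters, and the choice to hide workspace ids that no longer exist in the database are all hypothetical and not taken from the BioChatter Next code.

```python
def visible_documents(db_ids, workspace_ids, is_local):
    """Return the sorted doc ids the frontend would list (hypothetical rule).

    is_local=True:  shared "local" database, so show only documents that
                    this frontend uploaded and that are still in the database.
    is_local=False: the user's own database (connected by IP), so show
                    every document in the database.
    """
    db_ids = set(db_ids)
    if is_local:
        return sorted(db_ids & set(workspace_ids))
    return sorted(db_ids)


# "d" was uploaded from the frontend but has since been deleted offline;
# "b" and "c" were added to the database by other means.
local_view = visible_documents({"a", "b", "c"}, ["a", "d"], is_local=True)
own_db_view = visible_documents({"a", "b", "c"}, ["a", "d"], is_local=False)
```

Under this rule, a database maintained by other means (Milvus CLI, Python, etc.) would be fully visible only in the second mode.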

What's your idea?

slobentanzer commented 8 months ago

> My original thought was, for our vector database ("local"), users can only view the documents uploaded from frontend.

Yes, and that is still a good starting point for showcasing the use. My point was just to start thinking about the other use cases. :)

slobentanzer commented 8 months ago

This issue was just for me to better understand how you have designed the current version, to be able to find the best way forward with these other use cases. We can close the issue and open a new one once we decide if and how to tackle that.