JarodMica / Vivy


Long Term Memory #6

Closed jidun67 closed 1 year ago

jidun67 commented 1 year ago

I saw you were looking into local long-term memory. I like that idea too, and it seems that vector DBs are the best fit for this kind of AI, which I'm sure you're fully aware of. I brainstormed with ChatGPT about it and asked whether Django's ORM could act as a vector DB. Of course it can't on its own, but it said SciPy or NumPy could be used with the ORM to generate vector-style returns from it. I'm not knowledgeable enough to say whether that's actually true, just looking. Currently I have been experimenting with different prompt-loading approaches for running memories, as it is annoying that most of these out-of-the-box projects have none.

JarodMica commented 1 year ago

I think long-term memory is one of the harder problems to solve right now. I have seen some implementations that use Pinecone for this and add additional functionality with LangChain, but I am still looking into those and haven't watched enough YouTube videos on Pinecone lol. I personally don't mind the idea of Pinecone, but it's an online database, so that kinda turns me off of the idea as I want to try and store everything as locally as possible.

I have been looking into vector embeddings and think OpenAI's Ada is a very affordable option to convert text to vectors, but I have a lingering sense of hesitation towards thinking that vector embeddings and cosine similarity will get us to our end goal. I definitely think it's a piece of the puzzle, but not the end goal. I'm wondering if there has to be some type of decision-making agent that acts as the middle-man for memory retrieval, something that is semantically able to process what is being said and accurately retrieve it. One guy I follow, David Shapiro, has a ton of great content on this topic and seems to be a pioneer in getting the conversation going, and a lot of people in the community are trying to find different ways to implement this.
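For reference, a rough sketch of that embeddings-plus-cosine-similarity piece, assuming the pre-1.0 `openai` package (the `openai.Embedding.create` API) with an API key already configured, and a toy in-memory store; all helper names here are just illustrative, not anything from this repo:

```python
import numpy as np
import openai  # assumes the pre-1.0 openai package (openai.Embedding API)

def embed(text: str) -> np.ndarray:
    """Convert text to a vector with OpenAI's Ada embedding model."""
    response = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(response["data"][0]["embedding"], dtype=np.float32)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy local "memory": a list of (text, vector) pairs kept in RAM.
memories: list[tuple[str, np.ndarray]] = []

def remember(text: str) -> None:
    memories.append((text, embed(text)))

def recall(query: str, top_k: int = 3) -> list[str]:
    """Return the stored texts most similar to the query."""
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine_similarity(q, m[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

A decision-making agent of the kind described above would presumably sit on top of something like `recall`, deciding when to query it and what to do with the results.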

jidun67 commented 1 year ago

Weaviate might be a good open-source local alternative for a genuine vector DB: https://sourceforge.net/software/product/Weaviate/
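A minimal sketch of what a local Weaviate setup could look like, assuming a Weaviate instance running on localhost (e.g. via Docker) and the v3 `weaviate-client` Python API (the newer v4 client changed this interface); the class name, properties, and vectors below are placeholders:

```python
import weaviate  # assumes the v3 weaviate-client API

client = weaviate.Client("http://localhost:8080")  # local Weaviate, e.g. started via Docker

# Hypothetical schema: one class holding chat memories with externally supplied vectors.
client.schema.create_class({
    "class": "ChatMemory",
    "vectorizer": "none",  # we supply our own embedding vectors
    "properties": [{"name": "text", "dataType": ["text"]}],
})

# Store a memory together with its embedding vector.
client.data_object.create(
    {"text": "User prefers short answers."},
    "ChatMemory",
    vector=[0.12, -0.03, 0.88],  # placeholder embedding
)

# Retrieve the memories most similar to a query vector.
result = (
    client.query.get("ChatMemory", ["text"])
    .with_near_vector({"vector": [0.10, -0.01, 0.90]})
    .with_limit(3)
    .do()
)
print(result["data"]["Get"]["ChatMemory"])
```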

jidun67 commented 1 year ago

But... I checked again, and this is intriguing, from GPT-3.5:

It's possible to use NumPy or SciPy with Django's ORM to perform database operations on vector data. NumPy and SciPy are powerful Python libraries for working with numerical data, and they support many mathematical and statistical operations on arrays and matrices.

One way to use NumPy or SciPy with Django's ORM is to define a custom field type that stores vector data as a NumPy array or SciPy sparse matrix. For example, let's say you have a database table with a "vector" column that stores a vector of floating-point values. You can define a custom field type that converts the raw database data to a NumPy array, and defines methods for performing vector operations on the data. Here's an example:

```python
import numpy as np
from django.db import models

class VectorField(models.Field):
    """Stores a vector of float32 values as raw bytes in the database."""

    def __init__(self, *args, **kwargs):
        kwargs['editable'] = False
        super().__init__(*args, **kwargs)

    def db_type(self, connection):
        # Backend-dependent; BLOB works for SQLite/MySQL (use 'bytea' on PostgreSQL).
        return 'BLOB'

    def from_db_value(self, value, expression, connection):
        # Convert the raw bytes coming out of the database into a NumPy array.
        if value is None:
            return value
        return np.frombuffer(value, dtype=np.float32)

    def to_python(self, value):
        if value is None or isinstance(value, np.ndarray):
            return value
        return np.frombuffer(value, dtype=np.float32)

    def get_db_prep_save(self, value, connection):
        # Serialize the NumPy array to bytes before writing it to the database.
        if value is None:
            return value
        return value.tobytes()
```

In this example, we define a custom field type called VectorField that stores vector data as a NumPy array. The from_db_value method converts the raw database data to a NumPy array, and the to_python method converts the field value to a NumPy array if it's not already one. The get_db_prep_save method serializes the NumPy array to bytes for saving in the database, and a db_type method is needed so Django knows which column type to create.
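As a quick sketch of how the field might be wired into a model (the ChatMemory model below is a made-up illustration, not something from this thread):

```python
from django.db import models

class ChatMemory(models.Model):
    """One remembered chat message plus its embedding vector."""
    text = models.TextField()
    vector = VectorField()  # the custom field defined above
    created_at = models.DateTimeField(auto_now_add=True)
```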

Once you have defined your custom field type, you can use the standard Django ORM methods to query and manipulate the vector data. For example, you can use NumPy or SciPy functions to perform vector operations on the data. Here's an example query that finds the sum of all vectors in the table:

```python
import numpy as np
from myapp.models import MyModel

# Find the sum of all vectors
vectors = MyModel.objects.all().values_list('vector', flat=True)
sum_vector = np.sum(list(vectors), axis=0)
```

In this example, we use the values_list method to retrieve all values of the "vector" field and sum them with np.sum. You can customize the vector operations as needed to fit your data model and application logic.

jidun67 commented 1 year ago

I said: How could this replace a vector db for long term memory of a chat bot

model: gpt-3.5-turbo, word count: 15, token estimate: 15

And it said...

ASSISTANT Using NumPy or SciPy with Django's ORM can provide a more efficient way to store and manipulate vector data than using a traditional SQL database. If you're building a chatbot with long-term memory, you can use a custom field type like this, backed by NumPy arrays or SciPy matrices, to store the vector data representing the chatbot's knowledge or memory.

These vectors could be generated through sentence embeddings of the chat messages, where each message is represented using a high-dimensional vector. You can perform various standard vector operations such as vector addition, cosine similarity, etc., to implement your chatbot's memory and behavior.
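A rough sketch of that cosine-similarity lookup, assuming the hypothetical ChatMemory model from the sketch above (one embedding stored per message via the custom VectorField):

```python
import numpy as np
from myapp.models import ChatMemory  # hypothetical model using the custom VectorField

def recall_similar(query_vector: np.ndarray, top_k: int = 3) -> list[str]:
    """Return the stored messages whose vectors are most similar to the query."""
    rows = ChatMemory.objects.all().values_list("text", "vector")

    def score(vec: np.ndarray) -> float:
        return float(np.dot(query_vector, vec) /
                     (np.linalg.norm(query_vector) * np.linalg.norm(vec)))

    ranked = sorted(rows, key=lambda row: score(row[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```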

By using NumPy or SciPy with Django's ORM, you can leverage the performance and scalability of these libraries to efficiently store and manipulate large collections of vectors. This could help with reducing the storage requirements and improving the query time for the database in the long term.

In addition, if you want to go a step further, you can use a vector database like Elasticsearch to store and query the vector data more efficiently. Elasticsearch is built for fast full-text search and also supports vector similarity search. Using such a combination of SciPy and Elasticsearch can lead to powerful search capabilities for applications that deal with high-dimensional vectors.

Overall, using NumPy or SciPy with Django's ORM can provide a more efficient way to store vector data for a chatbot's long term memory. By leveraging these powerful Python libraries, you can build more sophisticated chatbots and improve their performance over time.

jidun67 commented 1 year ago

I think I see now. Even if it is possible, it doesn't make sense, as the open-source Weaviate one can be kept locally as well :)

JarodMica commented 1 year ago

> I think I see now. Even if it is possible, it doesn't make sense, as the open-source Weaviate one can be kept locally as well :)

Well, if you have any breakthroughs, always feel free to share 'em!