gbaeke / gpt-vectors

Issue with Feed Address Recognition & Querying Local Text Document Vectors in Pinecone #1

Open norsizu opened 1 year ago

norsizu commented 1 year ago

This project has proven to be incredibly helpful and effective. However, I am running into an issue with feed address recognition. When I try to import my feed address, the system fails to recognize it and I get the following error: "TypeError: expected string or bytes-like object, got 'NoneType'".

To work around this, I was wondering whether there is an alternative way to achieve my goal. Specifically, I would like to upload a batch of text documents from my local machine to Pinecone as vectors and then query them. Could you explain how to do this, or suggest another solution to the issue above?

Thank you for your time and support.

gbaeke commented 1 year ago

Hi, glad you find it useful. Feedparser only works with feeds (RSS or Atom), not with arbitrary addresses. To use local text documents instead, you can iterate over the files in the folder, read each one, and create an embedding for it. Something like:

import os
import openai

# Set up the OpenAI API
openai.api_key = "your_api_key"

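# Define a function to get embeddings for given text
def get_embedding(text):
    # Assumes the legacy openai<1.0 package; the model name is an example
    response = openai.Embedding.create(input=[text], model="text-embedding-ada-002")
    return response["data"][0]["embedding"]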

# Define the folder path containing the text files
folder_path = "path/to/your/text/files"

# Iterate over all text files in the folder
for file_name in os.listdir(folder_path):
    if file_name.endswith(".txt"):
        file_path = os.path.join(folder_path, file_name)

        # Read the content of the file
        with open(file_path, "r", encoding="utf-8") as file:
            content = file.read()

        # Get the embedding using the OpenAI API
        embedding = get_embedding(content)

        # upload to pinecone
        ...
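
For the Pinecone part that is left open above, here is a minimal sketch rather than a definitive implementation. It assumes `index` is an already-created Pinecone index handle whose `upsert(vectors=...)` and `query(vector=..., top_k=..., include_metadata=...)` methods behave as in the Pinecone Python client, and that `embed` is a function like `get_embedding`. The helper names (`build_records`, `upsert_in_batches`, `query_index`), the `"source"` metadata key, the use of the file name as vector ID, and the batch size of 100 are all assumptions made for illustration.

```python
import os
from typing import Callable, Iterable, Iterator, List, Tuple

# (id, vector values, metadata), one of the tuple shapes upsert is assumed to accept
Record = Tuple[str, List[float], dict]


def build_records(folder_path: str, embed: Callable[[str], List[float]]) -> Iterator[Record]:
    """Yield one (id, vector, metadata) record per .txt file in folder_path."""
    for file_name in sorted(os.listdir(folder_path)):
        if not file_name.endswith(".txt"):
            continue
        with open(os.path.join(folder_path, file_name), "r", encoding="utf-8") as f:
            content = f.read()
        # Assumption: the file name is unique enough to serve as the vector ID.
        yield file_name, embed(content), {"source": file_name}


def upsert_in_batches(index, records: Iterable[Record], batch_size: int = 100) -> int:
    """Send records to index.upsert in batches and return how many were sent."""
    batch, total = [], 0
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            index.upsert(vectors=batch)
            total += len(batch)
            batch = []
    if batch:  # send the final, partially filled batch
        index.upsert(vectors=batch)
        total += len(batch)
    return total


def query_index(index, question: str, embed: Callable[[str], List[float]], top_k: int = 5):
    """Embed the question and ask the index for its top_k nearest vectors."""
    return index.query(vector=embed(question), top_k=top_k, include_metadata=True)
```

With a real index you would call `upsert_in_batches(index, build_records(folder_path, get_embedding))` once, and then `query_index(index, "your question", get_embedding)` for each query. Long documents may need to be split into smaller chunks before embedding, since embedding models limit the input length.
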

norsizu commented 1 year ago

Thank you so much, I will give this method a try.