BharatSahAIyak / kisai-bot


Moving to BGE embeddings #267

Open Gautam-Rajeev opened 4 months ago

Gautam-Rajeev commented 4 months ago

Current state:

OpenAI embeddings take too long to respond (sometimes 2-3 seconds).

Expected Behaviour

All docs are stored as BGE-small vectors, and a self-hosted BGE-small model is used to retrieve from the document service (and any other required service).

Implementation details:

Migration of older data

Adding embedding

Testing

techsavvyash commented 4 months ago

@techsavvyash to follow up with @sooraj1002 on its progress. Needed for RAJAI.

sooraj1002 commented 4 months ago

Development for this migration is being tracked here as a first step. This ticket will solve the requirement for RAJAI datasets.

KDwevedi commented 4 months ago

@GautamR-Samagra Both Dataset Service and Document Service now support BGE embedding generation and BGE retrieval for various tables. We can do the migration whenever the tables are specified.

techsavvyash commented 4 months ago

Move all these changes to stage as well @sooraj1002

sooraj1002 commented 4 months ago

@PriyanshiATSamagra is currently testing admin on stage. Can deploy on stage and make changes post approval from her.

sooraj1002 commented 3 months ago

@PriyanshiATSamagra curl to test:

curl --location 'https://dataset-service.dev.bhasai.samagra.io/v2/search' \
--header 'orgId: f2070b8a-0491-45cb-9f35-8599d6dd77ef' \
--header 'botId: b7b52269-1233-46f5-a5da-1d4d9e2eb395' \
--header 'Content-Type: application/json' \
--data '{
    "type": "vector",
    "query": "farmers",
    "parameters": {
        "algorithm": "text-embedding-ada-002",
        "topK": 10,
        "datasetId": "013f06b4-cb90-41b5-a000-1644ba65f0bb",
        "searchColumn": "Data",
        "threshold": 0.1    
    }
}'

To search using bge instead:

"algorithm": "bge"

We have to verify whether the bge and text-embedding-ada-002 algorithms give the same results. The similarity scores might be lower with bge, so the threshold might also need to be modified.
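The verification described above can be sketched as follows. This is only an illustration: `build_payload` and `compare_ids` are hypothetical helper names, not part of the dataset service, and the sketch assumes each result row carries an `id` field, as the responses later in this thread do. Sending the request is left out; any HTTP client can post the payload to the `/v2/search` endpoint with the headers shown in the curl.

```python
def build_payload(query, dataset_id, search_column, algorithm,
                  top_k=10, threshold=0.1):
    """Build the /v2/search request body used in the curl above.

    Only the "algorithm" value changes between the two runs.
    """
    return {
        "type": "vector",
        "query": query,
        "parameters": {
            "algorithm": algorithm,
            "topK": top_k,
            "datasetId": dataset_id,
            "searchColumn": search_column,
            "threshold": threshold,
        },
    }


def compare_ids(results_a, results_b):
    """Summarise how two ranked result lists differ, using the row ids."""
    ids_a = [row["id"] for row in results_a]
    ids_b = [row["id"] for row in results_b]
    set_a, set_b = set(ids_a), set(ids_b)
    return {
        # Rows returned by both algorithms, in the order of the first list.
        "shared": [i for i in ids_a if i in set_b],
        "only_a": [i for i in ids_a if i not in set_b],
        "only_b": [i for i in ids_b if i not in set_a],
        # True when both lists are non-empty and agree on the best match.
        "same_top_hit": bool(ids_a and ids_b and ids_a[0] == ids_b[0]),
    }
```

Calling `build_payload(..., algorithm="bge")` and `build_payload(..., algorithm="text-embedding-ada-002")` with otherwise identical arguments, then passing the two `data` arrays to `compare_ids`, gives a quick view of where the rankings diverge without comparing raw similarity scores, which are not expected to match across models.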

PriyanshiSoni11 commented 3 months ago

@sooraj1002 @KDwevedi

curl --location 'https://dataset-service.bhasai-dev.k8s.bhasai.samagra.io/v2/search' \
--header 'orgId: f2070b8a-0491-45cb-9f35-8599d6dd77ef' \
--header 'botId: a899de5a-17ec-4f2f-aa30-aa26ad74b6b4' \
--header 'Content-Type: application/json' \
--data '{
    "type": "vector",
    "query": "Farm",
    "parameters": {
        "algorithm": "bge",
        "topK": 10,
        "datasetId": "b1a4bacd-0e2d-47dd-bdc8-f127ba754ced",
        "searchColumn": "Scheme",
        "threshold": 0.1
    }
}'

This request does not return a response; I get "Error: connect ENETUNREACH 4.213.167.136:443". I verified the values of botId, orgId, datasetId, searchColumn and the base URL, and tried both "text-embedding-ada-002" and "bge"; both give the same error. Can you please check?

sooraj1002 commented 3 months ago

Might have been a temporary error. Unable to reproduce it now @PriyanshiATSamagra

PriyanshiSoni11 commented 3 months ago

@sooraj1002 I tried every field available in the provided dataset, and each one gives " is not a vector field".

Image

sooraj1002 commented 3 months ago

It's not necessary for there to be a vector field in each dataset.

PriyanshiSoni11 commented 3 months ago

It's not necessary for there to be a vector field in each dataset.

@sooraj1002 Can you share a dataset on which I can test this algorithm?

PriyanshiSoni11 commented 3 months ago

@sooraj1002 @KDwevedi Please help here

PriyanshiSoni11 commented 3 months ago

@sooraj1002 @KDwevedi Please help here. When can we connect?

KDwevedi commented 3 months ago

Sent an invite

chinmoy12c commented 3 months ago

BGE embeddings were not working after the migration. A setup has been created on playground owner to test things out.

KDwevedi commented 3 months ago

@PriyanshiATSamagra, I've created a testing dataset for QA. There are 2 vector-searchable columns, question and answer; both are valid values for searchColumn. algorithm can be text-embedding-ada-002 for OpenAI-based embeddings or bge for BGE-based embeddings. threshold means that results with a similarity score lower than that value will be filtered out.

curl -X POST https://dataset-service.bhasai-dev.k8s.bhasai.samagra.io/v2/search \
-H "orgId: f2070b8a-0491-45cb-9f35-8599d6dd77ef" \
-H "botId: 0c09de45-df1a-4dee-8486-3904a5f51611" \
-H "Content-Type: application/json" \
-d '{
  "type": "vector",
  "query": "queue",
  "parameters": {
    "algorithm": "text-embedding-ada-002",
    "threshold": 0.01,
    "topK": 10,
    "searchColumn": "question",
    "datasetId": "666f0c22-688e-41c4-8e95-5bd481a8c3d7"
  }
}'
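
The threshold and topK semantics described in this comment can be modelled client-side with a minimal sketch. `filter_results` is a hypothetical helper, and the order of operations (drop rows below the threshold first, then keep the best topK) is an assumption; the thread does not say how the service combines the two parameters.

```python
def filter_results(rows, threshold, top_k):
    """Keep rows scoring at least `threshold`, best first, capped at `top_k`.

    Assumes each row has an "embeddingSimilarity" field, as in the
    search responses shown in this thread.
    """
    kept = [r for r in rows if r["embeddingSimilarity"] >= threshold]
    kept.sort(key=lambda r: r["embeddingSimilarity"], reverse=True)
    return kept[:top_k]


# Hypothetical rows, for illustration only.
rows = [
    {"id": "a", "embeddingSimilarity": 0.005},
    {"id": "b", "embeddingSimilarity": 0.20},
    {"id": "c", "embeddingSimilarity": 0.50},
]
print([r["id"] for r in filter_results(rows, threshold=0.01, top_k=10)])
```

Under this model a query can return fewer than topK rows whenever some rows score below the threshold, even when the threshold is very small.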

PriyanshiSoni11 commented 2 months ago

curl --location 'https://dataset-service.bhasai-dev.k8s.bhasai.samagra.io/v2/search' \
--header 'orgId: f2070b8a-0491-45cb-9f35-8599d6dd77ef' \
--header 'botId: 0c09de45-df1a-4dee-8486-3904a5f51611' \
--header 'Content-Type: application/json' \
--data '{
    "type": "vector",
    "query": "lunch",
    "parameters": {
        "algorithm": "bge",
        "threshold": 0.01,
        "topK": 10,
        "searchColumn": "question",
        "datasetId": "666f0c22-688e-41c4-8e95-5bd481a8c3d7"
    }
}'

7 QnAs

{
    "data": [
        {
            "id": "0ca98232-4c9e-4cdf-b576-1b1b9368572c",
            "question": "Where will I get lunch and dinner?",
            "answer": "Within the Kumbh Mela area, you will find a variety of food stalls and eateries offering a wide range of dining options, including traditional Indian cuisine, vegetarian meals, and snacks. These food outlets are spread across the mela grounds to ensure easy access for all attendees. Many of them follow strict hygiene standards to ensure food safety. ",
            "embeddingSimilarity": 0.34446332735561536
        },
        {
            "id": "ea9d5ae7-29c6-42ee-b94e-c22dff2d8bae",
            "question": "How to reach Kumbh from my city?",
            "answer": "You can reach Kumbh by various modes of transport, including flights, trains, and buses. Kumbh Sah'AI'yak can provide detailed travel options and routes based on your location.",
            "embeddingSimilarity": 0.08537691140462567
        },
        {
            "id": "7a3a50a5-974a-42d9-983f-47cb70423c15",
            "question": "Me and my family would like to stay for an extended duration- can we cook our own meals?",
            "answer": "Yes, some accommodations at the Kumbh Mela provide facilities for cooking your own meals, especially in family tents and extended stay areas. These options often include basic kitchen setups with necessary utensils and cooking equipment. Additionally, guidelines for safe cooking practices are provided to ensure safety and hygiene.",
            "embeddingSimilarity": 0.07682332916340717
        },
        {
            "id": "cf2602d6-b79c-4648-924a-5a490cebdcec",
            "question": "What are the commute facilities to reach there?",
            "answer": "Various commute facilities, including auto-rickshaws, buses, and cabs, are available to reach different locations within the Kumbh Mela area. Kumbh Sah'AI'yak can guide you on the best options.",
            "embeddingSimilarity": 0.07165499253938745
        },
        {
            "id": "66f21489-9217-42ef-bb83-20385b026552",
            "question": "Is bottled water the only option?",
            "answer": "Bottled water is readily available throughout the Kumbh Mela area. Additionally, there are designated water stations that provide safe and filtered drinking water. These stations are strategically placed for easy access, ensuring that all attendees stay hydrated.",
            "embeddingSimilarity": 0.039908742970416355
        },
        {
            "id": "a2000e50-6bda-44ac-8a8e-62d0c540d66c",
            "question": "What child-suitable facilities/activities exist?",
            "answer": "You can find designated play areas, child-friendly activities, and special amenities to ensure a safe and enjoyable experience for children at the Kumbh Mela.",
            "embeddingSimilarity": 0.025705086290444545
        },
        {
            "id": "7ca88de1-4345-45bb-83a5-7dcca73671b8",
            "question": "What are the important dates and events there?",
            "answer": "The important dates and events include various religious ceremonies, cultural performances, and key bathing dates (snan) during the Kumbh Mela. A detailed schedule will be available closer to the event.",
            "embeddingSimilarity": 0.02435364017738495
        }
    ]
}

curl --location 'https://dataset-service.bhasai-dev.k8s.bhasai.samagra.io/v2/search' \
--header 'orgId: f2070b8a-0491-45cb-9f35-8599d6dd77ef' \
--header 'botId: 0c09de45-df1a-4dee-8486-3904a5f51611' \
--header 'Content-Type: application/json' \
--data '{
    "type": "vector",
    "query": "lunch",
    "parameters": {
        "algorithm": "text-embedding-ada-002",
        "threshold": 0.01,
        "topK": 10,
        "searchColumn": "question",
        "datasetId": "666f0c22-688e-41c4-8e95-5bd481a8c3d7"
    }
}'

10 QnAs

{
    "data": [
        {
            "id": "0ca98232-4c9e-4cdf-b576-1b1b9368572c",
            "question": "Where will I get lunch and dinner?",
            "answer": "Within the Kumbh Mela area, you will find a variety of food stalls and eateries offering a wide range of dining options, including traditional Indian cuisine, vegetarian meals, and snacks. These food outlets are spread across the mela grounds to ensure easy access for all attendees. Many of them follow strict hygiene standards to ensure food safety. ",
            "embeddingSimilarity": 0.43154116317755287
        },
        {
            "id": "7a3a50a5-974a-42d9-983f-47cb70423c15",
            "question": "Me and my family would like to stay for an extended duration- can we cook our own meals?",
            "answer": "Yes, some accommodations at the Kumbh Mela provide facilities for cooking your own meals, especially in family tents and extended stay areas. These options often include basic kitchen setups with necessary utensils and cooking equipment. Additionally, guidelines for safe cooking practices are provided to ensure safety and hygiene.",
            "embeddingSimilarity": 0.3161537418788881
        },
        {
            "id": "cf2602d6-b79c-4648-924a-5a490cebdcec",
            "question": "What are the commute facilities to reach there?",
            "answer": "Various commute facilities, including auto-rickshaws, buses, and cabs, are available to reach different locations within the Kumbh Mela area. Kumbh Sah'AI'yak can guide you on the best options.",
            "embeddingSimilarity": 0.3067903707244716
        },
        {
            "id": "196c31e9-63e3-42f4-a002-7caefe567e1f",
            "question": "What are the provisions for the senior citizens at the mela area?",
            "answer": "Provisions for senior citizens include priority access, dedicated seating, and assistance services. Kumbh Sah'AI'yak can provide detailed information based on specific needs.",
            "embeddingSimilarity": 0.3035538945466547
        },
        {
            "id": "cdc5e78b-6375-465f-8075-a0a8c1d96203",
            "question": "What are the timings of major events/festivities?",
            "answer": "The timings of major events and festivities will be provided in the schedule closer to the event. Kumbh Sah'AI'yak can help you stay updated on the event timings.",
            "embeddingSimilarity": 0.29969752239298975
        },
        {
            "id": "7ca88de1-4345-45bb-83a5-7dcca73671b8",
            "question": "What are the important dates and events there?",
            "answer": "The important dates and events include various religious ceremonies, cultural performances, and key bathing dates (snan) during the Kumbh Mela. A detailed schedule will be available closer to the event.",
            "embeddingSimilarity": 0.2917639948823505
        },
        {
            "id": "06db6215-72f7-4317-a67e-7cc9881de35a",
            "question": "Are there any special provisions for senior citizens?",
            "answer": "Yes, there are special provisions for senior citizens, including priority access to certain areas, dedicated seating, and assistance services to ensure their comfort and safety during the event.",
            "embeddingSimilarity": 0.2856387740267454
        },
        {
            "id": "1088b68c-33ec-45e4-ba63-91dfcad3bad5",
            "question": "Should I book flight or train journeys are more convenient?",
            "answer": "The choice between flight and train depends on your budget, convenience, and availability. Kumbh Sah'AI'yak can help you compare options and make the best choice for your journey.",
            "embeddingSimilarity": 0.28216980620067644
        },
        {
            "id": "a2000e50-6bda-44ac-8a8e-62d0c540d66c",
            "question": "What child-suitable facilities/activities exist?",
            "answer": "You can find designated play areas, child-friendly activities, and special amenities to ensure a safe and enjoyable experience for children at the Kumbh Mela.",
            "embeddingSimilarity": 0.28201500513566313
        },
        {
            "id": "0b06386b-34b8-4f16-89d8-72a2d8c42c7c",
            "question": "If I am traveling by car and staying in a tent will there be parking facility near my stay?",
            "answer": "Yes, there will be parking facilities near tent areas. Kumbh Sah'AI'yak can provide details on parking locations and availability.",
            "embeddingSimilarity": 0.2816695318183302
        }
    ]
}

@KDwevedi With the provided curl, I checked text-embedding-ada-002 and bge. The expectation is that both give the same response, but in the data above bge returned 7 QnAs whereas text-embedding-ada-002 returned 10 QnAs. Please check why this is happening.

KDwevedi commented 2 months ago

@KDwevedi @sooraj1002 to take this up as a bug report. Returned item counts of 7 and 10 should not be this disparate for an extremely low threshold.
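
As a starting point for the bug report, the two responses above can be compared by id. The sketch below uses the first 8 characters of each id and the similarity scores rounded to four decimals, both taken from the responses in this thread; the variable names are illustrative.

```python
# Scores from the "lunch" query, keyed by the first 8 characters of the row id.
bge = {
    "0ca98232": 0.3445, "ea9d5ae7": 0.0854, "7a3a50a5": 0.0768,
    "cf2602d6": 0.0717, "66f21489": 0.0399, "a2000e50": 0.0257,
    "7ca88de1": 0.0244,
}
ada = {
    "0ca98232": 0.4315, "7a3a50a5": 0.3162, "cf2602d6": 0.3068,
    "196c31e9": 0.3036, "cdc5e78b": 0.2997, "7ca88de1": 0.2918,
    "06db6215": 0.2856, "1088b68c": 0.2822, "a2000e50": 0.2820,
    "0b06386b": 0.2817,
}

shared = sorted(set(bge) & set(ada))
only_bge = sorted(set(bge) - set(ada))
only_ada = sorted(set(ada) - set(bge))

print("shared:", len(shared), shared)
print("only bge:", len(only_bge), only_bge)
print("only ada:", len(only_ada), only_ada)
print("lowest bge score:", min(bge.values()))
print("lowest ada score:", min(ada.values()))
```

Five rows appear in both responses, two only in the bge response and five only in the text-embedding-ada-002 response. The ada scores all sit well above the 0.01 threshold, so that list looks capped by topK, while the lowest bge score is already close to the threshold. One possible explanation, not confirmed anywhere in this thread, is that the remaining rows score below 0.01 under bge and are removed by the threshold filter.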