Gautam-Rajeev opened this issue 4 months ago.
@techsavvyash to follow up with @sooraj1002 on its progress. Needed for RAJAI.
Development for this migration is being tracked here as a first step. This ticket covers the requirement for the RAJAI datasets.
@GautamR-Samagra Both the Dataset Service and the Document Service have been enabled for BGE embedding generation and BGE retrieval on various tables. We can run the migration as soon as the tables are specified.
@sooraj1002 please move all these changes to stage as well.
@PriyanshiATSamagra is currently testing admin on stage. I can deploy to stage and make the changes once she approves.
@PriyanshiATSamagra, here is a curl command to test:
curl --location 'https://dataset-service.dev.bhasai.samagra.io/v2/search' \
--header 'orgId: f2070b8a-0491-45cb-9f35-8599d6dd77ef' \
--header 'botId: b7b52269-1233-46f5-a5da-1d4d9e2eb395' \
--header 'Content-Type: application/json' \
--data '{
"type": "vector",
"query": "farmers",
"parameters": {
"algorithm": "text-embedding-ada-002",
"topK": 10,
"datasetId": "013f06b4-cb90-41b5-a000-1644ba65f0bb",
"searchColumn": "Data",
"threshold": 0.1
}
}'
To search using bge instead, set:
"algorithm": "bge"
We have to verify whether the bge and text-embedding-ada-002 algorithms return the same results. The similarity scores may be lower with bge, so the threshold might also need to be adjusted.
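To compare the two algorithms without hand-editing the JSON each time, the request body can be generated by a small shell function and the same query sent once per algorithm. This is a minimal sketch: `search_payload` is a hypothetical helper name, it assumes the body shape used in the curl commands in this thread, and it does no JSON escaping, so the query and column name must not contain double quotes.

```shell
# Hypothetical helper: prints the /v2/search request body used in this thread.
# Arguments: algorithm query searchColumn datasetId threshold topK
# NOTE: no JSON escaping is done; arguments must not contain double quotes.
search_payload() {
  printf '{"type": "vector", "query": "%s", "parameters": {"algorithm": "%s", "topK": %s, "datasetId": "%s", "searchColumn": "%s", "threshold": %s}}\n' \
    "$2" "$1" "$6" "$4" "$3" "$5"
}
```

For example, inside `for alg in text-embedding-ada-002 bge; do ... done`, the curl above can take `--data "$(search_payload "$alg" farmers Data 013f06b4-cb90-41b5-a000-1644ba65f0bb 0.1 10)"`, so both runs differ only in the algorithm.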
@sooraj1002 @KDwevedi
curl --location 'https://dataset-service.bhasai-dev.k8s.bhasai.samagra.io/v2/search' \
--header 'orgId: f2070b8a-0491-45cb-9f35-8599d6dd77ef' \
--header 'botId: a899de5a-17ec-4f2f-aa30-aa26ad74b6b4' \
--header 'Content-Type: application/json' \
--data '{
"type": "vector",
"query": "Farm",
"parameters": {
"algorithm": "bge",
"topK": 10,
"datasetId": "b1a4bacd-0e2d-47dd-bdc8-f127ba754ced",
"searchColumn": "Scheme",
"threshold": 0.1
}
}'
This request is not returning any response; I get "Error: connect ENETUNREACH 4.213.167.136:443". I verified the values of botId, orgId, datasetId, searchColumn and the base URL, and tried both "text-embedding-ada-002" and "bge"; both give the same error. Can you please check?
This might have been a temporary error; I am unable to reproduce it now, @PriyanshiATSamagra.
@sooraj1002
I tried all the fields available in the provided dataset, and it gives "
It's not necessary for every dataset to have a vector field.
@sooraj1002 Can you share a dataset on which I can test this algorithm?
@sooraj1002 @KDwevedi Please help here
@sooraj1002 @KDwevedi Please help here. When can we connect?
Sent an invite
BGE embeddings were not working after the migration. A setup has been created on the playground owner to test things out.
@PriyanshiATSamagra, I've created a testing dataset for QA
There are 2 vector-searchable columns, question and answer; both are valid values for searchColumn.
algorithm can be text-embedding-ada-002 for OpenAI-based embeddings or bge for BGE-based embeddings.
threshold means that results with a similarity score below that value are filtered out.
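The threshold and topK parameters interact roughly as follows: drop rows scoring below the threshold, sort the rest by similarity, and keep at most topK. The sketch below reproduces that on plain `id score` lines so the effect of a threshold can be checked offline against scores copied from a response. It is an assumption about the service's behaviour (the thread only states that lower scores are filtered out; the descending order and the topK cap are inferred from the sample responses), not its actual code, and `filter_results` is a hypothetical name.

```shell
# Hypothetical sketch of threshold + topK filtering on "id score" lines read from stdin.
# Usage: filter_results THRESHOLD TOPK
filter_results() {
  # keep rows whose score is >= threshold (lower scores are filtered out),
  # then sort by score, highest first, and keep only the first TOPK rows
  awk -v t="$1" '$2 + 0 >= t + 0' |
    LC_ALL=C sort -k2,2nr |
    head -n "$2"
}
```

`sort -n` does not understand exponent notation such as 1e-05, so if very small scores come back in that form, `sort -g` (available in GNU and BSD sort) would be needed instead.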
curl -X POST https://dataset-service.bhasai-dev.k8s.bhasai.samagra.io/v2/search \
-H "orgId: f2070b8a-0491-45cb-9f35-8599d6dd77ef" \
-H "botId: 0c09de45-df1a-4dee-8486-3904a5f51611" \
-H "Content-Type: application/json" \
-d '{
"type": "vector",
"query": "queue",
"parameters": {
"algorithm": "text-embedding-ada-002",
"threshold": 0.01,
"topK": 10,
"searchColumn": "question",
"datasetId": "666f0c22-688e-41c4-8e95-5bd481a8c3d7"
}
}'
curl --location 'https://dataset-service.bhasai-dev.k8s.bhasai.samagra.io/v2/search' \
--header 'orgId: f2070b8a-0491-45cb-9f35-8599d6dd77ef' \
--header 'botId: 0c09de45-df1a-4dee-8486-3904a5f51611' \
--header 'Content-Type: application/json' \
--data '{
"type": "vector",
"query": "lunch",
"parameters": {
"algorithm": "bge",
"threshold": 0.01,
"topK": 10,
"searchColumn": "question",
"datasetId": "666f0c22-688e-41c4-8e95-5bd481a8c3d7"
}
}'
Response (7 QnAs):
{ "data": [
{ "id": "0ca98232-4c9e-4cdf-b576-1b1b9368572c", "question": "Where will I get lunch and dinner?", "answer": "Within the Kumbh Mela area, you will find a variety of food stalls and eateries offering a wide range of dining options, including traditional Indian cuisine, vegetarian meals, and snacks. These food outlets are spread across the mela grounds to ensure easy access for all attendees. Many of them follow strict hygiene standards to ensure food safety. ", "embeddingSimilarity": 0.34446332735561536 },
{ "id": "ea9d5ae7-29c6-42ee-b94e-c22dff2d8bae", "question": "How to reach Kumbh from my city?", "answer": "You can reach Kumbh by various modes of transport, including flights, trains, and buses. Kumbh Sah'AI'yak can provide detailed travel options and routes based on your location.", "embeddingSimilarity": 0.08537691140462567 },
{ "id": "7a3a50a5-974a-42d9-983f-47cb70423c15", "question": "Me and my family would like to stay for an extended duration- can we cook our own meals?", "answer": "Yes, some accommodations at the Kumbh Mela provide facilities for cooking your own meals, especially in family tents and extended stay areas. These options often include basic kitchen setups with necessary utensils and cooking equipment. Additionally, guidelines for safe cooking practices are provided to ensure safety and hygiene.", "embeddingSimilarity": 0.07682332916340717 },
{ "id": "cf2602d6-b79c-4648-924a-5a490cebdcec", "question": "What are the commute facilities to reach there?", "answer": "Various commute facilities, including auto-rickshaws, buses, and cabs, are available to reach different locations within the Kumbh Mela area. Kumbh Sah'AI'yak can guide you on the best options.", "embeddingSimilarity": 0.07165499253938745 },
{ "id": "66f21489-9217-42ef-bb83-20385b026552", "question": "Is bottled water the only option?", "answer": "Bottled water is readily available throughout the Kumbh Mela area. Additionally, there are designated water stations that provide safe and filtered drinking water. These stations are strategically placed for easy access, ensuring that all attendees stay hydrated.", "embeddingSimilarity": 0.039908742970416355 },
{ "id": "a2000e50-6bda-44ac-8a8e-62d0c540d66c", "question": "What child-suitable facilities/activities exist?", "answer": "You can find designated play areas, child-friendly activities, and special amenities to ensure a safe and enjoyable experience for children at the Kumbh Mela.", "embeddingSimilarity": 0.025705086290444545 },
{ "id": "7ca88de1-4345-45bb-83a5-7dcca73671b8", "question": "What are the important dates and events there?", "answer": "The important dates and events include various religious ceremonies, cultural performances, and key bathing dates (snan) during the Kumbh Mela. A detailed schedule will be available closer to the event.", "embeddingSimilarity": 0.02435364017738495 }
] }
curl --location 'https://dataset-service.bhasai-dev.k8s.bhasai.samagra.io/v2/search' \
--header 'orgId: f2070b8a-0491-45cb-9f35-8599d6dd77ef' \
--header 'botId: 0c09de45-df1a-4dee-8486-3904a5f51611' \
--header 'Content-Type: application/json' \
--data '{
"type": "vector",
"query": "lunch",
"parameters": {
"algorithm": "text-embedding-ada-002",
"threshold": 0.01,
"topK": 10,
"searchColumn": "question",
"datasetId": "666f0c22-688e-41c4-8e95-5bd481a8c3d7"
}
}'
Response (10 QnAs):
{ "data": [
{ "id": "0ca98232-4c9e-4cdf-b576-1b1b9368572c", "question": "Where will I get lunch and dinner?", "answer": "Within the Kumbh Mela area, you will find a variety of food stalls and eateries offering a wide range of dining options, including traditional Indian cuisine, vegetarian meals, and snacks. These food outlets are spread across the mela grounds to ensure easy access for all attendees. Many of them follow strict hygiene standards to ensure food safety. ", "embeddingSimilarity": 0.43154116317755287 },
{ "id": "7a3a50a5-974a-42d9-983f-47cb70423c15", "question": "Me and my family would like to stay for an extended duration- can we cook our own meals?", "answer": "Yes, some accommodations at the Kumbh Mela provide facilities for cooking your own meals, especially in family tents and extended stay areas. These options often include basic kitchen setups with necessary utensils and cooking equipment. Additionally, guidelines for safe cooking practices are provided to ensure safety and hygiene.", "embeddingSimilarity": 0.3161537418788881 },
{ "id": "cf2602d6-b79c-4648-924a-5a490cebdcec", "question": "What are the commute facilities to reach there?", "answer": "Various commute facilities, including auto-rickshaws, buses, and cabs, are available to reach different locations within the Kumbh Mela area. Kumbh Sah'AI'yak can guide you on the best options.", "embeddingSimilarity": 0.3067903707244716 },
{ "id": "196c31e9-63e3-42f4-a002-7caefe567e1f", "question": "What are the provisions for the senior citizens at the mela area?", "answer": "Provisions for senior citizens include priority access, dedicated seating, and assistance services. Kumbh Sah'AI'yak can provide detailed information based on specific needs.", "embeddingSimilarity": 0.3035538945466547 },
{ "id": "cdc5e78b-6375-465f-8075-a0a8c1d96203", "question": "What are the timings of major events/festivities?", "answer": "The timings of major events and festivities will be provided in the schedule closer to the event. Kumbh Sah'AI'yak can help you stay updated on the event timings.", "embeddingSimilarity": 0.29969752239298975 },
{ "id": "7ca88de1-4345-45bb-83a5-7dcca73671b8", "question": "What are the important dates and events there?", "answer": "The important dates and events include various religious ceremonies, cultural performances, and key bathing dates (snan) during the Kumbh Mela. A detailed schedule will be available closer to the event.", "embeddingSimilarity": 0.2917639948823505 },
{ "id": "06db6215-72f7-4317-a67e-7cc9881de35a", "question": "Are there any special provisions for senior citizens?", "answer": "Yes, there are special provisions for senior citizens, including priority access to certain areas, dedicated seating, and assistance services to ensure their comfort and safety during the event.", "embeddingSimilarity": 0.2856387740267454 },
{ "id": "1088b68c-33ec-45e4-ba63-91dfcad3bad5", "question": "Should I book flight or train journeys are more convenient?", "answer": "The choice between flight and train depends on your budget, convenience, and availability. Kumbh Sah'AI'yak can help you compare options and make the best choice for your journey.", "embeddingSimilarity": 0.28216980620067644 },
{ "id": "a2000e50-6bda-44ac-8a8e-62d0c540d66c", "question": "What child-suitable facilities/activities exist?", "answer": "You can find designated play areas, child-friendly activities, and special amenities to ensure a safe and enjoyable experience for children at the Kumbh Mela.", "embeddingSimilarity": 0.28201500513566313 },
{ "id": "0b06386b-34b8-4f16-89d8-72a2d8c42c7c", "question": "If I am traveling by car and staying in a tent will there be parking facility near my stay?", "answer": "Yes, there will be parking facilities near tent areas. Kumbh Sah'AI'yak can provide details on parking locations and availability.", "embeddingSimilarity": 0.2816695318183302 }
] }
@KDwevedi Using the provided curl, I checked both text-embedding-ada-002 and bge. The expectation is that both return the same response, but in the data above the bge algorithm returned 7 QnAs whereas text-embedding-ada-002 returned 10. Please check why this is happening.
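@KDwevedi @sooraj1002 to take this up as a bug report: with an extremely low threshold (0.01), the returned counts (7 vs. 10 items) should not differ. The two responses together contain 12 distinct rows, so the dataset has at least 12 rows and both queries would be expected to return the full topK of 10.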
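To see exactly which rows differ between the two responses above, the returned ids can be compared as sets. The sketch below uses the ids copied from the bge (7 results) and text-embedding-ada-002 (10 results) responses in this thread; `ids_only_in` is a hypothetical helper built on `grep -vxF`. With live responses, the ids would first have to be extracted from the JSON, for example with `jq -r '.data[].id'`, which is not shown here.

```shell
# Hypothetical helper: prints ids present in the first list but not in the second.
# Each argument is a newline-separated list of ids; -x -F means exact whole-line matches.
# NOTE: the second list must be non-empty (an empty pattern would match every line).
ids_only_in() {
  printf '%s\n' "$1" | grep -vxF -e "$2"
}

# ids returned with "algorithm": "bge" (query "lunch", threshold 0.01, topK 10)
BGE_IDS='0ca98232-4c9e-4cdf-b576-1b1b9368572c
ea9d5ae7-29c6-42ee-b94e-c22dff2d8bae
7a3a50a5-974a-42d9-983f-47cb70423c15
cf2602d6-b79c-4648-924a-5a490cebdcec
66f21489-9217-42ef-bb83-20385b026552
a2000e50-6bda-44ac-8a8e-62d0c540d66c
7ca88de1-4345-45bb-83a5-7dcca73671b8'

# ids returned with "algorithm": "text-embedding-ada-002" (same query and parameters)
ADA_IDS='0ca98232-4c9e-4cdf-b576-1b1b9368572c
7a3a50a5-974a-42d9-983f-47cb70423c15
cf2602d6-b79c-4648-924a-5a490cebdcec
196c31e9-63e3-42f4-a002-7caefe567e1f
cdc5e78b-6375-465f-8075-a0a8c1d96203
7ca88de1-4345-45bb-83a5-7dcca73671b8
06db6215-72f7-4317-a67e-7cc9881de35a
1088b68c-33ec-45e4-ba63-91dfcad3bad5
a2000e50-6bda-44ac-8a8e-62d0c540d66c
0b06386b-34b8-4f16-89d8-72a2d8c42c7c'

echo "only in bge:"; ids_only_in "$BGE_IDS" "$ADA_IDS"
echo "only in text-embedding-ada-002:"; ids_only_in "$ADA_IDS" "$BGE_IDS"
```

With the ids above, it reports two ids returned only by bge and five returned only by text-embedding-ada-002, which narrows down the rows worth inspecting for missing or mismatched bge embeddings.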
Current state:
OpenAI embeddings take too long to respond (sometimes 2-3 seconds).
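To put a number on this latency before and after the switch, curl can report the total time of each request through its `--write-out` variable `%{time_total}`. A minimal sketch, assuming the dev endpoint, headers, and QA test dataset used earlier in this thread; `time_search` is a hypothetical helper, and the measured time is end-to-end (network, service, and embedding call), not the embedding call alone.

```shell
# Hypothetical helper: prints "<algorithm> <seconds>" for one /v2/search call.
# Assumes the dev endpoint, headers, and QA test dataset used earlier in this thread.
# Usage: time_search ALGORITHM QUERY
time_search() {
  body=$(printf '{"type": "vector", "query": "%s", "parameters": {"algorithm": "%s", "threshold": 0.01, "topK": 10, "searchColumn": "question", "datasetId": "666f0c22-688e-41c4-8e95-5bd481a8c3d7"}}' "$2" "$1")
  # discard the response body and print only the total transfer time in seconds
  secs=$(curl --silent --output /dev/null --write-out '%{time_total}' \
    --location 'https://dataset-service.bhasai-dev.k8s.bhasai.samagra.io/v2/search' \
    --header 'orgId: f2070b8a-0491-45cb-9f35-8599d6dd77ef' \
    --header 'botId: 0c09de45-df1a-4dee-8486-3904a5f51611' \
    --header 'Content-Type: application/json' \
    --data "$body")
  printf '%s %s\n' "$1" "$secs"
}
```

For example, `for alg in text-embedding-ada-002 bge; do time_search "$alg" lunch; done` prints one timing line per algorithm; running it several times gives a fairer comparison than a single request.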
Expected Behaviour:
All docs are stored as BGE-small vectors, and a self-hosted BGE-small model is used for retrieval from the Document Service (and from any other service where it is required).
Implementation details:
Migration of older data
Adding embedding
Testing