Azure-Samples / chat-with-your-data-solution-accelerator

A Solution Accelerator for the RAG pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences. This includes most common requirements and best practices.
https://azure.microsoft.com/products/search
MIT License
640 stars, 313 forks

IV changes for Explore Data & Delete Data page #741

Status: Closed (komalg1 closed 1 month ago)

komalg1 commented 1 month ago

Purpose

Does this introduce a breaking change?

[ ] Yes
[x] No

Pull Request Type

What kind of change does this Pull Request introduce?

[ ] Bugfix
[x] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

How to Test

git clone [repo-address]
cd [repo-name]
git checkout [branch-name]
npm install

github-actions[bot] commented 1 month ago

Coverage

Coverage Report
File | Stmts | Miss | Cover | Missing
code
   app.py | 14 | 14 | 0% | 1–4, 6–7, 9, 12–14, 16, 18, 20–21
   create_app.py | 148 | 3 | 97% | 199, 204, 327
code/backend
   Admin.py | 23 | 23 | 0% | 1–6, 9, 11, 13–14, 16, 19–20, 22–23, 26, 33, 40, 43–45, 47, 49
code/backend/batch
   AddURLEmbeddings.py | 29 | 2 | 93% | 37–38
   BatchPushResults.py | 31 | 1 | 96% | 53
   BatchStartProcessing.py | 19 | 0 | 100% |
   GetConversationResponse.py | 32 | 3 | 90% | 63–65
   function_app.py | 16 | 16 | 0% | 1–8, 10, 12–13, 15, 18–21
code/backend/batch/utilities/common
   Answer.py | 24 | 1 | 95% | 39
   SourceDocument.py | 58 | 4 | 93% | 31, 35, 39, 124
code/backend/batch/utilities/document_chunking
   DocumentChunkingBase.py | 10 | 2 | 80% | 10, 16
   FixedSizeOverlap.py | 19 | 0 | 100% |
   Layout.py | 19 | 0 | 100% |
   Page.py | 17 | 0 | 100% |
   Paragraph.py | 9 | 9 | 0% | 1–4, 7–9, 12, 15
   Strategies.py | 29 | 5 | 82% | 24–25, 27, 29, 46
   __init__.py | 7 | 0 | 100% |
code/backend/batch/utilities/document_loading
   DocumentLoadingBase.py | 9 | 1 | 88% | 13
   Layout.py | 12 | 12 | 0% | 1–4, 7–9, 11–13, 16, 25
   Read.py | 12 | 12 | 0% | 1–4, 7–9, 11–13, 16, 25
   Strategies.py | 20 | 8 | 60% | 13, 15, 17, 19, 24–25, 27, 29
   Web.py | 19 | 1 | 94% | 23
   WordDocument.py | 25 | 25 | 0% | 1–6, 9–12, 21–24, 26–27, 29–30, 32–37, 45
   __init__.py | 15 | 1 | 93% | 16
code/backend/batch/utilities/helpers
   AzureBlobStorageHelper.py | 72 | 33 | 54% | 20–22, 30, 50, 53–54, 59, 63, 88–89, 91, 95, 106, 110, 116, 131, 134, 153, 156, 158, 166–170, 193, 197–201, 203
   AzureFormRecognizerHelper.py | 81 | 81 | 0% | 1–6, 9–11, 13, 16–17, 25, 27, 35, 43–45, 52–55, 60–68, 70, 73–75, 77–78, 81, 84–86, 88–90, 93, 97–98, 105–109, 111–114, 117–131, 133, 135–137, 139–140, 143, 145–147
   AzureSearchHelper.py | 20 | 0 | 100% |
   ConfigHelper.py | 112 | 0 | 100% |
   DocumentChunkingHelper.py | 12 | 1 | 91% | 21
   DocumentLoadingHelper.py | 12 | 1 | 91% | 14
   DocumentProcessorHelper.py | 60 | 16 | 73% | 40, 52–59, 63–66, 86–88
   EnvHelper.py | 127 | 10 | 92% | 208, 213–214, 217–219, 228, 232–234
   LLMHelper.py | 33 | 20 | 39% | 11–13, 15–16, 22, 28–29, 34, 37–38, 47, 58–59, 71, 83–84, 91, 101, 109
   OrchestratorHelper.py | 12 | 4 | 66% | 20–22, 25
code/backend/batch/utilities/integrated_vectorization
   AzureSearchDatasource.py | 19 | 0 | 100% |
   AzureSearchIndex.py | 35 | 0 | 100% |
   AzureSearchIndexer.py | 20 | 2 | 90% | 47–48
   AzureSearchSkillset.py | 20 | 0 | 100% |
code/backend/batch/utilities/loggers
   ConversationLogger.py | 36 | 28 | 22% | 8, 11–12, 15–24, 27–30, 33–42, 46
   TokenLogger.py | 9 | 4 | 55% | 7–8, 11, 15
code/backend/batch/utilities/orchestrator
   LangChainAgent.py | 72 | 29 | 59% | 23–28, 30, 65–66, 71–73, 78, 98–101, 118–119, 122–125, 132–133, 138–140, 143
   OpenAIFunctions.py | 66 | 66 | 0% | 1–3, 5–12, 14, 17–21, 56, 59, 62–64, 69–71, 76, 79, 81, 87–90, 92, 95, 102–106, 110–111, 113, 119–123, 127–129, 132, 135–136, 139, 144–146, 149–151, 156–158, 161, 164, 169
   OrchestratorBase.py | 32 | 15 | 53% | 14–20, 31, 40–42, 49–51, 61
   Strategies.py | 12 | 7 | 41% | 10–11, 13–15, 17, 19
   __init__.py | 11 | 0 | 100% |
code/backend/batch/utilities/parser
   OutputParserTool.py | 39 | 0 | 100% |
   ParserBase.py | 9 | 2 | 77% | 9, 19
   __init__.py | 7 | 2 | 71% | 7, 11
code/backend/batch/utilities/search
   AzureSearchHandler.py | 31 | 1 | 96% | 12
   IntegratedVectorizationSearchHandler.py | 32 | 0 | 100% |
   SearchHandlerBase.py | 23 | 6 | 73% | 11, 15, 19, 23, 27, 31
code/backend/batch/utilities/tools
   AnswerProcessingBase.py | 8 | 2 | 75% | 8, 12
   AnsweringToolBase.py | 9 | 2 | 77% | 9, 15
   ContentSafetyChecker.py | 41 | 25 | 39% | 16, 18–19, 24, 30–32, 35–36, 42–43, 49–54, 57–59, 61, 65–67, 69
   PostPromptTool.py | 22 | 13 | 40% | 11, 14–15, 17–18, 22, 29, 36–37, 45, 51–52, 60
   QuestionAnswerTool.py | 65 | 0 | 100% |
   TextProcessingTool.py | 16 | 9 | 43% | 9, 12–15, 19, 21, 28, 35
code/backend/pages
   01_Ingest_Data.py | 120 | 120 | 0% | 1–12, 18–22, 24–26, 28, 34, 41, 44, 48–49, 51, 56, 59–60, 63–72, 76–78, 81–84, 86, 89–99, 102–109, 112–114, 116, 119, 121–124, 129–134, 137, 140–141, 144, 150, 163–166, 169–170, 174, 178, 185, 199–202, 205, 210–211, 213–214, 216–218, 222, 225–226, 232–235, 242–243, 248, 250–251
   02_Explore_Data.py | 29 | 29 | 0% | 1–7, 10, 12–13, 15, 21, 28, 31, 39, 41–43, 45, 47–50, 52–55, 58–59
   03_Delete_Data.py | 41 | 41 | 0% | 1–7, 10, 12–14, 16, 22, 29, 32, 40, 42–44, 47, 49–50, 52–55, 57, 59–60, 64, 68–72, 74, 77–78, 80–82
   04_Configuration.py | 125 | 125 | 0% | 1–9, 11, 13, 15, 22, 29, 31, 36–45, 48–49, 52–63, 65–66, 76–80, 83–84, 88–90, 93–94, 97–98, 101–102, 125, 127–128, 130–134, 136–139, 142–146, 153–154, 164–166, 168, 188–189, 191, 193, 199, 207, 215, 222–223, 230, 232–233, 237, 245, 251, 258, 276–277, 294–295, 299, 301–302, 319, 348–349, 351–352, 355–356, 359–362, 364–365, 367–369, 371–374, 376–377
TOTAL | 2106 | 837 | 60% |

Tests | Skipped | Failures | Errors | Time
130 | 0 | 0 | 0 | 8.169s

superhindupur commented 1 month ago

The search client logic extraction is looking great. One thing I missed when we discussed this earlier, sorry: is it possible to create the clients from a central place, so that the Explore and Delete pages don't have to check env_helper.AZURE_SEARCH_USE_INTEGRATED_VECTORIZATION themselves? Not a big deal if this can't be done, just a thought.
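To illustrate the "central place" idea, here is a minimal sketch of a factory function. It assumes a hypothetical get_search_handler function and simplified stand-in classes; the handler class names are borrowed from the coverage report, but their constructors here are assumptions and not the project's actual signatures.

```python
from dataclasses import dataclass


@dataclass
class EnvHelper:
    """Stand-in for the project's env helper; only the flag discussed here."""
    AZURE_SEARCH_USE_INTEGRATED_VECTORIZATION: bool = False


class SearchHandlerBase:
    """Simplified stand-in base class (constructor is an assumption)."""

    def __init__(self, env_helper: EnvHelper) -> None:
        self.env_helper = env_helper


class AzureSearchHandler(SearchHandlerBase):
    """Stand-in for the handler used without integrated vectorization."""


class IntegratedVectorizationSearchHandler(SearchHandlerBase):
    """Stand-in for the handler used with integrated vectorization."""


def get_search_handler(env_helper: EnvHelper) -> SearchHandlerBase:
    """Hypothetical factory: the only place that inspects the flag."""
    if env_helper.AZURE_SEARCH_USE_INTEGRATED_VECTORIZATION:
        return IntegratedVectorizationSearchHandler(env_helper)
    return AzureSearchHandler(env_helper)
```

With something like this, a page would call get_search_handler(env_helper) and work against the base interface, without branching on the flag itself.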

Second, is it possible to add unit tests for the extracted search methods in the two clients, now that they're no longer directly tied to Streamlit?
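As a rough sketch of what such a test could look like once the search logic is free of Streamlit: the handler class, its get_files method, and the result fields below are all hypothetical, and the search client is replaced by a MagicMock, so nothing here reflects the project's real code.

```python
from unittest.mock import MagicMock


class ExampleSearchHandler:
    """Hypothetical handler: wraps a search client, no Streamlit imports."""

    def __init__(self, search_client) -> None:
        self.search_client = search_client

    def get_files(self) -> dict:
        # Query everything, then group chunk ids by document title.
        results = self.search_client.search("*", select=["id", "title"])
        files: dict = {}
        for result in results:
            files.setdefault(result["title"], []).append(result["id"])
        return files


def test_get_files_groups_ids_by_title() -> None:
    # The mocked client returns canned results instead of calling Azure.
    client = MagicMock()
    client.search.return_value = [
        {"id": "1", "title": "a.pdf"},
        {"id": "2", "title": "a.pdf"},
        {"id": "3", "title": "b.pdf"},
    ]
    handler = ExampleSearchHandler(client)

    assert handler.get_files() == {"a.pdf": ["1", "2"], "b.pdf": ["3"]}
    client.search.assert_called_once_with("*", select=["id", "title"])
```

Because the handler receives its client through the constructor, the test needs no environment variables or network access.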

komalg1 commented 1 month ago

> The search client logic extraction is looking great. One thing I missed when we discussed this earlier, sorry: is it possible to create the clients from a central place, so that the Explore and Delete pages don't have to check env_helper.AZURE_SEARCH_USE_INTEGRATED_VECTORIZATION themselves? Not a big deal if this can't be done, just a thought.
>
> Second, is it possible to add unit tests for the extracted search methods in the two clients, now that they're no longer directly tied to Streamlit?

Thanks for the review. I'll take the first point up as a separate refactoring task and see whether the clients can be created in a central place. Unit tests have been added.

superhindupur commented 1 month ago

Apart from some missed test cases, looks good to me. Once the tests are added I'm happy to approve. Thanks Komal!