Azure-Samples / azure-search-openai-javascript

A TypeScript sample app for the Retrieval Augmented Generation pattern running on Azure, using Azure AI Search for retrieval and Azure OpenAI and LangChain large language models (LLMs) to power ChatGPT-style and Q&A experiences.
MIT License
252 stars, 130 forks

Indexer failing with "stream timeout" #204

Closed (opened by pocketcalculator; closed 3 months ago)

pocketcalculator commented 5 months ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

The indexer fails on one of my files, a large PDF (1678 KB). After converting that file to Markdown (318 KB), it still fails with the same error. Smaller files index without problems.
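Since small files index fine, one possible workaround is to split the converted Markdown into several smaller files and index them one at a time. This is an assumption; no fix was confirmed in this thread. Below is a minimal TypeScript sketch of such a split. `splitMarkdown` and the byte budget are hypothetical, not part of the sample. It breaks only on blank lines, so paragraphs stay intact.

```typescript
// Hypothetical helper (not part of the sample): split Markdown text into parts
// whose UTF-8 size is at most maxBytes, breaking only on blank lines.
function splitMarkdown(text: string, maxBytes: number): string[] {
  const encoder = new TextEncoder();
  const size = (s: string): number => encoder.encode(s).length;
  const paragraphs = text.split(/\n\s*\n/);
  const parts: string[] = [];
  let current = "";
  for (const para of paragraphs) {
    // Try to append this paragraph to the part being built.
    const candidate = current === "" ? para : `${current}\n\n${para}`;
    if (size(candidate) <= maxBytes) {
      current = candidate;
      continue;
    }
    if (current !== "") {
      parts.push(current);
    }
    // A single paragraph larger than the budget is kept whole rather than cut mid-sentence.
    current = para;
  }
  if (current !== "") {
    parts.push(current);
  }
  return parts;
}
```

For example, `splitMarkdown(text, 50_000)` yields parts of roughly 50 KB or less (apart from any single oversized paragraph). Each part could then be saved as its own `.md` file and indexed separately. Whether a given size avoids the timeout is untested here.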

Any log messages given by the failure

When I index that one file on its own, the script exits after about five minutes with:

Error indexing files: Unexpected token 's', "stream timeout" is not valid JSON
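This is the message V8's `JSON.parse` produces when it receives a plain-text body: parsing fails on the leading `s` of `stream timeout`. So the indexing script apparently parsed a response as JSON even though the body was plain text. That text most likely came from a proxy or ingress in front of the indexer rather than from the indexer itself (an assumption; the thread does not confirm it). The hedged sketch below reads the body as text first, so the real HTTP status and body appear in the error. `parseJsonBody`, `readJson`, and `TextResponse` are hypothetical names, not part of the sample.

```typescript
// Minimal shape of an HTTP response; a fetch Response satisfies it structurally.
interface TextResponse {
  status: number;
  text(): Promise<string>;
}

// Parse a body as JSON, but turn a SyntaxError (e.g. a plain-text
// "stream timeout" body) into an error that shows the status and body.
function parseJsonBody(status: number, body: string): unknown {
  try {
    return JSON.parse(body);
  } catch (error) {
    if (error instanceof SyntaxError) {
      const preview = body.length > 200 ? `${body.slice(0, 200)}...` : body;
      throw new Error(`Expected JSON but got HTTP ${status} with body: ${preview}`);
    }
    throw error;
  }
}

// Read the whole body as text first, then parse it.
async function readJson(response: TextResponse): Promise<unknown> {
  const body = await response.text();
  return parseJsonBody(response.status, body);
}
```

Usage would look like `const data = await readJson(await fetch(url, options));`. With this pattern, the failure would read "Expected JSON but got HTTP … with body: stream timeout" instead of a JSON syntax error.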

Expected/desired behavior

The script should exit with "indexed successfully", as it does for smaller files.

OS and Version?

WSL version 2.1.5.0, Ubuntu 20.04.6 LTS

azd version?

azd version 1.9.3

Versions

Mention any other details that might be useful

The indexer container app's console log stream shows the following:

2024-05-23T03:26:14.350807125Z {"level":30,"time":1716434774350,"pid":18,"hostname":"indexer--azd-1111111111-1111111111-b7tsg","reqId":"req-3","req":{"method":"POST","url":"/indexes","hostname":"indexer.xxxxxxxxxx-11111111.eastus2.azurecontainerapps.io","remoteAddress":"10.10.10.1","remotePort":55708},"msg":"incoming request"}

2024-05-23T03:26:14.390929073Z {"level":30,"time":1716434774390,"pid":18,"hostname":"indexer--azd-1111111111-1111111111-b7tsg","reqId":"req-3","res":{"statusCode":204},"responseTime":39.857584999874234,"msg":"request completed"}

2024-05-23T03:26:14.460244824Z {"level":30,"time":1716434774460,"pid":18,"hostname":"indexer--azd-1111111111-1111111111-b7tsg","reqId":"req-4","req":{"method":"POST","url":"/indexes/gptkbindex/files","hostname":.xxxxxxxxxx-11111111.eastus2.azurecontainerapps.io","remoteAddress":"10.10.10.1","remotePort":55708},"msg":"incoming request"}

2024-05-23T03:26:14.645781688Z {"level":30,"time":1716434774645,"pid":18,"hostname":"indexer--azd-1111111111-1111111111-b7tsg","msg":"Received indexing options: {\"uploadToBlobStorage\":true,\"useVectors\":true,\"wait\":true}"}

2024-05-23T03:26:14.652612491Z {"level":30,"time":1716434774645,"pid":18,"hostname":"indexer--azd-1111111111-1111111111-b7tsg","msg":"Indexing file \"grant.md\" synchronously"}

2024-05-23T03:35:02.64794 No logs since last 60 seconds
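These entries look like Fastify request logs written by pino (level 30 is pino's `info` level), each prefixed with the container's timestamp. The small sketch below extracts the timestamp, `reqId`, and `msg` so the gap between the last indexing message and the timeout is easy to see. `parseLogLine`, `toMillis`, and `minutesBetween` are hypothetical helpers, not part of the sample.

```typescript
interface LogEntry {
  timestamp: string;
  level?: number;
  reqId?: string;
  msg?: string;
}

// Split "<ISO timestamp> <JSON>" and pick out a few fields.
// Returns null for non-JSON lines (e.g. "No logs since last 60 seconds") or invalid JSON.
function parseLogLine(line: string): LogEntry | null {
  const space = line.indexOf(" ");
  if (space < 0) return null;
  const timestamp = line.slice(0, space);
  const rest = line.slice(space + 1).trim();
  if (!rest.startsWith("{")) return null;
  try {
    const { level, reqId, msg } = JSON.parse(rest) as Record<string, unknown>;
    return {
      timestamp,
      level: typeof level === "number" ? level : undefined,
      reqId: typeof reqId === "string" ? reqId : undefined,
      msg: typeof msg === "string" ? msg : undefined,
    };
  } catch {
    return null;
  }
}

// Trim nanosecond fractions to milliseconds before parsing, since the standard
// ECMAScript date-time format only defines three fractional digits.
function toMillis(ts: string): number {
  return Date.parse(ts.replace(/(\.\d{3})\d+/, "$1"));
}

function minutesBetween(start: string, end: string): number {
  return (toMillis(end) - toMillis(start)) / 60000;
}
```

In the excerpt, `req-4` (the `POST /indexes/gptkbindex/files` call) never logs "request completed". The last indexer message is `Indexing file "grant.md" synchronously` at 03:26:14, roughly nine minutes before the 03:35 "No logs" line. As pasted, the `req-4` "incoming request" entry is not valid JSON because its `hostname` value lost its opening quote, so `parseLogLine` returns `null` for it.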

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.

github-actions[bot] commented 3 months ago

This issue was closed because it has been stalled for 7 days with no activity.