Unstructured-IO / unstructured-api

Apache License 2.0

local docker setup of unstructured api asks for API key with 401 errors #386

Closed ahmedrehman closed 6 months ago

ahmedrehman commented 6 months ago

Describe the bug: I'm trying the Unstructured API with a local Docker container, as described in the documentation: https://js.langchain.com/docs/integrations/document_loaders/file_loaders/unstructured

To Reproduce: run

docker run -p 8000:8000 -d --rm --name unstructured-api quay.io/unstructured-io/unstructured-api:latest --port 8000 --host 0.0.0.0

and try to use it. It always answers 401 because of a malformed API key, and there is no documentation on how to get or create that key locally. As I understand it, it should not ask for an API key if I don't specify the environment variable.

Environment: Docker on Windows

omikader commented 6 months ago

Are you setting apiUrl to point to your local instance?

ahmedrehman commented 6 months ago

I tried that now; it still asks for an API key.

docker run -p 8000:8000 -d --rm --name unstructured-api quay.io/unstructured-io/unstructured-api:latest --port 8000 --host 0.0.0.0

import { UnstructuredDirectoryLoader } from "langchain/document_loaders/fs/unstructured";

const options = { apiKey: "", apiUrl: "http://localhost:8000" };
const loader = new UnstructuredDirectoryLoader("woertli", options);
const docs = await loader.load();

omikader commented 6 months ago

Can you share the exact error you're seeing? Also, you can omit apiKey from options since you're not providing one.

ahmedrehman commented 6 months ago

I also tried without apiKey:

langchainnodeexample@1.0.0 start node test.js

file:///C:/ahmed/wrk/tmpWork/2024/proj/ai/langchainNode/node_modules/langchain/dist/document_loaders/fs/unstructured.js:185
    throw new Error(`Failed to partition file ${this.filePath} with error ${response.status} and message ${await response.text()}`);
    ^

Error: Failed to partition file C:\ahmed\wrk\tmpWork\2024\proj\ai\langchainNode\woertli\wortli1pdf.pdf with error 401 and message {"detail":"API key is malformed, please type the API key correctly in the header."}
    at UnstructuredLoader._partition (file:///C:/ahmed/wrk/tmpWork/2024/proj/ai/langchainNode/node_modules/langchain/dist/document_loaders/fs/unstructured.js:185:19)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async UnstructuredLoader.load (file:///C:/ahmed/wrk/tmpWork/2024/proj/ai/langchainNode/node_modules/langchain/dist/document_loaders/fs/unstructured.js:194:26)
    at async UnstructuredDirectoryLoader.load (file:///C:/ahmed/wrk/tmpWork/2024/proj/ai/langchainNode/node_modules/langchain/dist/document_loaders/fs/directory.js:94:40)
    at async file:///C:/ahmed/wrk/tmpWork/2024/proj/ai/langchainNode/test.js:24:14

Node.js v20.10.0

omikader commented 6 months ago

I suspect that you are still not hitting your Docker container for some reason because that error response message does not exist in the repo. Make sure that you are seeing the request come in to your container using docker logs if possible.
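Alongside docker logs, a quick client-side check is to print which host the loader options would actually target. The sketch below uses a hypothetical helper (`describeTarget` is not part of langchain). It assumes that when `apiUrl` is omitted the loader falls back to a built-in hosted default, which would explain an API key error that the local image never produces.

```javascript
// Hypothetical helper, not part of langchain: reports where a request would go.
const LOCAL_HOSTS = new Set(["localhost", "127.0.0.1"]);

function describeTarget(options = {}) {
  if (!options.apiUrl) {
    // Assumption: with no apiUrl, the loader uses its own hosted default.
    return { local: false, reason: "apiUrl not set; the loader default is used" };
  }
  const { hostname, port } = new URL(options.apiUrl);
  const local = LOCAL_HOSTS.has(hostname);
  const target = port ? `${hostname}:${port}` : hostname;
  return { local, reason: `requests go to ${target}` };
}

console.log(describeTarget({ apiUrl: "http://localhost:8000" }));
console.log(describeTarget({ apiKey: "" }));
```

If `local` is false while the container is running, the request is probably not reaching the container at all, and nothing will show up in its logs.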

ahmedrehman commented 6 months ago

Yes, you are right.
const options = { apiKey: "", apiUrl: "http://localhost:8000" };
This works; the API key problem is gone.

awalker4 commented 6 months ago

Thanks @omikader for the assist here!

ahmedrehman commented 6 months ago

I am now one step further. I get the response
{"detail":"Not Found"}
from localhost:8000. I don't know the API URL, but it does seem to do something:

If I give a wrong file path, I get different errors about the file not being there.
But if I provide a correct file path, I get "Not Found":

Error: Failed to partition file /ahmed/wrk/tmpWork/2024/proj/ai/langchainNode/exampledata/trainingdata/examplestoUse.csv with error 404 and message {"detail":"Not Found"}
    at UnstructuredLoader._partition (file:///C:/ahmed/wrk/tmpWork/2024/proj/ai/langchainNode/node_modules/langchain/dist/document_loaders/fs/unstructured.js:185:19)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async UnstructuredLoader.load (file:///C:/ahmed/wrk/tmpWork/2024/proj/ai/langchainNode/node_modules/langchain/dist/document_loaders/fs/unstructured.js:194:26)
    at async file:///C:/ahmed/wrk/tmpWork/2024/proj/ai/langchainNode/test.js:27:15

omikader commented 6 months ago

You're getting a 404 because you're not hitting the right endpoint. The apiUrl needs the path /general/v0/general too.

const options = {
  apiUrl: "http://localhost:8000/general/v0/general"
};
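
To avoid this kind of 404 when only the host and port are known, the path can be appended programmatically. This is a minimal sketch: `withPartitionPath` is a hypothetical helper, not a langchain or Unstructured function, and it assumes the container serves the partition endpoint at /general/v0/general as described above.

```javascript
// Hypothetical helper: make sure a base URL ends with the partition path.
const PARTITION_PATH = "/general/v0/general";

function withPartitionPath(baseUrl) {
  const url = new URL(baseUrl);
  // Drop trailing slashes so "http://host:8000/" and "http://host:8000" behave the same.
  const trimmed = url.pathname.replace(/\/+$/, "");
  url.pathname = trimmed.endsWith(PARTITION_PATH) ? trimmed : trimmed + PARTITION_PATH;
  return url.toString();
}

const options = { apiUrl: withPartitionPath("http://localhost:8000") };
console.log(options.apiUrl);
```

A URL that already contains the path is returned unchanged apart from any trailing slash, so the helper is safe to apply to either form.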

ahmedrehman commented 6 months ago

Yes, great help, now it works. Please update the documentation; it is a bit sparse.

ahmedrehman commented 6 months ago

I found these installation instructions, which work nicely. I had spent a lot of time trying to get Unstructured running for the LangChain AI examples: https://www.youtube.com/watch?app=desktop&v=svzd5d1LXGk (from echohive)