Issues Importing a Data Sources, Questions about AnythingLLM API Access, Local AnythingLLM Setup via Docker + More

innovateworld commented 9 months ago

This might need to be split into multiple separate Issues. If so please let me know.

Issues Relating to My Goals (and possibly other people's) with AnythingLLM

There might not be a workaround for these but I have 2 primary goals with using AnythingLLM that I'm unable to use it for and I wanted to mention some additional separate concerns.

Goal 1: Use AnythingLLM to assist in generating Code using latest frameworks beyond LLM cutoff dates. Here are my problems:

A) Importing Certain Repositories from GitHub

Issue: I have received the following error for importing SvelteKit which is unlikely to be solved since it is related to GitHub.
Solution(s): This is likely not related to AnythingLLM but it impacts my goal. I couldn't try to find a dedicated branch for what I wanted to pull for the latest Sveltekit Version otherwise I could have potentially selected the branch I wanted.

B1) Non-Supported File Types

Issue: I then attempted to clone the SvelteKit repo and the drag and drop the folder onto files in AnythingLLM and received errors with the following issues relating to filetypes not being supported:
Solution(s): It is confusing that some .md files were uploaded but others said File does not exists in this diretory. Other filetypes were not supported. Since these other filetypes aren't currently supported I might be able to write a script to copy each of the files text into a .md file or another format (since not all .md files worked) that will import. It would be nice if either AnythingLLM could support these formats or at least find a way to import them automatically to not need scripts. I understand my use case might be difficult to accommodate so I'm not mad at AnythingLLM.

B2) Unable to easily select multiple docs in My Documents to delete several at once.

Issue: After B1 failed, I was stuck with many documents in My Documents that were successfully able to upload but a whole lot more that were unable to. I want to delete from my Custom Documents but not all of them. I can't Shift + Click to select multiple and now have to manually click one at a time to delete (plus with a verification popup).
Solution(s): Nothing I can do from the UI. I could possibly try to figure out how to find where they are stored within the Docker Image and try to delete them that way. Might be worth adding this in the Documentation somewhere for this scenario or adding in the functionality to the UI.

Goal 2: Use the AnythingLLM API from other development tools to run my LLM queries programmatically with my own external system prompts that would override the AnythingLLM system prompt to interact with the LLM but still be able to use the embeddings in the VectorDB that AnythingLLM generated with my custom Documents in my Workspace.

The AnythingLLM documentation doesn't give any details about how to interact with the AnythingLLM API. It only shows the following:
When I go to http://localhost:3001/api/docs/ I see an interface that looks like litellm but I'm unsure if I can override the System Prompt associated with AnythingLLM
Solution(s): This could just be from my lack of understanding but it would be nice to add it to the Documentation.

Separate Issues from Goals:

Running Locally with Docker is confusing

Issue 1: The docker-compose.yml doesn't seem to provide how to set local storage or any procedures for pulling the latest image without losing data. I attempted to use the docker-compose after pulling the repo but for some reason it failed.
Solution(s): Perhaps I made an error running the docker-compose.yml to run locally. I have docker experience for creating my own custom dev environments and running other projects with a docker-compose.yml from GitHub but just didn't have luck with this. If this isn't designed to run locally for some reason then this should be indicated somewhere.
Issue 2: After I failed the above I went to the AnythingLLM Documentation and unfortunately due to my lack of detailed Docker understanding, couldn't immediately figure out how to make the following work from Run AnythingLLM locally
Solution(s): This also might be due to my lack of understanding of Docker but it would be nice to provide a more simplified example of how to run this using either Docker Desktop or a docker-compose.yml file with some set of details for people that might not be specialized in Docker.
What I did to get this working was:
- Open Docker Desktop, search for AnythingLLM, pull the Image, and then Run it.
- Then I was provided with the Docker Desktop Configuration Options for the Container.
- I took a Screenshot of the Docker Desktop Configuration Options, then copied the text from my last screenshot above, added them both to ChatGPT 4.0, and then asked how to enter that information into the GUI. The instructions ChatGPT provided worked.
- I ran into other people online that also had problems running AnythingLLM locally. I wanted to help based on my experience and find a way yo make it easier. I then asked ChatGPT how to create a docker-compose.yml file that could be used to run AnythingLLM while preserving data when updating. Here is the docker-compose.yml that worked for me:
- I tested this first by running docker-compose down then docker-compose pull then docker-compose up -d and after verifying that I didn't lose any data, I then shared those instructions with others that were stuck.

To be clear I like this solution overall and I run it currently with as a LocalLLm using Ollama + Litellm but .... As a separate concern I misunderstood that http://host.docker.internal:xxxx was supposed to be input from my Litellm port inside the AnythingLLM > LM Preference > Local AI Base URL instead of trying to figure out how to set Docker to connect to that port in the docker-compose.yml file lol).

Oh and since I'm on a roll.... I wanted to mention this:

B1.2) JSON files seemed to also not be supported! Pretty big deal!

Issue: After failing repeatedly with the SvelteKit GitHub Repo via the other methods, I decided to use https://github.com/BuilderIO/gpt-crawler to crawl the Sveltekit and Svelte website documentation and save the data to a. json file but after I tried to upload it to AnythingLLM I got the following error:
Solution(s): Same as before. If this isn't supported soon then perhaps I need to write some regex scripts to clean the document and change it to a different filetype. Otherwise 3 different methods are currently not working for me with AnythingLLM to support my Goal 1. The Goal 1 would ideally work with Goal 2 together but since neither of them work, even though I think you have an overall nice product, it just doesn't work for me and I can't recommend it and won't spend the time to help others anymore :(

dhlsam commented 9 months ago

website Scraping doesn't work ,via Docker

Primary server listening on port 3001 -- Working URL https://baijiahao.baidu.com/s?id=1786347651675278442&wfr=spider&for=pc -- getPageContent failed! Error: Could not find Chrome (ver. 119.0.6045.105). This can occur if either

you did not perform an installation before running the script (e.g. npx puppeteer browsers install chrome) or
your cache path is incorrectly configured (which is: /root/.cache/puppeteer). For (2), check out our guide on configuring puppeteer at https://pptr.dev/guides/configuration. at ChromeLauncher.resolveExecutablePath (file:///app/collector/node_modules/puppeteer-core/lib/esm/puppeteer/node/ProductLauncher.js:262:27) at ChromeLauncher.executablePath (file:///app/collector/node_modules/puppeteer-core/lib/esm/puppeteer/node/ChromeLauncher.js:213:25) at ChromeLauncher.computeLaunchArguments (file:///app/collector/node_modules/puppeteer-core/lib/esm/puppeteer/node/ChromeLauncher.js:107:37) at async ChromeLauncher.launch (file:///app/collector/node_modules/puppeteer-core/lib/esm/puppeteer/node/ProductLauncher.js:53:28) at async PuppeteerWebBaseLoader._scrape (/app/collector/node_modules/langchain/dist/document_loaders/web/puppeteer.cjs:42:25) at async PuppeteerWebBaseLoader.load (/app/collector/node_modules/langchain/dist/document_loaders/web/puppeteer.cjs:74:22) at async getPageContent (/app/collector/processLink/convert/generic.js:57:18) at async scrapeGenericUrl (/app/collector/processLink/convert/generic.js:11:19) at async processLink (/app/collector/processLink/index.js:6:10) at async /app/collector/index.js:48:33 TypeError: Cannot read properties of null (reading 'length') at scrapeGenericUrl (/app/collector/processLink/convert/generic.js:13:16) at async processLink (/app/collector/processLink/index.js:6:10) at async /app/collector/index.js:48:33

timothycarambat commented 8 months ago

Closing this as stale, almost all of the items listed are either now their own issues, or have been resolved, or are no longer relevant with recent changes 👍

Mintplex-Labs / anything-llm