Closed jessemcg closed 1 week ago
Hi, thanks for your supportive comment and efforts in getting this running! I've really intended to make it as easy to run as possible: clone, install requirements and launch. Unfortunately, I'm getting a bunch of comments complaining about issues with the requirements.txt files. I believe the issue in your screenshot is one of missing dependencies too.
I honestly haven't had a chance to test on Fedora/DNF. But give me some time, I'll investigate this, update the requirements.txt files and update the container so deployment will be easier since you'll be able to build the container. The current Docker build is outdated unfortunately, and does not contain the improvements of LARS v2.
I'll drop an update here once I've looked into and resolved this 🍻
Awesome, thanks.
@jessemcg A new release with updated requirements is now available, please give it a spin and report your experience: https://github.com/abgulati/LARS/releases/tag/v2.0-beta7
Thank you!
@jessemcg I have now verified the requirements installation on Windows and Ubuntu: https://github.com/abgulati/LARS/blob/v2.0-beta8/requirements_linux.txt
YMMV on Fedora/DNF & Mac though!
Several unnecessary requirements have been removed, along with the version specs for a couple of others. The encoding has also been updated to UTF-8, so you can use Nano for any required edits.
A few more refinements have been merged in as well, resulting in v2.0-beta8: a fast follow-up to this morning's emergency v2.0-beta7 update.
Do re-try and drop an update.
It worked on Fedora 39 Linux. Thank you so much for the quick update. There was a very minor obstacle in the beginning where it did not automatically create the base directory, but that is easily fixed by manually creating it and updating the json config file.
In case this is helpful to others, I found that setting up a virtual environment and installing the dependencies with uv was much faster than standard pip. I am attaching a toml file with the Linux dependencies for use with uv in case anyone is interested. For some reason, GitHub did not let me upload an actual toml file, so you will need to rename it and remove the .txt at the end. With uv, you basically just:

1. Run `uv init <new-project-name>`
2. Place the LARS project in there
3. Replace the .toml file
4. Run `uv sync`
5. `cd` to the `web_app` directory and run `uv run app.py`

There was one point where I had to run `export CXX=g++` before syncing dependencies with uv.
Thanks again.
Thank you so much for the update and for your excellent contribution @jessemcg !
It's contributions like these that encourage open-source work 🍻
Very curious to hear of your experience: how is LARS running? Is everything working okay?
Sincere thanks again!
Everything is working very well. I appreciate how it automatically detects if a llama.cpp server is already running on port 8080, then just uses that if it is. This keeps my VRAM from filling up. My use case is legal transcripts, and the default RAG pipeline is creating very quick and accurate responses. Some of the highlights don't always make sense, but it is still a useful feature. I haven't had time to experiment with the more detailed settings, but I am glad they are there. Great job.
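For anyone curious, the "is a server already running on port 8080?" check can be approximated with a simple pre-flight connection attempt. This is just a minimal sketch of the idea (the function name is mine, not LARS's actual code):

```python
import socket

def llama_cpp_server_running(host="localhost", port=8080, timeout=1.0):
    """Return True if something is already listening on the given port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused / timed out: nothing is listening there.
        return False

if llama_cpp_server_running():
    # Reuse the existing server instead of spawning a second instance,
    # which would load the model into VRAM a second time.
    print("Reusing existing llama.cpp server on port 8080")
```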
That's fantastic to hear, thanks @jessemcg !
Do give HF-Waitress a spin too, it makes running new models off the hub very easy: just copy the model_id and click Add! This way, you can run new models as soon as they're out without waiting for llama.cpp and GGUF support.
🍻
@jessemcg Thank you so much for your generous donation today!! Truly, I'm extremely grateful & humbled.
I'd love to give you access to my LARS-Enterprise private repository, which, amongst a host of UI and QoL updates (including deletion and renaming of chats in the sidebar), contains the following major feature updates:

- Support for the Llama3.2-Vision LLM to visually analyse documents in any format in a live chat: separate from uploading docs to the VectorDB for RAG, you can attach a document/image in any format and have Llama3.2-Vision look at it and respond. I've also implemented full conversational flow for this model, so you can upload files for visual analysis, ask follow-up questions and even use RAG, all with a single instance of the Vision LLM!
- Image generation: FLUX is now supported!
- RAG-Citations pipeline has been significantly improved: BM25-Indexing is introduced into LARS to augment the existing embeddings + re-ranking pipeline

And it's all still 100% local: no data for any of the above leaves your machine! Multi-modal capabilities have been made possible entirely thanks to the flexibility of my HF-Waitress LLM server.
Do let me know if you're interested and I'll add you to my private repo!
Thanks so much once again!
I would love to check out the enterprise version, thank you. I am very impressed with how you are making everything cross-platform. It seems like so much work.
You probably already know this, but lawyers will be a good fit for using LARS, particularly the ones that handle appeals like myself. When I started doing appeals 15 years ago, it was so tedious, but with all of the tools available now (like LARS), it is much more enjoyable. Of course, the vast majority of lawyers are not tech-savvy, so a lot of stuff is still too complicated for them at the moment. Thanks again for your hard work.
JESSE McGOWAN Attorney at Law | SBN. 250320 2794 Gateway Rd. Ste. 109 Carlsbad, CA 92009 760 440-5520
Jesse it’s fantastic to make your acquaintance and hear of your extensive background in law! Absolutely, the law field has very much been a huge area of interest for LARS, and I’ve been on the lookout for collaborators in the space. I’m working with partners in the accounting domain, and having conversations in the medical space, and have been really looking forward to collaborating in the law space so the timing for this connect couldn’t have been better!
Also thrilled to hear you'd like to try out LARS-Enterprise; I'll set up your access. The transition should be seamless and simple, and I'm happy to help resolve any hiccups along the way.
All my contact information is in my signature below, let’s further this conversation beyond this issues thread!
Best regards, Abheek Gulati (437) 556-9998 @.*** https://www.linkedin.com/in/abheek-gulati
Hi @jessemcg ,
Unfortunately, GitHub does not let me manage access rights in a secure enough fashion when adding collaborators to a private repository. If you're okay with it, I can reach out to you on your mobile number/LinkedIn, where we can exchange emails and I can share a GDrive link (or we can work out any other preferred method) to LARS-Enterprise v2.7, complete feature-set below. Do let me know, thank you!
- hf_waitress.py: code to remove the MllamaForConditionalGeneration import from transformers and optimum-quanto for quantization
- Delete & Rename Chats in the sidebar
- Waitress-WSGI Queueing to enable multiple, simultaneous chat sessions in different browser tabs/windows
- Various UI & QoL Updates for enhanced look & feel and ease of use
Thank you so much for your hard work. I feel extremely close to getting this working. The llama model loads and the PDF processing is working great. But when I ask a question, it returns a "localhost:5000" error and says: "There was an error when setting up the streaming response in the method /setup_for_llama_cpp_response, more details can be viewed in the browser's console." The browser seems to show that it was getting "text/html" instead of JSON (see below). I believe it might be a Flask issue, but I don't understand Flask well enough to figure out the problem.
I am also including a screenshot from the server log, which says something about getting an unexpected keyword argument "embedding_fn"
I am on Fedora 39 Linux with a CUDA GPU. All dependencies were installed in a virtual environment with Python 3.11.9. I was using gemma-2-9b (and I chose the corresponding prompt template), but I also tried Mistral with the same result. Any suggestions on how to fix this would be much appreciated.
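A general note for others who land here with similar symptoms: when a Flask endpoint raises an unhandled exception, Flask responds with an HTML error page, which is why the browser can receive "text/html" where the front-end expects JSON. And a TypeError about an "unexpected keyword argument" usually means an installed library's API no longer matches what the calling code passes, which is exactly the kind of mismatch that pinned versions in requirements.txt guard against. A minimal illustration (the function below is hypothetical, not LARS's actual code):

```python
# Hypothetical stand-in for a library function whose newer version
# dropped a parameter that older calling code still passes.
def get_collection(name):
    return {"name": name}

try:
    # A caller written against the old signature still passes embedding_fn:
    get_collection("docs", embedding_fn=lambda text: [0.0])
except TypeError as e:
    print(e)  # -> ... got an unexpected keyword argument 'embedding_fn'
```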