-
This might be unorthodox, but I had to ask.
I've been trying to run the SFT script on a Colab T4, and on Kaggle with double T4 and P100, and it [instantly ran out of memory](https://colab.research.google.com/…
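In case it helps triage, here is a minimal sketch of one common workaround, assuming the OOM happens while loading the full-precision weights: quantize the base model to 4-bit with bitsandbytes before fine-tuning. The model id below is a placeholder, not necessarily the one the script uses.
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical 4-bit load to fit a 7B model on a 16 GB T4; the model id
# below is a placeholder, not necessarily the one the sft script uses.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # T4 has no bfloat16 support
)
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    quantization_config=bnb_config,
    device_map="auto",
)
```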
-
Hey, I found this repository via your Hugging Face model `dfurman/mpt-7b-instruct-openorca`. This looks very useful!
I am currently mostly working with 2xA100 40GB.
Are there any plans to enhance th…
-
### The Feature
Repro this scenario: if the client code (e.g., an ollama client in litellm) finishes too early, it fails to fetch the rest of the response because the client does not stay connected long enough.
…
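For illustration, a minimal sketch of a client that stays alive until the stream is drained, assuming litellm's OpenAI-style streaming interface (the model name and prompt are placeholders):
```python
import litellm

# Hedged sketch: iterate the stream to completion so the client stays
# connected until the last chunk arrives. Model and prompt are placeholders.
response = litellm.completion(
    model="ollama/llama2",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in response:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
```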
-
There is a bug that causes a crash when LocalAI sends a chunk with streaming turned on.
```
Cannonball results 347 -> 148 tokens.
Cannonball results 196 -> 147 tokens.
Cannonball results 309 -> 148 to…
-
I deployed the Docker image from the link:
```
docker pull mintplexlabs/anythingllm:master
```
and ran it with the following command on my MacBook Pro (Intel CPU):
```
docker run -d -p 3001:3001 mintplexlabs/anythi…
```
-
The evaluation is really cool. However, the open-source models on the leaderboard are no longer up to date.
Open-source models based on Llama 2 surpass their earlier generations by a significant ma…
-
### Bug Description
I'm getting the error `'Llama' object has no attribute 'context_params'` here:
`service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")`
Here is the merged pull re…
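For context, a minimal sketch of the fix that usually resolves this, assuming the `llm` was a raw llama-cpp-python `Llama` object rather than llama-index's `LlamaCPP` wrapper (the model path is a placeholder):
```python
from llama_index import ServiceContext
from llama_index.llms import LlamaCPP

# Hedged sketch: pass llama-index's LlamaCPP wrapper instead of a raw
# llama_cpp.Llama instance; the model path below is a placeholder.
llm = LlamaCPP(model_path="/path/to/model.gguf")
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")
```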
-
I followed the instructions to install AutoAWQ.
Here is my code:
```python
from transformers import AutoTokenizer
from awq import AutoAWQForCausalLM

# Load Model and Tokenizer
def load_model_tokeniz…
```
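For comparison, a hedged sketch of a typical AutoAWQ load using the library's documented `from_quantized` entry point (the model id is a placeholder; substitute the checkpoint you actually use):
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Placeholder quantized checkpoint; substitute the model you actually use.
model_id = "TheBloke/Mistral-7B-OpenOrca-AWQ"
model = AutoAWQForCausalLM.from_quantized(model_id, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
```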
-
### Describe the bug
The new superboogav2 extension crashes when generating data
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Reproduction
- Lo…
-
When using FastChat to run the LongChat model on a Mac M2, I was able to successfully generate output, but Python's memory usage ballooned by about 1 GB for every 5 tokens output until I ran out of ra…
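To help quantify the leak, a hedged sketch of how one might log resident memory per generated token with psutil (hypothetical instrumentation, not FastChat's own code):
```python
import os
import psutil

process = psutil.Process(os.getpid())

def log_rss(token_index: int) -> None:
    # Hypothetical helper: print resident set size after each generated token.
    rss_gb = process.memory_info().rss / 1024**3
    print(f"token {token_index}: rss={rss_gb:.2f} GiB")
```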