-
In my opinion, the generation should be the same when the draft model and the target model are the same and the temperature is 0.
But in this case, the output logits of the draft model and the target model have a bit d…
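For reference, at temperature 0 the speculative-decoding acceptance rule degenerates to an exact argmax match, so any tiny logit difference between the two models can flip a token and cause an early rejection. A minimal sketch (the `greedy_accept` helper is hypothetical, for illustration only):

```python
import numpy as np

def greedy_accept(draft_tokens, target_logits):
    """Temperature-0 acceptance: keep each draft token only if it equals the
    target model's argmax at that position; stop at the first mismatch."""
    accepted = []
    for t, d_tok in enumerate(draft_tokens):
        if int(np.argmax(target_logits[t])) == d_tok:
            accepted.append(d_tok)
        else:
            break  # first mismatch: the target resamples here and we stop
    return accepted

# With bit-identical logits every draft token is accepted...
logits = np.array([[0.1, 2.0, 0.3],
                   [1.5, 0.2, 0.1]])
draft = [int(np.argmax(row)) for row in logits]   # [1, 0]
print(greedy_accept(draft, logits))               # -> [1, 0]

# ...but a small numeric perturbation (e.g. kernel nondeterminism) that
# flips one argmax causes an early rejection of the remaining tokens:
perturbed = logits.copy()
perturbed[1, 1] += 1.4                            # argmax of row 1 is now 1
print(greedy_accept(draft, perturbed))            # -> [1]
```

This is why even "identical" draft and target models can diverge in practice when their logits differ slightly.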
-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a…
-
### What is the issue?
Background:
Kubernetes 1.31 introduced a new feature: [Read-Only Volumes Based on OCI Artifacts](https://kubernetes.io/blog/2024/08/16/kubernetes-1-31-image-volume-source/).…
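For context, the feature exposes an `image` volume source that mounts an OCI artifact read-only into the pod. A minimal sketch based on the linked blog post (alpha API, requires the `ImageVolume` feature gate; the artifact reference below is the example from the post, not a real requirement):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: image-volume-demo
spec:
  containers:
  - name: shell
    image: debian
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: artifact
      mountPath: /artifact      # contents of the OCI artifact, read-only
  volumes:
  - name: artifact
    image:                      # alpha `image` volume source (Kubernetes 1.31)
      reference: quay.io/crio/artifact:v1
      pullPolicy: IfNotPresent
```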
-
Hello,
First I'll say, I'm really impressed by this library and looking forward to TTS!
I ran the example project on my Android Pixel 7 (same one you used) and I am not seeing the same performance t…
-
### System Info
```shell
optimum-neuron 0.0.20
neuronx-cc 2.*
python 3.10
```
### Who can help?
_No response_
### Information
- [ ] The official example scripts
- [X] My own modified scripts
…
-
-
https://huggingface.co/apple/OpenELM
Has models ranging from 270M to 3B parameters. Would love to see more support for small models, since I'm stuck with 4gb VRAM currently. Tinyllama can't fill ev…
-
**Description**
When a user performs a long-running inference request via HTTPServer, they may lose the connection or intentionally abort it (Ctrl-C from curl).
Ideally, the HTTP server will…
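The desired behavior can be sketched as racing the inference task against a disconnect signal and cancelling the work as soon as the client goes away. A minimal `asyncio` sketch (the `serve_request`/`run_inference` names are hypothetical, not part of the server's API):

```python
import asyncio

async def run_inference(state):
    try:
        await asyncio.sleep(10)            # stands in for a long generation loop
        return "result"
    except asyncio.CancelledError:
        state["cancelled"] = True          # release model resources here
        raise

async def serve_request(disconnect: asyncio.Event, state):
    """Race inference against a client-disconnect event; cancel on disconnect."""
    task = asyncio.create_task(run_inference(state))
    waiter = asyncio.create_task(disconnect.wait())
    done, _ = await asyncio.wait({task, waiter},
                                 return_when=asyncio.FIRST_COMPLETED)
    if waiter in done and not task.done():
        task.cancel()                      # stop wasting compute on a dead client
        try:
            await task
        except asyncio.CancelledError:
            pass
        return None
    waiter.cancel()
    return task.result()

async def main():
    disconnect = asyncio.Event()
    state = {"cancelled": False}
    # simulate the client aborting (Ctrl-C from curl) shortly after starting
    asyncio.get_running_loop().call_later(0.05, disconnect.set)
    result = await serve_request(disconnect, state)
    print(result, state["cancelled"])      # -> None True

asyncio.run(main())
```

In a real server the disconnect event would come from the HTTP layer noticing the closed socket (e.g. a zero-length read), but the cancellation pattern is the same.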
-
With limited memory on most phones, there are community requests to support a smaller model like Phi-3 mini. It may be supported out of the box, but needs verification, evaluation and pr…
-
I would like to request 1 or 2 examples of how to adapt this for popular open models, such as:
https://huggingface.co/mistralai/Mistral-7B-v0.1
https://huggingface.co/meta-llama/Llama-2-7b-hf
h…