-
OpenAI's GPT-3.5 and GPT-4o are included.
However, many open-source alternative LLMs exist too.
**TODO: decide which open-source alternatives to use.**
-
### System Info
- CPU architecture: x86_64
- GPU: 4× NVIDIA T4 (AWS g4dn.12xlarge)
- TensorRT-LLM v0.10.0
### Who can help?
[QiJune](https://github.com/QiJune)
@byshiue
### Information
…
-
### Description
When using the LLM OpenAI interface directly, everything works except chat with the GraphRAG collection, which is unavailable.
The error is:
/home/master/.local/lib/p…
-
`generator.get_next_token` always returns zero.
`generator.get_output("logits").squeeze` also returns an array of zeros.
This results in a blank output every time.

----------

Model used: phi3-mini-1…
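The two symptoms above are consistent with each other: under greedy decoding, the next token is the argmax of the logits, and the argmax of an all-zero vector is always index 0, which typically maps to a padding or special token and so decodes to nothing. A toy sketch in plain Python (illustrative only, not the generator API from the report):

```python
# Greedy decoding picks the index of the largest logit. Over an all-zero
# logits vector, every position ties, so argmax always returns index 0 --
# usually a pad/special token, hence the blank output in the report above.
def greedy_next_token(logits):
    # index of the maximum value; ties resolve to the first (lowest) index
    return max(range(len(logits)), key=logits.__getitem__)

zero_logits = [0.0] * 8                 # stand-in for the all-zero logits
tokens = [greedy_next_token(zero_logits) for _ in range(4)]
print(tokens)  # [0, 0, 0, 0] -- token id 0 at every step
```

So zeroed logits are the root cause to chase; the repeated token id 0 is just their downstream effect.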
-
### Describe the issue
I am running phi3-mini-int4 via the usual ONNX Runtime C# API and it is twice as slow as when I use the [genai code](https://github.com/microsoft/onnxruntime-genai). I am using…
-
These are some notes for the [JOSS paper review](https://github.com/openjournals/joss-reviews/issues/6971). In general, I think it's a nice paper and a potentially useful package.
Comments regarding …
-
I'm not sure if this is stated in the docs, but currently, running one agent with a specific model that has a tool which calls another agent (with a different model) will fail to do so and ends up in an…
-
The recent addition of optimizer CPU offload in torchao can be useful for single-GPU, low-memory configurations.
https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim#optimizer-cpu-offload…
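The idea behind the linked feature is to keep optimizer state in CPU memory and only move gradients and updated parameters across, freeing GPU memory. A minimal plain-Python sketch of that pattern follows; the class name and the toy SGD-with-momentum update are illustrative, not torchao's actual API (see the linked README for the real entry point):

```python
# Sketch of the optimizer-CPU-offload pattern: optimizer state (here, a
# momentum buffer) lives in a separate "CPU" store, gradients are shipped
# over, the update is computed there, and parameters are shipped back.
class CPUOffloadSGD:
    def __init__(self, params, lr=0.1, momentum=0.9):
        self.params = params                  # stand-in for GPU-resident parameters
        self.lr = lr
        self.momentum = momentum
        self.state = [0.0] * len(params)      # momentum buffers held "on CPU"

    def step(self, grads):
        # gradients arrive from the "GPU"; the update runs against CPU state
        for i, g in enumerate(grads):
            self.state[i] = self.momentum * self.state[i] + g
            self.params[i] -= self.lr * self.state[i]
        return self.params                    # updated params copied back

opt = CPUOffloadSGD([1.0, -2.0])
opt.step([0.5, 0.5])
print(opt.params)  # parameters nudged against the gradient direction
```

The trade-off is extra host–device transfer per step in exchange for not holding optimizer state on the GPU, which is what makes it attractive for low-memory single-GPU setups.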
-
### What is the issue?
When I start a conversation in German, Phi-3 Mini and Medium work fine. But after a few conversations, the models slowly start producing gibberish and nonsens a…
-
https://huggingface.co/microsoft/Phi-3-medium-128k-instruct
https://huggingface.co/microsoft/Phi-3-medium-4k-instruct
https://huggingface.co/microsoft/Phi-3-small-8k-instruct
https://huggingf…