-
System Info
- CPU architecture (x86_64)
- CPU/Host memory size (64GB)
- GPU properties
  - GPU name (NVIDIA RTX 4090)
  - GPU memory size (24GB)
- Libraries
  - TensorRT-LLM branch or tag (v0.13.0)
  - Versions of Tenso…
-
When using model_worker with transformers to run the Gemma 2 9B model, it does not work correctly: the conversation template applied to the Gemma 2 model continues to generate a response until model_worker is kil…
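Runaway generation like this is the classic symptom of a conversation template that never registers Gemma 2's `<end_of_turn>` token as a stop condition. Below is a minimal plain-transformers sketch, not model_worker itself, showing the chat template plus an explicit stop token; the model id and generation settings are assumptions, not taken from the report:

```python
# Sketch: apply Gemma 2's chat template and stop at <end_of_turn>.
# If a serving layer omits this stop token, generation runs until killed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"  # assumed; the report only says "Gemma 2 9B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Stop at either the generic EOS or Gemma 2's turn terminator.
stop_ids = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<end_of_turn>")]
output = model.generate(inputs, max_new_tokens=256, eos_token_id=stop_ids)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```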
-
I have a Gemma 2 9B model. I quantized it with AWQ 4-bit; the model size is 5.9GB. I set kv_cache_free_gpu_mem_fraction to 0.01 and ran Triton on one A100, but Triton takes 10748MiB of RAM. I expe…
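For context on where that knob lives: in the Triton TensorRT-LLM backend, `kv_cache_free_gpu_mem_fraction` is set as a parameter in the tensorrt_llm model's config.pbtxt and bounds only the KV-cache pool; engine weights and runtime buffers are allocated regardless, which is one plausible reason total usage sits far above what the fraction alone would suggest. An illustrative excerpt, with values not taken from this report's repository:

```
# tensorrt_llm/config.pbtxt (excerpt, illustrative)
# The fraction caps only the KV-cache pool; weights and runtime
# buffers are allocated on top of it.
parameters: {
  key: "kv_cache_free_gpu_mem_fraction"
  value: { string_value: "0.01" }
}
```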
-
initial prompt
create a Python Flask application with a UI; the page should have the option to add what I ate today, store it in a SQLite database as the backend, and retrieve it when I ask for it. I also need to see my eat…
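A minimal sketch of what that prompt describes, assuming Flask plus the standard-library `sqlite3` module; every name here (the `meals.db` file, the table schema, the routes) is illustrative rather than taken from the thread:

```python
# Sketch: Flask UI backed by SQLite for logging and viewing meals.
import sqlite3
from flask import Flask, g, redirect, render_template_string, request

app = Flask(__name__)
DB = "meals.db"  # illustrative file name

PAGE = """
<form method="post">
  <input name="food" placeholder="What did you eat?">
  <button>Add</button>
</form>
<ul>{% for eaten_at, food in meals %}<li>{{ eaten_at }}: {{ food }}</li>{% endfor %}</ul>
"""

def get_db():
    # One connection per request, created lazily.
    if "db" not in g:
        g.db = sqlite3.connect(DB)
        g.db.execute(
            "CREATE TABLE IF NOT EXISTS meals ("
            "eaten_at TEXT DEFAULT CURRENT_TIMESTAMP, food TEXT)"
        )
    return g.db

@app.teardown_appcontext
def close_db(exc):
    db = g.pop("db", None)
    if db is not None:
        db.close()

@app.route("/", methods=["GET", "POST"])
def index():
    db = get_db()
    if request.method == "POST":
        db.execute("INSERT INTO meals (food) VALUES (?)", (request.form["food"],))
        db.commit()
        return redirect("/")
    meals = db.execute("SELECT eaten_at, food FROM meals ORDER BY eaten_at DESC").fetchall()
    return render_template_string(PAGE, meals=meals)

if __name__ == "__main__":
    app.run(debug=True)
```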
-
OpenAI GPT-3.5 and GPT-4o are included.
However, many open-source alternative LLMs exist as well.
**TODO: decide which open-source alternatives to use.**
-
### Description of the bug:
Traceback (most recent call last):
File "/home/Google/ai-edge-torch/ai_edge_torch/generative/examples/gemma/convert_gemma2_to_tflite.py", line 68, in
app.run(mai…
-
### Description of the bug:
I downloaded the `microsoft/Phi-3.5-mini-instruct` model from Hugging Face and ran the [convert_phi3_to_tflite.py](https://github.com/google-ai-edge/ai-edge-torch/blob/main/ai_…
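For reference, this is the generic shape of an ai-edge-torch conversion; the repository's convert_*_to_tflite.py scripts wrap model-specific re-authored definitions around the same flow. torchvision's resnet18 stands in here only to keep the sketch self-contained:

```python
# Sketch of ai-edge-torch's generic convert/export flow.
import torch
import torchvision
import ai_edge_torch

model = torchvision.models.resnet18(weights=None).eval()  # stand-in model
sample_inputs = (torch.randn(1, 3, 224, 224),)

edge_model = ai_edge_torch.convert(model, sample_inputs)  # trace and lower to TFLite
edge_model.export("resnet18.tflite")                      # write the .tflite flatbuffer
```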
-
would be super cute honestly!! it already has a tune-in counter and a websocket server running. would be happy to implement this if i knew it was gonna be merged :3
-
/kind bug
**Describe the solution you'd like**
Current huggingfaceserver requirements [set in pyproject.toml](https://github.com/kserve/kserve/blob/master/python/huggingfaceserver/pyproject.toml#L…
-
### What is the issue?
Hi, I'm studying fine-tuning.
I fine-tuned using the "unsloth/gemma-2-2b-it" model.
I created the dataset myself; it contains fewer than 100 cases.
I want to use only the fin…
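For context, a minimal sketch of the usual Unsloth LoRA recipe for that model, pairing `FastLanguageModel` with TRL's `SFTTrainer`; the dataset file, its "text" field, and the hyperparameters are assumptions, and with fewer than 100 examples one would typically lean on several epochs and a modest learning rate:

```python
# Sketch: LoRA fine-tune of unsloth/gemma-2-2b-it with Unsloth + TRL.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-2-2b-it",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Assumed: a JSONL file whose rows carry a preformatted "text" field.
dataset = load_dataset("json", data_files="my_dataset.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        num_train_epochs=3,       # small dataset: multiple passes
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```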