-
**Describe the bug**
The new Llama 2 70B uses grouped-query attention (GQA), which causes an issue with `inject_fused_attention`.
When a user attempts to run inference on a Llama 2 70B model with inject_fused_attention=Tr…
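Until fused attention injection supports GQA, the usual workaround is to disable it when loading the model. A minimal sketch, assuming the AutoGPTQ `from_quantized` API and a hypothetical quantized model path:

```python
from auto_gptq import AutoGPTQForCausalLM

# Llama 2 70B uses grouped-query attention (GQA), which the fused
# attention kernel does not yet handle, so skip the injection.
# "TheBloke/Llama-2-70B-GPTQ" is an illustrative model id.
model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Llama-2-70B-GPTQ",
    inject_fused_attention=False,  # avoid the GQA incompatibility
    device="cuda:0",
)
```

The trade-off is slightly slower attention, but inference runs instead of erroring out.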
-
I'm running on an Intel Arc 750 with 32 GB of RAM, and there is more than enough disk space. What could be the problem?
```
sudo docker run -d \
--device /dev/dri \
-v /opt/ai/models/huggingface:/root…
-
# Trending repositories for C#
1. [**Jackett / Jackett**](https://github.com/Jackett/Jackett)
__API Support for your favorite torrent trackers__
13 stars today | 11,225 s…
-
I used the latest `tensorrtllm_backend` and `TensorRT-LLM` from the main branch to build the Docker images.
`https://github.com/triton-inference-server/tensorrtllm_backend/tree/main#option-3-build-via-docker`
…
-
## Currently:
I noticed that data import currently accepts only `json` files, as seen in `Import.tsx`.
## My Thoughts For Scaling:
To scale this feature, we could go further and su…
-
We encourage you to join the [MLX Community](https://huggingface.co/mlx-community) on Hugging Face 🤗 and upload new MLX converted models and versions of existing models.
awni updated 7 months ago
-
A couple of issues with the new tensor parallelism implementation!
1) Tensor parallelism doesn't appear to respect the absence of flash attention, even via the `-nfa` flag. It also doesn't document flash att…
-
### Summary of the issue
First of all, thanks for the awesome effort put into this code-evaluation package; I highly appreciate it. However, right now, what I see is that it is integrated with just Hugg…
-
Hello, and thank you for the work you are doing.
Does llama-adapter-v2 support llama2, or does it only work with llama?
I am able to pretrain with the llama2 weights, but the inference results do no…
-
### Your `minimal.lua` config
Same as `minimal.lua`, but with:
``` lua
strategies = { -- Change the adapters as required
chat = { adapter = "ollama" },
inline = { adapter = "ollama" },
…