-
### 🚀 The feature, motivation and pitch
First surfaced in https://github.com/pytorch/torchchat/pull/1057, the `replace_attention_with_custom_sdpa_attention` function, used when exporting models in …
-
### Motivation
This is **not** an important feature, but I figured I'd mention it because it was a small point of friction that I think could be improved in the future. Currently my script does this:…
-
- [ ] Llama2-7b
- [ ] Llama2-13b
- [ ] Llama2-70b
-
Hi,
I was wondering whether you plan to publicly release the sparsified Llama2 models. In particular, I am interested in Llama2-70B with 50% unstructured sparsity.
Thanks!
egeor updated
11 months ago
-
The UMR home page should be inspired by
- https://paperswithcode.com/
- https://mlcommons.org/about-us/
Content outline (to be refined, @sschafft)
- [x] github link somewhere
- [x] Title: Unified M…
-
### System Info
CPU: x86_64
GPU: RTX 4080 16G
OS: fedora39
Deployment:
- based on the tensorrt_llm/devel:latest container; deployed via Kubernetes (with the CRI-O runtime).
- time-slice deployed by gpu-op…
-
- LangChain v0.3.2
- LangChain AWS v0.2.2
We are using a fine-tuned version of Llama 3.1-instruct, uploaded to Bedrock. Since we are using an ARN model ID (which does not contain any information a…
-
Specifically this data:
`Model-Attribution-in-Machine-Generated-Disinformation/data/filtered_llm/gpt-3.5-turbo/coaid/synthetic-gpt-3.5-turbo_coaid_paraphrase_generation_filtered.csv`
The features …
-
Hi Authors, we noticed that all of the attack code is missing chat templates for the models: things like `USER: {instruction} ASSISTANT:` for Vicuna or `[INST] {} [/INST]` for Llama2, which makes the benchm…
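
A minimal sketch of what adding the missing templates could look like, based only on the two formats quoted above; the helper name and the exact per-model templates are assumptions, not the repository's actual API:

```python
# Hypothetical helper wrapping a raw instruction in the chat template
# a model was trained on. The templates below are the two quoted in
# this issue; real models may expect additional system-prompt markup.
def apply_chat_template(instruction: str, model_family: str) -> str:
    if model_family == "vicuna":
        # Vicuna-style single-turn template.
        return f"USER: {instruction} ASSISTANT:"
    if model_family == "llama2":
        # Llama-2-chat expects [INST] ... [/INST] delimiters.
        return f"[INST] {instruction} [/INST]"
    raise ValueError(f"unknown model family: {model_family}")

print(apply_chat_template("Write a short poem.", "vicuna"))
# USER: Write a short poem. ASSISTANT:
```

Without this wrapping, the benchmark feeds instruction-tuned models raw text in a format they never saw during fine-tuning, which can depress their scores.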
-
Hello! How many GPUs are needed?