-
Part of: #8349
The goal is to optimise the MLP matmuls (FF1 and FF2) for sequence lengths of 128, 1024, and 2048.
The maximum we can hit is calculated with the following formula, which is the same for FF1 and FF2: (M * N * K / (…
-
### Proposal
Support training, loading, and inference of MLP transcoders.
### Motivation
MLP transcoders were trained by Jacob Dunefsky and Philippe Chlenski and have been shown to be usef…
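Here is a minimal numpy sketch of the pieces that "training, loading, and inference" would have to cover: a transcoder's forward pass and the objective training would minimize. It assumes the common sparse-autoencoder-style setup, with a ReLU encoder, a linear decoder, and MSE against the real MLP's output plus an L1 sparsity penalty. The class name `Transcoder`, the initialization, and `l1_coeff` are hypothetical, and the exact architecture and loss used by Dunefsky and Chlenski may differ.

```python
import numpy as np


class Transcoder:
    """Hypothetical minimal MLP transcoder: maps an MLP's *input* to an
    approximation of that MLP's *output* through a wide, sparse ReLU layer."""

    def __init__(self, d_model: int, d_latent: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_model)
        self.W_enc = rng.normal(0.0, scale, size=(d_model, d_latent))
        self.b_enc = np.zeros(d_latent)
        self.W_dec = rng.normal(0.0, scale, size=(d_latent, d_model))
        self.b_dec = np.zeros(d_model)

    def encode(self, mlp_in):
        # Sparse feature activations: ReLU zeroes out inactive features.
        return np.maximum(mlp_in @ self.W_enc + self.b_enc, 0.0)

    def decode(self, z):
        # Linear read-out that should approximate the MLP's output.
        return z @ self.W_dec + self.b_dec

    def loss(self, mlp_in, mlp_out, l1_coeff=1e-3):
        # Reconstruction error against the real MLP output plus an L1 sparsity penalty.
        z = self.encode(mlp_in)
        pred = self.decode(z)
        mse = np.mean(np.sum((pred - mlp_out) ** 2, axis=-1))
        sparsity = np.mean(np.sum(np.abs(z), axis=-1))
        return mse + l1_coeff * sparsity


# Usage with a stand-in "MLP" (tanh) in place of real model activations.
tc = Transcoder(d_model=16, d_latent=64)
x = np.random.default_rng(1).normal(size=(8, 16))
print(tc.loss(x, np.tanh(x)))
```

The key difference from a plain SAE is that the target is the MLP's output rather than its input, so at inference time the transcoder can stand in for the MLP itself.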
-
Loading a quantized model: is this behavior normal?
model = AutoModelForCausalLM.from_pretrained("/mnt/d/code_wsl/Qwen2-7B-Instruct-GPTQ-Int8",
torch_dtype="auto",
…
-
### System Info
2024-06-26T08:59:14.473641Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, …
-
**Description:** Part of the R3/MLP effort involves sharing information with the individuals who took part in the stakeholder interviews conducted in April 2024. Agreed with Anna on 06/17 that…
-
I cannot _sheeprl-eval_ my trained model, since the keys in the world model's state_dict have different names:
Stacktrace
Error executing job with overrides: ['checkpoint_path=/home/drt/Deskto…
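The stacktrace is cut off, so the actual key mismatch isn't visible here. One common cause of renamed keys in PyTorch checkpoints is a wrapper prefix: `torch.compile` prefixes state_dict keys with `_orig_mod.`, and `DistributedDataParallel` prefixes them with `module.`. Assuming that is what's happening, a hypothetical helper `remap_state_dict_keys` (not part of sheeprl) could strip those prefixes before loading. This is a sketch under that assumption, not a confirmed fix.

```python
from collections import OrderedDict


def remap_state_dict_keys(state_dict, prefixes=("_orig_mod.", "module.")):
    """Hypothetical helper: strip leading wrapper prefixes from checkpoint keys.

    Assumes the mismatch comes from torch.compile ("_orig_mod.") or DDP
    ("module.") wrappers; adjust `prefixes` to match the real key names.
    """
    remapped = OrderedDict()
    for key, value in state_dict.items():
        new_key = key
        # Keys can carry several stacked prefixes, e.g. "module._orig_mod.encoder.weight",
        # so keep stripping until no known prefix remains at the start.
        stripped = True
        while stripped:
            stripped = False
            for prefix in prefixes:
                if new_key.startswith(prefix):
                    new_key = new_key[len(prefix):]
                    stripped = True
        remapped[new_key] = value
    return remapped
```

Usage would look like `world_model.load_state_dict(remap_state_dict_keys(ckpt["world_model"]))`, where `world_model` and the `"world_model"` checkpoint key are placeholders for whatever the evaluation script actually loads. Printing the first few keys of the checkpoint and of `world_model.state_dict()` side by side is the quickest way to confirm which prefix (if any) differs.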
-
I would like to fine-tune CodeLlama-13b in a memory-efficient way.
I was able to do it with CodeLlama-7b, but it fails with 13b.
I can't load the model `unsloth/codellama-13b-bnb-4bit`:
```pyth…
-
Hi,
In the first draft of your paper on arXiv, you mentioned that you use an MLP-Mixer to mix the channels, but I don't see any code that uses MLP-Mixer. Can you please clarify? If you removed the M…
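For reference when searching the code, this is roughly what a channel-mixing MLP looks like in the original MLP-Mixer design: the same two-layer MLP is applied independently to every token, mixing information across the channel dimension, with a pre-norm and a residual connection. This numpy sketch is an illustration of the standard design only; the function names are hypothetical, the layer norm omits the learnable scale and shift, and the repository may implement channel mixing differently (for example as a 1x1 convolution or a plain `Linear` layer that wouldn't be named "Mixer").

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # Normalize each token over its channels (no learnable scale/shift in this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def channel_mixing(x, W1, b1, W2, b2):
    """x has shape (num_tokens, channels); the same MLP is applied to every token."""
    h = gelu(layer_norm(x) @ W1 + b1)  # (num_tokens, hidden)
    return x + h @ W2 + b2             # residual, back to (num_tokens, channels)
```

Because each token is processed independently, permuting the tokens just permutes the output rows; a block that does *not* have this property is mixing across tokens rather than channels.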
-
Hi, I built a simple ONNX model with TensorRT 8.6 and got an error:
mha_fusion.cpp:344: DCHECK(fc1_ && fc2_ && softmax_) failed.
Could not find any implementation for node {ForeignNode[onnx::MatMul…
-
Hello,
I was testing the MQTT event feature of the submodel repository. MQTT events for changed "Property" elements work just fine. However, changing values of "MultiLanguageProperty" elements (or …