mlp-architecture Search Results

1000+ results
for mlp-architecture


NVIDIA/TensorRT-LLM #2072

`-1` token id with Mixtral FP8 and tensorrt_llm 0.11.0

- CPU architecture: x86_64 - GPU: NVIDIA H100 - Libraries - TensorRT-LLM: v0.11.0 - TensorRT: 10.1.0 - Modelopt: 0.13.1 - CUDA: 12.3 - NVIDIA driver version: 535.129.03 Hello, I'm e…

v-dicicco updated 1 week ago
6
NVIDIA/TensorRT-LLM #1906

Ada `FP8xint4` performance issue

Since Ada GPUs such as the 4090 limit FP8 arithmetic to `fp32` accumulation, they only achieve the same maximum `TFLOPs` as `fp16xfp16` with `fp16` accumulation. Furthermore, according to my test,…

jcao-ai updated 1 month ago
6
arcee-ai/mergekit #117

Try to add Qwen-moe into mixtral_moe.py

Hi, I am trying to add Qwen-moe to mixtral_moe.py and have made some modifications, but I have now run into some problems. ![1](https://github.com/cg123/mergekit/assets/53638291/000d5134-0fe0-4ba5-…

ZhangEnmao updated 8 months ago
4
zbh2047/L_inf-dist-net-v2 #3

Up and Down Variables

Dear Authors, Thank you for your work. May I ask why the model needs up and down variables?

gwmdunda updated 8 months ago
6
TransformerLensOrg/TransformerLens #691

[Proposal] Add Llama 3.1 support

### Proposal Add Llama 3.1 support. Currently trying to load it fails with: `ValueError: meta-llama/Meta-Llama-3.1-8B-Instruct not found. Valid official model names (excl aliases): ` ### Mot…

ssuukk updated 1 month ago
7
greenelab/deep-review #944

AmpliconNet: Sequence Based Multi-layer Perceptron for Ampli…

> Taxonomic assignment is the core of targeted metagenomics approaches that aims to assign sequencing reads to their corresponding taxonomy. Sequence similarity searching and machine learning (ML) are…

ali-kishk updated 5 years ago
4
epogrebnyak/mlmw #6

Reorganize beginner section

Updates from: - https://github.com/jacobhilton/deep_learning_curriculum (focus on transformers) - Raschka book 1. Math prerequisites Taking a derivative to find a point of minimum or maxim…

epogrebnyak updated 3 months ago
3
ArnauMiro/pyLowOrder #50

devel_MLP

Add a module for MLP neural network for pressure interpolation at different angles of attack

bef-18 updated 1 day ago
3
ConnorJL/WGAN-Tensorflow #1

Implementing weight clipping

In TensorFlow I just do this for weight clipping: t_vars = tf.trainable_variables() critic_vars = [var for var in t_vars if 'crit' in var.name] self.clip_critic = [] for var in critic_vars: …

PatrykChrabaszcz updated 7 years ago
7
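
The truncated snippet in the WGAN-Tensorflow entry above selects variables whose name contains 'crit' and then builds a clip operation for each one. A minimal sketch of that idea in plain Python follows. The function name `clip_critic_weights`, the dict-of-lists weight layout, and the bound of 0.01 are all assumptions made for illustration; none of them come from the issue itself.

```python
def clip_critic_weights(named_weights, bound=0.01, name_filter="crit"):
    """Clamp weights of variables whose name contains `name_filter` to [-bound, bound].

    `named_weights` is a hypothetical stand-in for a framework's variable
    collection: a dict mapping variable names to flat lists of floats.
    """
    clipped = {}
    for name, values in named_weights.items():
        if name_filter in name:
            # Critic variable: clamp each weight into the allowed interval.
            clipped[name] = [max(-bound, min(bound, v)) for v in values]
        else:
            # Non-critic variable: copy unchanged.
            clipped[name] = list(values)
    return clipped


weights = {
    "critic/dense/kernel": [0.5, -0.2, 0.003],
    "generator/dense/kernel": [0.5, -0.2, 0.003],
}
result = clip_critic_weights(weights)
```

In a real training loop the clamp would typically run after each critic update, using the framework's own clipping operation instead of Python lists.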
ucbrise/actnn #26

There is something wrong with loss.backward()

I just modified the model with model = actnn.QModule(model). After that, the following error occurred: Traceback (most recent call last): File "train.py", line 336, in main() F…

Harr7y updated 2 years ago
2
