mlp-architecture Search Results

1000+ results
for mlp-architecture

huggingface/accelerate #3153

loading big models into memory

### System Info ```Shell colab t4 https://huggingface.co/docs/accelerate/concept_guides/ https://huggingface.co/docs/accelerate/concept_guides/big_model_inference If I have a single 16 G…

werruww updated 18 hours ago
21
open-mmlab/mmrotate #1004

Error when training the STD model on the HRSCD dataset: HRSCDataset: [Errno 2] No such file or di…

### Prerequisite - [X] I have searched [Issues](https://github.com/open-mmlab/mmrotate/issues) and [Discussions](https://github.com/open-mmlab/mmrotate/discussions) but cannot get the expected help. …

Joey-He updated 1 week ago
4
openai/baselines #1119

Where to find the default neural network architecture for ml…

wangyixu14 updated 4 years ago
1
databricks/megablocks #107

1-expert worse than dense model

I'm finding that training a 1-expert dMoE (brown) has worse training loss than an otherwise equivalent dense model (green). Is there some reason why this difference is expected or can I expect them to…

Muennighoff updated 1 month ago
1
DAMO-NLP-SG/VideoLLaMA2 #114

Inference code does not work for videos

The inference code provided does not work ``` import sys sys.path.append('./') from videollama2 import model_init, mm_infer from videollama2.utils import disable_torch_init def inference():…

marvlyngkhoi updated 1 week ago
3
Mark12Ding/FAME #4

Reproduction of paper results (pretraining on UCF101)

Hello! I'm trying to reproduce the results of your paper as a baseline for my thesis. However, I'm not able to reach the same results for pretraining on UCF101 as indicated in tables 1 & 3 (81.2% t…

mertesdorfj updated 2 years ago
3
meta-llama/llama-models #172

lm_head weight of Llama3.2_3B_instruct model

Hello, I find that there's no lm_head weight in the model checkpoints (.safetensors). How does the model load the weight for the lm_head linear layer?

Watebear updated 1 month ago
1
AniZpZ/AutoSmoothQuant #24

Inference error after quantizing llama2-7b-chat

Loading checkpoint shards: 100%|██████████| 2/2 [00:42

AlexMa0 updated 4 months ago
5
xinntao/Real-ESRGAN #13

Improvement Idea.

I think it would be very useful to add more discriminators, from the tests I have done with conditional GANs, it seems that having several discriminators with different levels of reception fields incr…

QLaHPD updated 2 years ago
5
LLaVA-VL/LLaVA-NeXT #115

LLaVA-NeXT demo code froze while running

Trying to deploy and run demo on a 4 A6000 cluster but it seemed that the runtime froze without any exceptions... Could there be any possible problems? Sorry for asking a naive question and thanks for…

FrankFcc updated 1 month ago
4
