gradient-activation Search Results

1000+ results for gradient-activation

fangwei123456/spikingjelly #320

Question about STDP learning

How can MSTDPLearner be applied to MNIST training, for example in an FC SNN such as https://github.com/fangwei123456/spikingjelly/blob/master/spikingjelly/activation_based/examples/lif_fc_mnist.py?

aheiluxi updated 1 year ago
1
ClimbsRocks/auto_ml #253

deep learning optimizations to make

Make all the params searchable; make sure we're applying all the right regularizations; definitely do dropout; add some batchnorm.

ClimbsRocks updated 7 years ago
3
facebookresearch/long_seq_mae #3

[Question] GPU Memory Related

In your research, it is hard to train a model with long sequences, such as 768, on a GPU. However, I can't find any special technique for reducing GPU memory in your code. I want to know about your technique for tr…

chagmgang updated 1 year ago
1
KaihuaTang/Long-Tailed-Recognition.pytorch #26

Question about methods Cosine and Capsule in the paper

Hi @KaihuaTang, thank you for doing such inspiring work and open-sourcing the code. Would you mind sharing some details of the Cosine and Capsule methods in Table 2 of the paper? 1. I got …

yypurpose updated 3 years ago
3
microsoft/Megatron-DeepSpeed #170

Does DeepSpeed work with sequence parallel?

I see that layernorm grads etc need to be synced in sequence parallel but I think DeepSpeed bypasses this logic since it doesn't use `MegatronOptimizer` class. How does it work with sequence parallel…

mayank31398 updated 7 months ago
6
microsoft/DeepSpeed #137

Getting started guide is missing critical text about add_con…

We are missing information in our getting started guide about what `cmd_args` needs to have when executing deepspeed.initialize. https://github.com/microsoft/DeepSpeed#writing-deepspeed-models N…

jeffra updated 1 year ago
6
ugik/notebooks #3

How to increase n_epoch level

# reset underlying graph data tf.reset_default_graph() # Build neural network net = tflearn.input_data(shape=[None, len(train_x[0])]) net = tflearn.fully_connected(net, 8) net = tflearn.fully_con…

sarancruzer updated 6 years ago
1
SciML/NeuralPDE.jl #355

Adaptive activation function

https://arxiv.org/abs/1906.01170

KirillZubov updated 2 years ago
6
yangjianxin1/GPT2-chitchat #94

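Training with a larger GPT-2 model, e.g. gpt2_large

Has the author or anyone else trained with a larger GPT-2 model? Please share your experience! I trained with the following config.json, but the test results are rather poor. Any advice would be appreciated: { "_name_or_path": "model/epoch41", "activation_function": "gelu_new", "architectures": [ "GPT2LMHeadModel" …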

htthYjh updated 1 year ago
1
tflearn/tflearn #246

Architecture Err when extending Titanic tutorial

I completed the Quickstart Titanic tutorial successfully and ran some further tests. I am predicting a float target from 8 float inputs, and after I modified parts of the tutorial I got 'ValueError: Cannot feed value…

forhonourlx updated 7 years ago
5
