-
How can MSTDPLearner be applied to MNIST training, for example with the fully connected (FC) SNN in https://github.com/fangwei123456/spikingjelly/blob/master/spikingjelly/activation_based/examples/lif_fc_mnist.py?
-
make all the params searchable
make sure we're applying all the right regularizations
definitely do dropout
add some batchnorm
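
A rough sketch of what "searchable" could look like for the items above, in plain Python. Every parameter name and value range here is an assumption made for illustration, not something taken from the actual model code; the real search space should come from whatever the training script exposes.

```python
import random

# Hypothetical search space: all names and ranges are assumptions.
SEARCH_SPACE = {
    "hidden_units": [64, 128, 256],
    "dropout_rate": [0.1, 0.3, 0.5],      # "definitely do dropout"
    "use_batchnorm": [True, False],       # "add some batchnorm"
    "weight_decay": [0.0, 1e-5, 1e-4],    # one possible regularization knob
    "learning_rate": [1e-4, 1e-3, 1e-2],
}


def sample_config(space, rng):
    """Draw one value per parameter, uniformly at random (random search)."""
    return {name: rng.choice(choices) for name, choices in space.items()}


def grid_size(space):
    """Count the distinct configurations in the full grid."""
    size = 1
    for choices in space.values():
        size *= len(choices)
    return size


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print(sample_config(SEARCH_SPACE, rng))
    print(grid_size(SEARCH_SPACE))
```

Keeping the space in one dict means a new regularization option only needs one extra entry, and the same dict can feed either random sampling or a full grid.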
-
In your research, it is hard to train a model on long sequences, such as length 768, on a GPU.
However, I can't find any special technique for reducing GPU memory usage in your code.
I want to know about your technique for tr…
-
Hi @KaihuaTang, thank you for such inspiring work and for open-sourcing the code. Would you mind sharing some details about the Cosine and Capsule methods in Table 2 of the paper?
1. I got …
-
I see that LayerNorm gradients and similar need to be synced under sequence parallelism, but I think DeepSpeed bypasses this logic since it doesn't use the `MegatronOptimizer` class.
How does it work with sequence parallel…
-
Our getting started guide is missing information about what `cmd_args` needs to contain when calling `deepspeed.initialize`.
https://github.com/microsoft/DeepSpeed#writing-deepspeed-models
N…
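
For illustration, a minimal sketch of the kind of argument object in question, built with the standard-library `argparse` only. It assumes, as far as I understand the README, that `cmd_args` is expected to carry `local_rank` and `deepspeed_config` fields, and that the `--deepspeed` / `--deepspeed_config` flags mirror what `deepspeed.add_config_arguments(parser)` adds. The script description and the `ds_config.json` file name are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with the fields cmd_args is assumed to need."""
    parser = argparse.ArgumentParser(description="hypothetical training script")
    # Assumption: the launcher passes the local GPU index as --local_rank.
    parser.add_argument("--local_rank", type=int, default=-1)
    # Assumption: these mirror the flags added by deepspeed.add_config_arguments.
    parser.add_argument("--deepspeed", action="store_true")
    parser.add_argument("--deepspeed_config", type=str, default=None)
    return parser


if __name__ == "__main__":
    # Explicit argv so the sketch runs without a launcher.
    cmd_args = build_parser().parse_args(
        ["--deepspeed", "--deepspeed_config", "ds_config.json", "--local_rank", "0"]
    )
    print(cmd_args)
    # cmd_args would then be handed to DeepSpeed, along the lines of the README:
    # model_engine, optimizer, _, _ = deepspeed.initialize(
    #     args=cmd_args, model=model, model_parameters=params)
```

The DeepSpeed call is left as a comment so the sketch has no dependency beyond the standard library; the guide itself should state which of these fields are actually required.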
-
# reset underlying graph data
tf.reset_default_graph()
# Build neural network
net = tflearn.input_data(shape=[None, len(train_x[0])])
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_con…
-
https://arxiv.org/abs/1906.01170
-
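Has the author or anyone else here trained with a larger GPT-2 model? Please share your experience!
I trained with the following config.json, but the test results are rather poor. Any advice would be appreciated: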
{
"_name_or_path": "model/epoch41",
"activation_function": "gelu_new",
"architectures": [
"GPT2LMHeadModel"
…
-
I ran the Quickstart Titanic tutorial successfully and then experimented further.
I am predicting a float target from 8 float inputs, so I modified parts of the tutorial, and then got
'ValueError: Cannot feed value…