clab / dynet

DyNet: The Dynamic Neural Network Toolkit
Apache License 2.0

"Competitive Example" (2): Penn Treebank Language Modeling #1286

Open neubig opened 6 years ago

neubig commented 6 years ago

Part of: https://github.com/clab/dynet/issues/1284

We already have a language model here:

https://github.com/clab/dynet/tree/master/examples/rnnlm

But it is not competitive with existing numbers for various reasons, largely because the training paradigm is different. We should restructure it to match, for example, the PyTorch example:

https://github.com/pytorch/examples/tree/master/word_language_model
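For reference, a minimal sketch of the data layout the PyTorch example uses (its `batchify` step): the whole corpus is treated as a single token stream and reshaped into `bsz` parallel columns, so the hidden state can be carried from one batch to the next. This is an illustrative numpy reconstruction of the idea, not the example's actual code; `ids` and `bsz` are assumed names.

```python
import numpy as np

def batchify(ids, bsz):
    """Trim the token-id stream to a multiple of bsz and lay it out
    as (nsteps, bsz): column j holds one contiguous slice of the
    corpus, so state carried down a column stays coherent."""
    nsteps = len(ids) // bsz
    return np.array(ids[:nsteps * bsz]).reshape(bsz, nsteps).T
```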

duyvuleo commented 6 years ago

Hi Graham,

What makes the huge difference between the perplexities (the current DyNet implementation gets ~100 vs. the SOTA of 53-60)? Looking at the code, I can see differences in batching and hidden-state reinitialisation, but I am still confused about why the DyNet implementation lags so far behind.

I just tried to implement a transformer LM (https://github.com/duyvuleo/Transformer-DyNet/blob/master/src/transformer-lm.cc) with quite a similar training paradigm, but the best perplexity I got is around 99, still far from the SOTA. With the same paradigm, my transformer for translation gets BLEU close to SOTA. This confuses me!

Can you give some hints on what we could do to improve the current implementation?

Thanks!

-- Cheers, Vu


neubig commented 6 years ago

The biggest reason is probably because existing work trains models that pass information across sentence boundaries, while the DyNet implementation trains models where each sentence is independent. There might be other things as well. Part of the goal of this issue is to try to reduce the number of differences.
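For concreteness, here is a minimal sketch of what passing information across sentence boundaries could look like with the DyNet Python API: treat the corpus as one continuous stream, train on fixed-length BPTT segments, and re-inject the saved hidden state at each segment boundary. The hyperparameters and the random stand-in `stream` are illustrative assumptions, not settings from the examples.

```python
import dynet as dy
import numpy as np

VOCAB, EMB, HID, BPTT = 10000, 256, 512, 35  # illustrative sizes

pc = dy.ParameterCollection()
lstm = dy.LSTMBuilder(1, EMB, HID, pc)
E = pc.add_lookup_parameters((VOCAB, EMB))
pW = pc.add_parameters((VOCAB, HID))
pb = pc.add_parameters((VOCAB,))
trainer = dy.SimpleSGDTrainer(pc)

stream = list(np.random.randint(VOCAB, size=5000))  # stand-in for PTB ids
prev = None  # numpy snapshot of the LSTM state from the last segment

for start in range(0, len(stream) - 1, BPTT):
    dy.renew_cg()
    W, b = dy.parameter(pW), dy.parameter(pb)
    # Re-inject the saved state: information flows across the segment
    # boundary, but the gradient is truncated here (truncated BPTT).
    init = None if prev is None else [dy.inputTensor(v) for v in prev]
    s = lstm.initial_state(init)
    seg = stream[start:start + BPTT + 1]
    losses = []
    for x, y in zip(seg, seg[1:]):
        s = s.add_input(E[x])
        losses.append(dy.pickneglogsoftmax(W * s.output() + b, y))
    loss = dy.esum(losses)
    loss.value()
    loss.backward()
    trainer.update()
    prev = [v.npvalue() for v in s.s()]  # detached copy for next segment
```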

pmichel31415 commented 6 years ago

IIRC I wrote code for an RNNLM on PTB with truncated BPTT, if that's something you'd be interested in, but I couldn't get it close to SOTA.

duyvuleo commented 6 years ago

Hi Paul,

What perplexity did you get with your RNNLM?

-- Cheers, Vu


pmichel31415 commented 6 years ago

Hi @duyvuleo, I think I got about 95 test perplexity without a lot of tuning.

Here's the code I used, mostly adapted from the dynet examples or benchmark: http://www.cs.cmu.edu/~pmichel1/cont-lm-dynet.zip

duyvuleo commented 6 years ago

Thanks a lot, Paul!

I will look into it.

-- Cheers, Vu
