EuphoriaYan opened this issue 3 years ago
Well, I found that during training, `-logcmk(kappa)` is always ~ -420 and never changes, while `torch.log(1 + kappa) * (self.lambda_vmf - (output_emb_unitnorm * target_emb_unitnorm).sum(dim=-1))` decreases from ~ 0.5. Is that abnormal?
I also tried passing `-approximate_vmf` in the args and found that `logcmkappox(kappa, emb_size)` is always ~ -690 and never changes.
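For reference, here is a minimal NumPy/SciPy sketch of how these two terms combine into the regularized vMF loss (a reconstruction for illustration only, not the repo's code; `lambda_vmf=0.2` and the 300-dimensional random vectors are placeholders). The `-log C_m(kappa)` term depends only on the norm of the predicted vector, while the `log(1 + kappa) * (lambda - cos)` term carries the cosine signal:

```python
import numpy as np
from scipy.special import ive  # exponentially scaled Bessel: ive(v, k) = I_v(k) * exp(-k)

def log_cmk(kappa, m):
    """log C_m(kappa), the log-normalizer of an m-dimensional vMF distribution."""
    v = m / 2.0 - 1.0
    # log I_v(kappa) = log(ive(v, kappa)) + kappa  (numerically stable for large kappa)
    log_bessel = np.log(ive(v, kappa)) + kappa
    return v * np.log(kappa) - (m / 2.0) * np.log(2 * np.pi) - log_bessel

def nllvmf_terms(pred_emb, target_emb, lambda_vmf=0.2):
    """Return the two loss terms separately, mirroring the expressions quoted above."""
    m = pred_emb.shape[-1]
    kappa = np.linalg.norm(pred_emb)                 # concentration = norm of the prediction
    cos = pred_emb @ target_emb / (kappa * np.linalg.norm(target_emb))
    term1 = -log_cmk(kappa, m)                       # the large, nearly constant term
    term2 = np.log(1 + kappa) * (lambda_vmf - cos)   # the term that should keep decreasing
    return term1, term2

rng = np.random.default_rng(0)
pred = rng.normal(size=300) * 20   # made-up 300-dim prediction with kappa around 350
tgt = rng.normal(size=300)
print(nllvmf_terms(pred, tgt))
```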
Hi EuphoriaYan,
Apologies for such a long delay in my reply.
> As you can see, the acc is decreasing and the perplexity is always zero.
Sorry, the statistics are not named correctly; they are named after softmax-based models. "acc" here actually means cosine distance, and "x-ent" means the vMF loss. Perplexity is computed on top of the reported vMF loss and comes out as 0 because the vMF loss values are highly negative, so it is essentially meaningless here. The only two numbers worth monitoring are "acc" and "x-ent", and judging by the trend they look fine, since both should be decreasing. Also, if you could let me know your final validation loss on this training set, I can judge whether the model trained well or not. With good token embeddings, a cosine ("acc") value below roughly 0.25 usually results in decent MT performance (for English).
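To make that concrete, assuming perplexity is reported as exp(average loss per token) the way softmax-based toolkits do, plugging in a large negative vMF loss collapses it to effectively zero:

```python
import math

# Assuming perplexity = exp(average loss per token), as in standard softmax-based
# toolkits: a large negative vMF "loss" makes the exponent vanish.
vmf_loss_per_token = -420.0            # roughly the value reported in the logs above
ppl = math.exp(vmf_loss_per_token)     # ~4e-183
print(f"{ppl:.2f}")                    # prints 0.00 at the usual two-decimal log precision
```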
> `./fasttext skipgram -input valid.en.bpetok -output emb/en -dim 300 -thread 8`
You should train the embeddings on the larger training set, not the validation set; this method needs good-quality embeddings to work. If you switch the input to `train.en.bpetok`, you should be able to get better results. The English token embeddings (without BPE) that I used are provided here.
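One quick way to sanity-check embedding quality (a suggestion on my part, not part of this repo) is to load the fastText binary with gensim and look at nearest neighbours of a few common tokens:

```python
# Load the .bin produced by the fasttext command above (path assumed from its -output flag)
# and check that nearest neighbours look semantically sensible.
from gensim.models.fasttext import load_facebook_vectors

vectors = load_facebook_vectors("emb/en.bin")
print(vectors.most_similar("house", topn=5))
```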
> `/path/to/moses/scripts/tokenizer/tokenizer.perl -l zh -a -no-escape -threads 20 < train.zh > train.tok.zh`
I'm not 100% sure Moses supports Chinese tokenization; this could be an issue.
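If that turns out to be the problem, one common alternative (again a suggestion, not something this repo prescribes) is to pre-segment the Chinese side with a dedicated segmenter such as jieba before applying BPE. A minimal sketch, with `train.seg.zh` as a placeholder output name:

```python
# Segment the Chinese training file with jieba; Moses' tokenizer.perl is rule-based
# for space-delimited languages and may leave Chinese text largely unsegmented.
import jieba

with open("train.zh", encoding="utf-8") as fin, \
        open("train.seg.zh", "w", encoding="utf-8") as fout:
    for line in fin:
        fout.write(" ".join(jieba.cut(line.strip())) + "\n")
```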
Hope these suggestions resolve your issues :)
Sachin
Hi,
I'm trying to use your "langvar" branch to translate from Chinese to English, but I'm getting strange statistics and results.
Statistics:
As you can see, the acc is decreasing and the perplexity is always zero.
When I use the trained model to translate, it translates every Chinese token into "the".
Below is my own training process:
1. Use the Moses scripts for tokenization and truecasing.
2. Apply BPE with fastBPE.
3. Train fastText with the hyperparameters mentioned in #11.
4. Run preprocess.py to binarize the data; the only difference is that I use `src_vocab` and `tgt_vocab`.
5. Run train.py with the same hyperparameters as in the README.
I want to know if there are any mistakes in my training process; your response will be appreciated!
Thank you!