akanyaani / gpt-2-tensorflow2.0

OpenAI GPT-2 pre-training and sequence prediction implementation in TensorFlow 2.0
https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
MIT License

sg.sample_sequence returns context after pre-trained model #12

Open bytes-commerce opened 4 years ago

bytes-commerce commented 4 years ago

First of all, thanks for providing this amazing repository and making GPT-2 possible in TF2! Secondly, I was following the Readme to pre-train my model and then used sequence_generator.py to pass some context to the model.

However, the response is always 1:1 identical to the context, except that the capital letters are replaced with ??s. What am I doing wrong? Have I forgotten something? Is there an edge case leading to this that could be prevented?

Please let me know any additional information you might need! Thanks a lot!
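
In case it helps with debugging: if the ??s are SentencePiece's rendering of unknown tokens, the capitals may simply be missing from the vocab. Here is a minimal round-trip check, assuming the vocab was built with SentencePiece during pre-processing (the model path below is a placeholder):

```python
import sentencepiece as spm

# Placeholder path: point this at the BPE model produced during pre-processing.
sp = spm.SentencePieceProcessor()
sp.load("bpe_model.model")

context = "Hello World"
ids = sp.encode_as_ids(context)
print(ids)                 # compare against sp.unk_id() to spot out-of-vocab pieces
print(sp.decode_ids(ids))  # SentencePiece decodes unknown pieces as " ⁇ "
print("unk id:", sp.unk_id())
```

If the capitalized words come back as ⁇ here, the problem is in the tokenizer/training corpus, not in the sampling code.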

jzl0166 commented 4 years ago

same problem

jspangl3r commented 4 years ago

also getting weird output like this.

vedranbajic commented 3 years ago

First of all, thank you for sharing your code! It helped me a lot in getting started with GPT-2. I don't know if this is relevant, but I just debugged sample.py.

The output only appends zeros:

```
tf.Tensor([[    3 13727  5825     0     0     0     0     0 ...]], shape=(1, 515), dtype=int32)
```

If my sequence length is 512, I get 512 zeros (plus the 3 non-zero ids above from my context). My output is just the words I provided as context, because the rest is 0.

edit 1: logits is always NaN in my case, resulting in 0.
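
To pin down where the NaN first appears, tf.debugging.check_numerics can be wrapped around the suspect tensors; it raises an error at the first NaN/Inf. A minimal sketch (where exactly to place it inside sample.py is an assumption):

```python
import tensorflow as tf

def assert_finite(tensor, name):
    # Raises tf.errors.InvalidArgumentError as soon as tensor contains
    # a NaN or Inf, pointing at the step that produced it.
    return tf.debugging.check_numerics(tensor, message=name)

# e.g. inside the sampling loop, right after the forward pass:
# logits = assert_finite(logits, "logits")
```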

edit 2: self.embedding_weights is NaN. Maybe something is wrong with the initializer?
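
To check whether the weights are already NaN right after restoring the checkpoint, a quick scan over the variables helps (a sketch; model stands for the restored GPT-2 instance, and find_nan_variables is a hypothetical helper):

```python
import tensorflow as tf

def find_nan_variables(model):
    # Print every weight tensor that contains a NaN; a hit on the
    # embedding table would confirm the initializer/checkpoint suspicion.
    for v in model.variables:
        if bool(tf.reduce_any(tf.math.is_nan(v))):
            print("NaN in", v.name, "shape:", v.shape)
```

If the embedding table is NaN before any training step has run, the initializer is at fault; if it only turns NaN during training, the learning rate or loss is the more likely culprit.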