99991 / SimpleTinyLlama

https://github.com/jzhang38/TinyLlama using only PyTorch
Apache License 2.0

An exemplary project #3

Open meadewaking opened 5 months ago

meadewaking commented 5 months ago

This project has been incredibly helpful to me. I'm curious about how it was developed. Could you enlighten me on the process you took, particularly how you started from the original project and then removed lit-gpt and lightning? Any insights you could provide would be greatly appreciated!

99991 commented 5 months ago

Thank you! It is always motivating to hear that the code is useful to other people.

I actually started from the Huggingface transformer implementation because I could not get the official TinyLlama implementation to work due to CUDA issues.

Unfortunately, the Huggingface library contains many layers of abstraction, which makes it difficult to follow. I stepped through the code line by line using Visual Studio Code's debugger and kept only the lines that actually computed something, rather than the ones that just called other functions. I then compared intermediate results by inspecting them in the debugger or printing them. This was an arduous process that took many hours.
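As a rough illustration (this is not the project's actual code, and the tensor names are made up), the comparison boiled down to something like this:

```python
import torch

# Hypothetical sketch: compare an intermediate tensor from the Huggingface
# model against the same tensor from the simplified reimplementation.
# `hf_hidden` and `my_hidden` stand for values captured in the debugger
# or printed at the corresponding line of each implementation.
def check_close(hf_hidden: torch.Tensor, my_hidden: torch.Tensor, name: str) -> None:
    # Report the largest absolute difference so small float deviations
    # are visible even when torch.allclose still passes.
    max_diff = (hf_hidden - my_hidden).abs().max().item()
    ok = torch.allclose(hf_hidden, my_hidden, atol=1e-5)
    print(f"{name}: max abs diff {max_diff:.2e} -> {'OK' if ok else 'MISMATCH'}")

# Example with dummy tensors standing in for real activations:
x = torch.randn(1, 8, 2048)
check_close(x, x + 1e-7, "layer0.attention_output")
```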

The sentencepiece tokenizer was less work but more complicated. I read a paper, implemented it, and then found out that it was actually the wrong method. Fortunately, the author of the sentencepiece tokenizer was very helpful and pointed me in the right direction. I eventually figured out that the decoding process was very similar to how Huffman trees are built, which I had done before, so that part was not too difficult. Decoding the tokenizer's binary format and figuring out how the weird underscore symbols worked involved a lot of guesswork, and I am still not sure that I got it right (in fact, a kind contributor fixed a bug in it just recently), but at least the tests passed until TinyLlama changed the chat prompt format again. That reminds me, I should update the tests to the new version.
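To give a feel for the Huffman-like structure (a minimal sketch, not the project's actual tokenizer; the vocabulary and scores below are made up), BPE-style encoding repeatedly merges the adjacent pair with the best score, much like a Huffman tree is built by repeatedly combining the two cheapest nodes:

```python
def bpe_encode(text: str, scores: dict[str, float]) -> list[str]:
    # SentencePiece marks word boundaries with '▁' instead of spaces.
    pieces = list("▁" + text.replace(" ", "▁"))
    while True:
        # Find the adjacent pair whose merged piece has the best score.
        best, best_score = None, float("-inf")
        for i in range(len(pieces) - 1):
            merged = pieces[i] + pieces[i + 1]
            if merged in scores and scores[merged] > best_score:
                best, best_score = i, scores[merged]
        if best is None:
            return pieces  # no more merges possible
        pieces[best:best + 2] = [pieces[best] + pieces[best + 1]]

# Toy vocabulary: higher score = merge earlier.
scores = {"▁h": -1.0, "▁he": -2.0, "ll": -1.5, "llo": -3.0, "▁hello": -4.0}
print(bpe_encode("hello", scores))  # ['▁hello']
```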

meadewaking commented 5 months ago

I have been trying to decompose the LLaMA model myself, but found it challenging due to its many layers of abstraction. I have been following the development of TinyLlama for quite some time and have made several attempts, but unfortunately none of them were successful.

While browsing GitHub, I came across your project. Your work, spirit, and willingness to help are truly inspiring and greatly admired.