Closed: adnan1306 closed this issue 6 months ago.
Dear authors,

I really liked your work and was trying to understand the implementation. I have a quick question: why do we convert the image into a byte string using `.tobytes()` (in https://github.com/google-deepmind/language_modeling_is_compression/blob/a93a5d1679055c7f1101fa1b2db5bbe20d6169bb/data_loaders.py#L112) and then convert it back to integers before encoding (in https://github.com/google-deepmind/language_modeling_is_compression/blob/a93a5d1679055c7f1101fa1b2db5bbe20d6169bb/compressors/language_model.py#L86)?
We convert the bytes to integers because that is what we pass to the transformer model. Each integer is mapped to an embedding, which is then passed through the attention + MLP blocks. This is standard decoder-only transformer training. If I have misunderstood your question, could you be more specific?
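Roughly, the round trip looks like the following sketch (illustrative only, not the exact code from the repo; the array shape and embedding size here are made up):

```python
import numpy as np

# Toy stand-in for a grayscale image patch (byte values 0..255).
image = np.random.randint(0, 256, size=(32, 64), dtype=np.uint8)

# data_loaders.py serializes the patch to a raw byte string with .tobytes().
byte_string = image.tobytes()

# Before encoding, each byte is turned back into an integer in [0, 255];
# these integers are the tokens fed to the transformer.
token_ids = np.frombuffer(byte_string, dtype=np.uint8).astype(np.int32)

# Each integer indexes a row of the embedding table; the resulting vectors
# are what the attention + MLP blocks of the decoder-only model consume.
vocab_size, embed_dim = 256, 64  # illustrative sizes, not the paper's config
embedding_table = np.random.randn(vocab_size, embed_dim).astype(np.float32)
embeddings = embedding_table[token_ids]  # shape: (num_bytes, embed_dim)
```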
Yeah, makes sense! Thanks for the prompt reply!