ai-forever / Kandinsky-3

https://ai-forever.github.io/Kandinsky-3/
Apache License 2.0
313 stars · 29 forks

Encoder-Decoder Question #16

Closed Bigfield77 closed 8 months ago

Bigfield77 commented 8 months ago

Hello,

Cool work on Kandinsky 3!

If I understand correctly, you are only using the encoder part of flan-ul2?

Do you think it's possible to prune the decoder out and create a smaller encoder-only model in this case, since the decoder part must account for a large share of the weights (does it..?). I guess the decoder part of flan-ul2 is never used during inference (or training..?)

Would that make the model smaller?

cheers, François
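For reference, the size question can be sanity-checked without downloading anything, by instantiating a full T5-style model and its encoder-only counterpart from the same config and counting parameters. This is a sketch with toy dimensions (not flan-ul2's actual config), so the ratio is only illustrative:

```python
# Sketch: how much of a T5-style encoder-decoder is the decoder?
# Toy, randomly initialized config -- NOT flan-ul2's real dimensions.
from transformers import T5Config, T5EncoderModel, T5Model

config = T5Config(
    vocab_size=1000, d_model=64, d_kv=16, d_ff=128,
    num_layers=2, num_heads=4,
)
full = T5Model(config)             # encoder + decoder
enc_only = T5EncoderModel(config)  # encoder only

n_full = sum(p.numel() for p in full.parameters())
n_enc = sum(p.numel() for p in enc_only.parameters())
print(f"full: {n_full}, encoder-only: {n_enc}")
```

The decoder also carries cross-attention blocks, so the encoder-only model ends up well under half the size of the full encoder-decoder.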

anvilarth commented 8 months ago

Hello!

Nice to hear your words about Kandinsky 3! Yes, we only use the encoder part of flan-ul2 (for both inference and training), because decoder models are worse than encoder models at text processing: for example, if you change the order of words, a decoder model's output can change completely, while encoder models are more stable. Yes, it is possible to create a smaller encoder model, but training it from scratch is a very hard problem, so we only use ready-made models. By the way, the idea is great, because we want to make the model more accessible, but the model size is too big for now.

Best, Andrei

Bigfield77 commented 8 months ago

Thanks for your answer!

I checked the diffusers code and they seem to filter out the decoder weights when loading `T5EncoderModel`: https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py

```python
class T5EncoderModel(T5PreTrainedModel):
    _tied_weights_keys = ["encoder.embed_tokens.weight"]
    _keys_to_ignore_on_load_unexpected = [r"decoder"]
```

So I guess using a pruned model wouldn't affect memory usage at all for diffusers.

Not sure if you are doing something similar in the native path, but it might help.
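That filtering behavior can be demonstrated offline: save a full encoder-decoder checkpoint, then load it with `T5EncoderModel` and check that no decoder parameters survive. Again this uses a tiny random config as a stand-in for flan-ul2:

```python
# Sketch: T5EncoderModel skips decoder.* weights when loading a full
# encoder-decoder checkpoint (via _keys_to_ignore_on_load_unexpected).
# Toy config, not flan-ul2's real dimensions.
import tempfile
from transformers import T5Config, T5EncoderModel, T5Model

config = T5Config(vocab_size=1000, d_model=64, d_kv=16, d_ff=128,
                  num_layers=2, num_heads=4)
full = T5Model(config)  # checkpoint will contain encoder AND decoder weights

with tempfile.TemporaryDirectory() as tmp:
    full.save_pretrained(tmp)
    # decoder.* keys in the checkpoint are ignored, not loaded
    enc = T5EncoderModel.from_pretrained(tmp)

assert not any(n.startswith("decoder") for n, _ in enc.named_parameters())
```

So the decoder weights still sit on disk in the full checkpoint, but they never reach the instantiated model.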

I saw a model on huggingface that seems to be encoder-only: https://huggingface.co/pszemraj/flan-ul2-text-encoder/tree/main

I will give this a try to see if it works the same :)

Bigfield77 commented 8 months ago

btw, I tried the diffusers pipeline with both google/flan-ul2 and pszemraj/flan-ul2-text-encoder and got exactly the same results!
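Identical results are expected, since the encoder weights are the same tensors either way. A small offline check (toy config standing in for flan-ul2) shows the encoder stack of the full model and the encoder-only model produce bit-identical hidden states:

```python
# Sketch: encoder outputs match between a full T5 model and a
# T5EncoderModel loaded from the same checkpoint. Toy config only.
import tempfile
import torch
from transformers import T5Config, T5EncoderModel, T5Model

torch.manual_seed(0)
config = T5Config(vocab_size=1000, d_model=64, d_kv=16, d_ff=128,
                  num_layers=2, num_heads=4)
full = T5Model(config).eval()

with tempfile.TemporaryDirectory() as tmp:
    full.save_pretrained(tmp)
    enc_only = T5EncoderModel.from_pretrained(tmp).eval()

input_ids = torch.randint(0, 1000, (1, 8))
with torch.no_grad():
    a = full.encoder(input_ids=input_ids).last_hidden_state
    b = enc_only(input_ids=input_ids).last_hidden_state
print(torch.allclose(a, b))  # True: same weights, same outputs
```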