SerCeMan / fontogen

Hey, Computer, Make Me a Font
https://serce.me/posts/02-10-2023-hey-computer-make-me-a-font
MIT License

Accidentally used Encoder instead of Decoder? #10

Closed · DerEchteFeuerpfeil closed this issue 6 months ago

DerEchteFeuerpfeil commented 6 months ago

Hey there, great project!

There's just one thing I'm not sure whether I misunderstood or you did. You wanted to reproduce the architecture of IconShop, but to my understanding, IconShop is a decoder-only transformer (like a language model). [Excerpt from their paper attached as an image.]

I first thought it was just a minor mistake in your blog post, but there, too, you wrote encoder-only. [Screenshot from the blog post attached as an image.]

An encoder-only autoregressive transformer is self-contradictory, isn't it?

Also, here you used xFormerEncoderConfig instead of the (imo) correct xFormerDecoderConfig: https://github.com/SerCeMan/fontogen/blob/de1e50b2b42ed17bbfe442949957dd001b5ba80e/model/model.py#L201-L227

I haven't thought through what the implications are. You correctly used causal attention, so there shouldn't be issues on that front. There might not even be an issue at all, since decoders (without cross-attention) are essentially just encoders with causal attention masking, but I don't know the xformers library and its inner workings. Just wanted to set the mental model straight.
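To illustrate what I mean, here's a minimal sketch in plain PyTorch (not your code or xformers): an "encoder" self-attention layer with a causal mask already has the property a decoder-only block needs, namely that earlier positions cannot see later ones.

```python
# Minimal sketch (plain PyTorch, not the fontogen code): an encoder-style
# self-attention layer becomes a decoder-only block once a causal mask is
# applied, because the mask is what stops position t from seeing positions > t.
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
layer.eval()  # disable dropout so the comparison below is exact

seq_len = 8
x = torch.randn(1, seq_len, 32)
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)

with torch.no_grad():
    out = layer(x, src_mask=causal_mask)
    # Perturb only the last ("future") token.
    x_perturbed = x.clone()
    x_perturbed[:, -1, :] += 1.0
    out_perturbed = layer(x_perturbed, src_mask=causal_mask)

# Outputs for positions 0..seq_len-2 are unchanged: no information leaks
# backwards from the future, which is exactly autoregressive behaviour.
assert torch.allclose(out[:, :-1], out_perturbed[:, :-1], atol=1e-6)
```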

Thanks for releasing your code and project!

SerCeMan commented 6 months ago

Hey, @DerEchteFeuerpfeil! The article should indeed say decoder-only. Thank you for pointing this out, I'll fix it up!

Implementation-wise, if I understand correctly, there is no other way to implement a decoder-only transformer in xFormers, at least until this lands https://github.com/facebookresearch/xformers/issues/617.
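For reference, the workaround is to build the stack out of xFormers factory encoder blocks with causal self-attention. Roughly something like the sketch below; the dimensions and layer counts are illustrative placeholders rather than the actual values in model.py:

```python
# Sketch of the xFormers factory pattern for a decoder-only stack built from
# encoder blocks with causal self-attention. Field values are placeholders,
# not the real fontogen hyperparameters.
from xformers.factory.model_factory import xFormer, xFormerConfig

block = {
    "block_type": "encoder",  # no cross-attention needed, so an encoder block suffices
    "num_layers": 12,
    "dim_model": 512,
    "residual_norm_style": "pre",
    "multi_head_config": {
        "num_heads": 8,
        "residual_dropout": 0.0,
        "attention": {
            "name": "scaled_dot_product",
            "dropout": 0.0,
            "causal": True,  # this flag is what makes the stack autoregressive
            "seq_len": 1024,
        },
    },
    "feedforward_config": {
        "name": "MLP",
        "dropout": 0.0,
        "activation": "gelu",
        "hidden_layer_multiplier": 4,
    },
}

model = xFormer.from_config(xFormerConfig([block]))
```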

DerEchteFeuerpfeil commented 6 months ago

Alright, makes sense! It doesn't speak well for the library that a decoder-only architecture issue has been open for over a year 😅

When re-implementing parts of this for my own project, I used x_transformers by lucidrains instead.
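In case it's useful to anyone reading along, the x_transformers route looks roughly like this (a sketch; the vocabulary size and dimensions are placeholders, not what I actually used):

```python
# Sketch of a decoder-only setup with lucidrains' x_transformers;
# all numbers are placeholders.
import torch
from x_transformers import TransformerWrapper, Decoder, AutoregressiveWrapper

model = TransformerWrapper(
    num_tokens=4096,       # placeholder vocabulary size
    max_seq_len=1024,
    attn_layers=Decoder(dim=512, depth=12, heads=8),  # decoder-only, causal by construction
)
model = AutoregressiveWrapper(model)

tokens = torch.randint(0, 4096, (1, 1024))
loss = model(tokens)       # next-token prediction loss
loss.backward()
```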

I'll close this issue for now, as it has served its purpose. Cheers!