UCSB-NLP-Chang / DiffSTE

MIT License
88 stars 7 forks

Arabic fonts #8

Open Serjio42 opened 1 year ago

Serjio42 commented 1 year ago

Hi. Is it possible to handle Arabic (right-to-left) text for style transfer? In particular, I have several hundred similar images with Arabic text; is it possible to generate images in the same style with your repository? Thanks.

Question406 commented 1 year ago

Hi, our model cannot generate Arabic letters. Nonetheless, you could try putting the Arabic text in as surrounding text to see if the model can learn Arabic text styles. You could also try training a model on Arabic text images using our code.

Serjio42 commented 1 year ago

As I understand it, you use https://github.com/clovaai/synthtiger as part of your pipeline. Unfortunately it can't synthesize properly joined Arabic text, as you can see in their demo image: https://user-images.githubusercontent.com/12423224/167302532-dbd5fa60-bcba-4f77-92ee-58bb6efda51c.png PIL can render Arabic text on an image well, but that is not what you use. Consequently, images generated with your model will have the same problem (letters not connected to each other), am I right?
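For reference, the disconnected-letters problem comes from rendering raw Arabic code points without a shaping step. A small standard-library sketch of what a shaper has to do (the character names come from the Unicode database; nothing here is from the DiffSTE code):

```python
import unicodedata

# Raw Arabic strings store base code points (U+0600 block). A text shaper
# must replace each letter with a contextual presentation form
# (initial/medial/final) before rasterizing; rendering the base code
# points directly is what produces disconnected letters.
word = "\u0645\u0631\u062D\u0628\u0627"  # "مرحبا"
base_names = [unicodedata.name(ch) for ch in word]

# Example: MEEM as stored in the string vs. the joined form a shaper
# would pick at the start of a word (Arabic Presentation Forms-B block).
stored = unicodedata.name("\u0645")   # base letter
shaped = unicodedata.name("\uFEE3")   # contextual form the shaper substitutes
print(stored)
print(shaped)
```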

Question406 commented 1 year ago

I see. Sorry, I'm not very familiar with Arabic text rendering, but since synthtiger cannot render Arabic text correctly, as you said, our pipeline may not work. Nevertheless, I think you could prepare your Arabic text data with PIL and train a diffusion model using our framework to see if it works :)
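A minimal sketch of preparing such data with PIL, building (ground-truth image, masked image) pairs. This assumes Pillow is compiled with libraqm (so `draw.text` shapes Arabic correctly) and that an Arabic-capable font such as Noto Naskh Arabic is available at the hypothetical path below; neither the path nor the layout is from the DiffSTE repository:

```python
import os
from PIL import Image, ImageDraw, ImageFont

FONT_PATH = "NotoNaskhArabic-Regular.ttf"  # hypothetical font path

def make_pair(text, size=(256, 64), box=(16, 8, 240, 56)):
    """Render `text` into a ground-truth image, then mask the text region."""
    gt = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(gt)
    if os.path.exists(FONT_PATH):
        font = ImageFont.truetype(FONT_PATH, 32)
        # With a Raqm-enabled Pillow, shaping is automatic; direction="rtl"
        # forces right-to-left layout for Arabic.
        draw.text((box[0], box[1]), text, font=font, fill="black",
                  direction="rtl")
    masked = gt.copy()
    ImageDraw.Draw(masked).rectangle(box, fill="gray")  # hide the text region
    return gt, masked

gt, masked = make_pair("مرحبا")
```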

Serjio42 commented 1 year ago

@Question406 Thanks for the response! I've just read your paper and am thinking about what needs to be done to run the training pipeline with Arabic/Urdu data. Suppose the Arabic dataset is ready (each sample being a GT image, a masked image, and a text instruction containing non-Latin text), what else needs to be done to launch training? The questions that come to mind are:

  1. As I see in the framework schema (Fig. 3 in the paper), you use text embeddings from the CLIP tokenizer and the character tokenizer. Is it possible to use these two with non-Latin symbols? As a substitute for the default CLIP tokenizer I found this repository (not sure whether it is applicable). Do you think it is possible to switch from Latin to multilingual tokenization? And what about the character tokenizer? I did not find much about it in the paper; could it be made Arabic/Urdu-friendly?
  2. I have two Nvidia RTX 3090s (24 GB memory each) - is that enough for training?

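On the character-tokenizer question above, a minimal sketch of what an Arabic-friendly character tokenizer could look like, assuming it is simply a vocabulary over individual characters (the class name, special tokens, and Unicode ranges here are illustrative, not taken from the DiffSTE code):

```python
# Character alphabets: printable ASCII plus the Arabic Unicode block.
LATIN = [chr(c) for c in range(0x20, 0x7F)]
ARABIC = [chr(c) for c in range(0x0600, 0x0700)]

class CharTokenizer:
    """Maps each character to an integer id; ids 0/1 are pad/unknown."""

    def __init__(self, alphabet):
        self.pad, self.unk = 0, 1
        self.stoi = {ch: i + 2 for i, ch in enumerate(alphabet)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    def encode(self, text):
        return [self.stoi.get(ch, self.unk) for ch in text]

    def decode(self, ids):
        return "".join(self.itos.get(i, "?") for i in ids)

tok = CharTokenizer(LATIN + ARABIC)
ids = tok.encode("مرحبا")
roundtrip = tok.decode(ids)
```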
Question406 commented 1 year ago

@Serjio42, Hi!

I think the original CLIP tokenizer doesn't recognize non-Latin text. This multilingual version of CLIP is definitely useful. However, aligning the embedding space of this multilingual CLIP with the one Stable Diffusion uses might require some training. I believe this paper could be helpful for that stage of training.
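To make the alignment step concrete, here is a toy numpy sketch of fitting a linear projection from a multilingual encoder's embedding space into the space Stable Diffusion's CLIP encoder uses, given paired embeddings of the same prompts from both models. All dimensions and data are synthetic stand-ins; real alignment would likely need fine-tuning along the lines of the paper mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)
d_ml, d_sd, n = 512, 768, 1000       # multilingual dim, SD-CLIP dim, prompt pairs

# Stand-ins for the two encoders' outputs on the same n prompts.
X = rng.normal(size=(n, d_ml))       # multilingual embeddings
W_true = rng.normal(size=(d_ml, d_sd))
Y = X @ W_true                       # matching SD-CLIP embeddings

# Least-squares fit of a projection W such that X @ W ~= Y.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
rel_err = np.linalg.norm(X @ W - Y) / np.linalg.norm(Y)
```

With more prompt pairs than input dimensions, the least-squares fit recovers the mapping essentially exactly on this synthetic data; real embeddings are not linearly related, which is why additional training is suggested.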

Regarding the character tokenizer, I'm not familiar with Arabic/Urdu, so I can't provide any insights on that.

You could start with small-scale training to test its effectiveness, but I'm uncertain whether achieving satisfactory results would require more computational resources than that.