I'm trying to fine-tune an SD model on some images. I want to add two keywords (tokens?), pmauritania and cafrica, so that the model associates all images with both, using mauritania and africa as the starting vectors when learning these words. I want to be able to prompt my model with these keywords and have it produce results associated with the initializer words as well as with the images I'm giving it. I added both keywords at the beginning of the image captions, which I wrote to .txt files in the same folder, with the same names as the corresponding jpgs. I get:
PTI : Placeholder Tokens ['<pmauritania>']
PTI : Initializer Tokens ['<mauritania>']
...
raise ValueError("The initializer token must be a single token.")
ValueError: The initializer token must be a single token.
So I tried using only one token:
--placeholder_tokens="pmauritania" ^
--initializer_tokens="mauritania" ^
Same thing. I also tried
--placeholder_tokens="" ^
--initializer_tokens="mauritania" ^
and
--placeholder_tokens="" ^
--initializer_tokens="" ^
Same thing. Am I understanding this usage correctly? The documentation is lacking on how to caption the images, how exactly to use these args, etc.
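For context, here is a minimal sketch of the kind of check that appears to produce this error, assuming the script encodes the initializer word with the model's tokenizer and rejects anything that does not come back as exactly one token id. The names `check_single_token`, `toy_encode`, and `TOY_VOCAB` are hypothetical, and the toy vocabulary is made up for illustration; it is not the real tokenizer vocabulary, so whether mauritania is actually a single token there is not something this shows.

```python
from typing import Callable, List

# Hypothetical toy vocabulary standing in for a real tokenizer's vocabulary.
# In this made-up vocabulary "africa" is one piece, "mauritania" is two.
TOY_VOCAB = {"africa": 0, "maur": 1, "itania": 2}


def toy_encode(word: str) -> List[int]:
    """Greedy longest-match split of `word` into pieces from TOY_VOCAB."""
    ids: List[int] = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shorter ones.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                i = j
                break
        else:
            raise KeyError(f"no vocabulary piece matches {word[i:]!r}")
    return ids


def check_single_token(encode: Callable[[str], List[int]], word: str) -> int:
    """Return the token id of `word` if it encodes to exactly one token.

    Assumption: this mirrors the check behind the error message in the log;
    the real script may differ in details.
    """
    token_ids = encode(word)
    if len(token_ids) != 1:
        raise ValueError("The initializer token must be a single token.")
    return token_ids[0]
```

Under this assumption, an empty string (zero tokens) or a word that splits into several pieces would both fail the check, which would be consistent with the attempts above all ending in the same error. Running the real tokenizer on each initializer word and looking at how many ids come back would confirm or rule this out.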