deepglugs / dalle


question about vocab #3

skywo1f closed this issue 3 years ago

skywo1f commented 3 years ago

What should I put in my `curated_512.vocab` file?

deepglugs commented 3 years ago

It depends on your dataset. A good practice is to use the most common tags/labels/words from your dataset. For danbooru-style tags, you can run `get_vocab("/path/to/tag/files", top=512)` to get the 512 most commonly used tags. Then save the returned array to a new vocab file.
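For anyone wondering what that step does, here is a minimal sketch of the idea: count every tag across a folder of tag files, keep the `top` most frequent ones, then write them out. This is not the repo's `get_vocab`. `top_tags` and `save_vocab` are hypothetical names, and the sketch assumes one tag file per image with comma-separated tags and a vocab file format of one token per line. Check those assumptions against the actual code before relying on them.

```python
import os
from collections import Counter


def top_tags(tag_dir, top=512, splitter=","):
    """Hypothetical stand-in for get_vocab: count tokens across every file
    in tag_dir and return the `top` most common ones, most frequent first."""
    counts = Counter()
    for name in sorted(os.listdir(tag_dir)):
        path = os.path.join(tag_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open(path, encoding="utf-8") as f:
            text = f.read()
        # Split on the separator (comma is an assumption), trim whitespace,
        # and drop empty entries such as a trailing comma.
        tokens = [t.strip() for t in text.split(splitter)]
        counts.update(t for t in tokens if t)
    # Ties keep the order in which tokens were first seen.
    return [tok for tok, _ in counts.most_common(top)]


def save_vocab(vocab, path):
    """Write one token per line (assumed .vocab format)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab) + "\n")
```

Usage would be along the lines of `save_vocab(top_tags("/path/to/tag/files", top=512), "curated_512.vocab")`.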

You should be able to do the same for image description files (i.e., "a photo of a fat orange cat") as well, as long as you change the `splitter` argument in `get_vocab` to something like `splitter=" "` so that each word becomes its own vocab entry.
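To illustrate why the splitter matters, here is a small self-contained sketch (again, not the repo's code). `top_words` is a hypothetical helper that counts tokens across description strings. With `splitter=" "`, each word is counted separately. With a comma splitter, a whole sentence stays a single token, which would make a poor vocab.

```python
from collections import Counter


def top_words(descriptions, top=512, splitter=" "):
    """Hypothetical helper: count tokens across description strings and
    return the `top` most common, most frequent first."""
    counts = Counter()
    for text in descriptions:
        # Skip empty strings produced by repeated separators.
        counts.update(w for w in text.split(splitter) if w)
    # Ties keep the order in which tokens were first seen.
    return [w for w, _ in counts.most_common(top)]


descs = ["a photo of a fat orange cat", "a photo of a dog"]
print(top_words(descs, top=3))          # word-level tokens
print(top_words(descs, splitter=","))   # each sentence stays one token
```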