Closed by ichernev 3 months ago
The main difference between the Cog-based custom models and other models is that Cog expects all the input parameters under the input
key, whereas regular text-to-image models expect the arguments directly in the body (as embeddings and text-generation models do).
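To make the difference concrete, here is a minimal TypeScript sketch of the two request-body shapes. The helper names (buildCogBody, buildDirectBody) and the example parameters (prompt, width) are assumptions for illustration only; just the nesting under the input key comes from the description above.

```typescript
// Hypothetical helpers: only the two body shapes are taken from the discussion.
type Params = Record<string, unknown>;

// Cog-based custom models: every parameter is nested under the "input" key.
function buildCogBody(params: Params): { input: Params } {
  return { input: { ...params } };
}

// Regular text-to-image models: parameters sit directly in the request body.
function buildDirectBody(params: Params): Params {
  return { ...params };
}

// Illustrative parameters; real models define their own parameter names.
const exampleParams: Params = { prompt: "a red bicycle", width: 1024 };

console.log(JSON.stringify(buildCogBody(exampleParams)));
// prints {"input":{"prompt":"a red bicycle","width":1024}}
console.log(JSON.stringify(buildDirectBody(exampleParams)));
// prints {"prompt":"a red bicycle","width":1024}
```

Keeping the wrapping in one small function means the rest of a client can treat both kinds of model the same way until the body is serialized.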
Thanks for the clarifications, @ichernev. Your input has been invaluable in enhancing the functionality and usability of the package.
The improvements you've suggested (updating the documentation, adding a base model named Custom based on the SDXL type, and researching how to implement this in TypeScript) will all be incorporated in the upcoming release, planned for this weekend.
It's not crystal clear from the docs, but there are two interfaces for TextToImage (ImageGeneration) models:
SDXL is currently very popular, so it makes sense to keep its custom input specification intact. However, it should have a base model with Custom in the name that takes the concrete SDXL type as a type argument (so other similar custom models can be added in the future). I'm not 100% sure how to do this in TS, but I can do some research.
And for the regular TextToImage (named ImageGeneration here) models you can have the ImageGenerationBaseModel, which has a common API across many models.
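One possible way to express this in TS is a generic base class whose type parameter is the model-specific input type. The sketch below is only an assumption about how it could look: CustomImageGenerationBaseModel, SdxlInput, CommonImageInput, the buildBody method, and the field names are all hypothetical, and ImageGenerationBaseModel is given a made-up shape purely for contrast.

```typescript
// Hypothetical SDXL input type; the field names are illustrative, not the real spec.
interface SdxlInput {
  prompt: string;
  width?: number;
  height?: number;
}

// Generic base for Cog-style custom models: TInput is the concrete input type,
// and the body nests it under the "input" key.
abstract class CustomImageGenerationBaseModel<TInput extends object> {
  buildBody(input: TInput): { input: TInput } {
    return { input };
  }
}

// SDXL keeps its own input specification by supplying it as the type argument.
class SdxlModel extends CustomImageGenerationBaseModel<SdxlInput> {}

// Assumed common input shared by the regular image-generation models.
interface CommonImageInput {
  prompt: string;
}

// Regular models share one API and send the arguments directly in the body.
class ImageGenerationBaseModel {
  buildBody(input: CommonImageInput): CommonImageInput {
    return { ...input };
  }
}

const sdxlBody = new SdxlModel().buildBody({ prompt: "a lighthouse", width: 1024 });
const regularBody = new ImageGenerationBaseModel().buildBody({ prompt: "a lighthouse" });

console.log(JSON.stringify(sdxlBody));
// prints {"input":{"prompt":"a lighthouse","width":1024}}
console.log(JSON.stringify(regularBody));
// prints {"prompt":"a lighthouse"}
```

With this layout, adding another custom model would only require a new input interface and a one-line subclass, while the compiler checks that callers pass the right fields for each model.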