Support regular TextToImage

ichernev commented 4 months ago

It's not crystal clear from the docs, but there are two interfaces for TextToImage (ImageGeneration) models:

- one is the SDXL one, which is actually custom (i.e., it might support params not available in any other model) -- shipped like this: https://deepinfra.com/docs/advanced/custom_models
- the other is shared among a few models, like:
  - https://stage.deepinfra.com/runwayml/stable-diffusion-v1-5
  - https://stage.deepinfra.com/prompthero/openjourney

Currently SDXL is very popular, so it makes sense to keep its custom input specification intact. However, it should have a base model with Custom in the name that takes the concrete SDXL type as a type argument (so other similar custom models can be added in the future). I'm not 100% sure how to do this in TS, but I can do some research.
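For illustration, a generic base class along these lines could look roughly like the sketch below. This is only a sketch under assumptions: `CustomModel`, `SdxlInput`, its fields, the `prepare` method, and the `"stability-ai/sdxl"` identifier are all hypothetical names chosen for the example, not the package's actual API or the real SDXL schema.

```typescript
// Hypothetical SDXL-specific input; the field names are assumptions,
// not the real SDXL schema.
interface SdxlInput {
  prompt: string;
  width?: number;
  height?: number;
}

// Base class for custom models. TInput is the concrete, model-specific
// input type, so each custom model keeps its own parameters while
// sharing the common plumbing.
class CustomModel<TInput> {
  readonly modelName: string;

  constructor(modelName: string) {
    this.modelName = modelName;
  }

  // Placeholder for shared request logic: it only pairs the model name
  // with the typed input, so the effect of the type parameter is visible.
  prepare(input: TInput): { model: string; input: TInput } {
    return { model: this.modelName, input };
  }
}

// SDXL binds the type argument to its own input type.
class Sdxl extends CustomModel<SdxlInput> {
  constructor() {
    super("stability-ai/sdxl"); // assumed model identifier
  }
}

const sdxl = new Sdxl();
const prepared = sdxl.prepare({ prompt: "a lighthouse at dusk", width: 1024 });
```

With this shape, passing a field that `SdxlInput` does not declare is a compile-time error, and another custom model only needs its own input interface plus a small subclass.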

And for the regular TextToImage (named ImageGeneration here) models you can have the ImageGenerationBaseModel, which has a common API across many models.

ichernev commented 4 months ago

The main difference between the Cog-based custom models and other models is that Cog expects all the input parameters under the input key, whereas regular text-to-image expects the arguments directly in the body (like embeddings and text-generation).
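To make the difference concrete, here is a small sketch of the two request body shapes. The helper names `customModelBody` and `regularModelBody` and the `prompt` parameter are assumptions made for the example; only the nesting under the input key versus top-level arguments comes from the description above.

```typescript
type Params = Record<string, unknown>;

// Cog-based custom models: all parameters are nested under "input".
function customModelBody(params: Params): { input: Params } {
  return { input: params };
}

// Regular text-to-image models: parameters sit directly in the body.
function regularModelBody(params: Params): Params {
  return { ...params };
}

const exampleParams: Params = { prompt: "a lighthouse at dusk" };

// {"input":{"prompt":"a lighthouse at dusk"}}
const customJson = JSON.stringify(customModelBody(exampleParams));

// {"prompt":"a lighthouse at dusk"}
const regularJson = JSON.stringify(regularModelBody(exampleParams));
```

Keeping the wrapping step in one place like this would let a custom base model and a regular base model share everything else about how a request is built.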

ovuruska commented 3 months ago

Thanks for the clarifications, @ichernev. Your input has been invaluable in enhancing the functionality and usability of the package.

The improvements you've suggested, including updating the documentation, adding a base model named Custom based on the SDXL type, and conducting research on how to implement this in TypeScript, will all be incorporated in the upcoming release by this weekend.

deepinfra / deepinfra-node

Support regular TextToImage #2